Gmail API advice

So after much wonderful advice from @patrick, I have managed to get this task further than i imagined, and right now it is already pretty decent, however I am having some strange errors that I am hoping I may find a solution to.

The system I am building needs to query 10 emails all the way up to a million emails if the user desires, i know that in itself sounds like a terrible idea, however for what I am doing it does need it, once it’s done, i think everyone will kind of understand why.

So the google API allows me to get a LIST of all messages up to a limit of 500 in a single query and all it returns as data is something like {“id” : “1742d1dea0ef”} which is like some random ASCII string or something, as well as a single pageToken, which is also like some strange ASCII string, that is non sequential, and by adding that as a parameter, it gets the next 500 message IDs, but at least I can query more than 1 at a time and its pretty fast to just get that.

To get the additional message data, I need to call a different google API, which takes a query parameter of message ID, and as much as I have looked, it is 1 at a time, even though I am not getting the full data back and only a smaller portion of the message headers.

The first time i tried i passed the 500 message IDs to a repeat, which then ran the message detail query 500 times, the issue here is that it hits an error pretty often and just stops, not even a real error, Google just returns something like, the command had an error, they even have the cheek to say “And thats all we know” try again in 30 seconds.
I tried 50 at a time and that seems to run pretty well (most of the time), and I know it is not quota limit, because I am about a billion short of hitting that.

When i tried to add paging into that solution it got a little messy and really wasnt working well. So I tried one at a time and ran the repeat with the message list step, with 1 message ID returned at a time and then passed to the second api for message detail, each iteration takes the pageToken and moves to the next page. This was working alright with initial tests up to about 4000 messages, which is when i stopped it running so I could do something with the returned data.

I want to run a database insert on each iteration on my own database, and I am only inserting and fetching about 5 columns of data per message, so although it could be 30k rows its only header data, no attachments, no message body etc. each column is under 150 chars.

It works, but is unreliable, i get 600 messages sometimes, other times 2700 messages and then it errors with a 504 gateway timeout, its just so random though, after reading up on the error, I think it is something to do with 2 servers, google reading the API call, then another inserting a row at a time of data, so it becomes a little unstable. Has anyone got any other ideas of how i could achieve this.

End result i need all email results from a Google account, only 5 pieces of header data all stored in my database, dont know if using server side for the entire thing, I can wait for 10 api calls, put it in some temp storage and then write 10 rows to the database at a time, to try reduce the reliance on 2 servers or something.
Google says they are good for API calls up to 100 per second per user, and I am only at 1.7 per second, so I assume its not their side but my server.

Anyway, any bright ideas are welcomed, sorry for the long story.

Hey Paul,

Is it possible you are hitting a resource limit like max execution time or memory allocation?

–Ken

I dont think so, because the results are always so far apart, like sometimes i hit an error after 10 seconds and 30 records, then i literally refresh the page right away and suddenly it runs for 4 minutes and gives 1600 results, the longest run i have had was 5200 and the shortest was 30 something.
It is just so strange.

Maybe some hint will show up here:

https://console.cloud.google.com/apis/dashboard

1 Like

I was looking at that earlier, which i also found quite odd in itself



So, this is telling me everything is great, which clearly it isn’t.

These are Googles Quota limits on my free account
Screenshot 2020-03-31 at 14.37.26

I just realised I am going to have to rethink my repeat anyway, for some odd reason Google thought it would be cool if the data was handed back to the user in different orders within their array, so blah[0].value could be anything from the ID to the From or the subject, just depending on how Google felt that hour.
Hmmm, i need a chalk board to follow this procedure.

BTW, i know the repeat step executes, so it would be better to get the data directly from the repeat, but to repeat again inside the repeat seems silly, im sure some other server component also executed, can anyone recall?

Here are my current steps, which i thought was working perfectly besides the timeouts

I also tried a while loop a little earlier, but could not seem to access the properties inside it.
And the $parent.$parent.apiGetProfile_forDB.data.emailAddress also seemed to stop working, with it only works with a single parent so really trying to avoid nesting repeats in repeats.

I am not sure who may be interested or who is following along with this process, but I have managed to figure out what is causing the issue, and I have made a work around for it, basically even though my script is running, Google Chrome is looking for something called TTFB (Time To First Byte)
This runs for exactly 10 minutes and no longer, then errors with a 504 Gateway Timeout.

I have setup the script to auto resume on error until it is really completed it’s task, however I am unsure if there is a better way, or a way to get rid of the error.

The other issue is that the refresh token only lasts 1 hour, so if the user has has 50k emails they are going to be logging in over and over again through the process, because I can not find a way to seamlessly rerun the script without their interaction through the Google Login procedure, even though once they return they can resume from where they were. It is just a little clumsy.

I would rather ask advice before fiddling, but I assume if the page has a progress indicator or something like that updating the whole time then it will be getting bytes and realise the page is not stalled.

Anyone got any experience with things like this where a script could take an hour to run, because it is fetching 16k email message headers?

When I was building an email client in Bubble I limited the initial email import to 30-60 days, depending on the plan the user was signing up for. Gmail is much faster at importing than Microsoft. I would suggest trying to run the imports in batches to limit strain compute resources.

I’m not entirely sure what you’re attempting to build, but the user should only need to login once. Your system should handle the oAuth refresh to get a new access token automatically.

For long running processes, that typically happen on the server, I find it better to let the user know that you will email them once it’s complete and allow them to use other parts of your app while the system is working on it.

All that said, unless you’ve spent the time reviewing the Google API User Data policy’s Gmail Restricted Scopes section and are willing to pay the $15-$75k per year for security auditing, you might want to rethink your app and design it as a desktop/mobile app that stores all gmail data on the user’s local machine.

Because I am not building a conventional email client application, I am really hoping that the scope restrictions and licensing are not going to really apply to me for a while at least, but at some point I am quite sure I will be running into that brick wall. I hope not but suppose I will have to see. There is just so much documentation to read that I am sure I have missed many parts.

Yeah, they should really have a flow chart for it. If you are using the data in anyway the restrictions will apply to you. There are only a few exceptions which I have at the end of this post.

Basically it boils down to this (I’m assuming you are requesting access to Restricted Scopes).

  1. Are you requesting a scope that will read, create, or modify message bodies (including attachments), metadata, or headers ; or Control mailbox access, email forwarding, or admin settings. (otherwise known as Restricted Scopes)

If yes, then go to step 2

  1. Are you building one of the application types below? No other types are allowed access to Restricted Scopes.

If yes to step 2, proceed to step 3

  1. Confirm you will adhere to these guidelines

If you confirm to guidelines in step 3, proceed to step 4

  1. If you’re using a server to store, or process, any of the User data then you must adhere to and pass an annual security audit. The only exemption is for local client applications and you must be able to demonstrate that no data passes through any third-party servers from Google to local client.

You can read more about the verification process for Restricted Scope apps here. It’s a pretty lengthy process which includes providing demonstration videos of your applications, giving Google access to your application with steps on how to sign up and test the application. If all of that passes and you’re storing data on a server, then they provide you a list of third-party security companies to perform the security assessment.

All that said, there are very few exceptions outlined here that could allow you to use your app in limited capacity.

1 Like

Lol, yeah, i read all that, but you certainly gave it in a better more easy to understand way, basically Google are being quite painful, but i do understand, email is private, and should remain secure.

I am only using gmail.readonly at this stage, but even that will have to go through all this, so I will have to see as I go if I can find ways around some of these things.
Where there is a will there is a way, hahahaha.

1 Like