Need help with Google Apps Script: archiving/calling back data

I wanted to ask for a second opinion on my approach to archiving data to stay under the 25K row limit. I periodically archive rows to Google Sheets files outside of my main database using a Google Apps Script. The script creates a new spreadsheet per user, then periodically moves data to the archive sheets. If we didn’t do this, we would reach capacity in just a month once the app scales to 100+ users.

I then call this data back to the main database when a user selects the month-year for the data they want to bring back into the app to view. That call back currently leads to about a one-minute delay between when they set the month-year and when they see the data in the app. Part of the delay is the Apps Script, which has to find the archive file in a folder by looping through all the files, then find the right rows by looping through all the archived data, build up an array, set the array into the sheet, and finally delete rows left over from previous searches. The other part of the delay is that Glide doesn’t update right away when the data is retrieved.

Is there any part of this process that I can optimize to get the data called back faster? Or is there another way to do this that I’m just not seeing?
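Roughly, the file-search part of the script looks something like this (just a sketch with placeholder names; ARCHIVE_FOLDER_ID and userEmail are assumptions, not the actual code):

var files = DriveApp.getFolderById('ARCHIVE_FOLDER_ID').getFiles();
var archiveFile = null;
while (files.hasNext()) {                  // checks every file in the folder, one by one
  var file = files.next();
  if (file.getName() === userEmail) {      // each user's archive file is named with their email
    archiveFile = file;
    break;
  }
}
var ss = SpreadsheetApp.openById(archiveFile.getId());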


Ok, so try this. Continue to use the script to store the data, but use VLOOKUP on the sheet that stores the data, combined with IMPORTRANGE. When someone looks up data, the cells holding the values to be looked up are set up as an imported range on the sheet where the data is stored. That sheet does the lookup and displays the resulting information in predetermined cells. Those cells are in turn set up as an IMPORTRANGE on the app’s sheet and voila, once the sheets are closed out and Google handles the query it’s almost instantaneous :slight_smile:
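A minimal sketch of that chain, with placeholder spreadsheet keys, tab names, and column numbers. In the storage spreadsheet, one cell imports the lookup value from the app’s sheet and another runs the lookup against the archived data:

=IMPORTRANGE("APP_SPREADSHEET_KEY", "Lookup!A2")
=VLOOKUP(A2, Archive!A:H, 3, FALSE)

and in the app’s sheet, the predetermined result cells are pulled back with:

=IMPORTRANGE("STORAGE_SPREADSHEET_KEY", "Results!A2:H9")

(IMPORTRANGE has to be granted access the first time it links two spreadsheets.)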


Sounds promising! Now, would this work for my case, where a new archive file is generated whenever a new user joins? The IMPORTRANGE would need to automatically include new file IDs somehow. Also, can IMPORTRANGE build up one single range from all the imports from all these different storage files?

Another thought: do I need to break the storage files up into one per user, especially if that complicates the call back? I’m only doing this because each user generates 8 rows a day, and the goal is to be able to scale to 500 to 1000 users, which is up to about 2.9 million rows a year.

No, neither of these should be an issue if set up correctly, but have you considered the capacity of Google Sheets? If you are creating that many rows per year, is it going to be necessary to recall all of these entries at all times, or can they be removed after a span of time? Do you only need 90 days’ worth of entries, or 180 days, etc.? With that many entries, I fear a single sheet wouldn’t be possible. You may need to have a script generate a new sheet every day when it stores data, and in your original sheet of stored data have the same script write the VLOOKUPs to add the additional pages. A database as large as you are describing would take any script or formula quite some time to search through 2 million+ data points. I think looking into a script that erases data after a span of time would be in your best interest; a rough sketch of that kind of cleanup is below. Google Sheets maintaining any sort of speed in finding your information at that scale is not something you can count on.
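A rough sketch of a time-based cleanup, assuming each archive row stores its date in column A and the data lives on a tab called Archive (the file ID, tab name, and retention window are all placeholders):

function purgeOldArchiveRows() {
  var RETENTION_DAYS = 180;                        // keep roughly six months of entries
  var cutoff = new Date();
  cutoff.setDate(cutoff.getDate() - RETENTION_DAYS);

  var sheet = SpreadsheetApp.openById('ARCHIVE_FILE_ID').getSheetByName('Archive');
  var values = sheet.getDataRange().getValues();

  // Walk bottom-up so deleting a row doesn't shift the rows still waiting to be checked.
  for (var r = values.length - 1; r >= 1; r--) {   // row 0 is assumed to be a header
    if (new Date(values[r][0]) < cutoff) {
      sheet.deleteRow(r + 1);                      // getValues() is 0-based, sheet rows are 1-based
    }
  }
}

Deleting rows one at a time is itself slow on big sheets, so for a very large archive it’s usually faster to collect the rows you want to keep and rewrite them with a single setValues() call.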

An easy way to look up multiple results for your data would be to use an ARRAYFORMULA. Alternatively, you can have a column append a count number to the person’s name and have the lookup do the same, so it would look up Joe1, Joe2, Joe3, etc. Either way you go should be fine.
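A sketch of both options, assuming names are in column B (of an Archive tab for the first formula, and of the same sheet starting at row 2 for the helper column):

=FILTER(Archive!A2:H, Archive!B2:B = "Joe")

returns every matching row at once (FILTER being a close cousin of the ARRAYFORMULA approach), while a helper column of

=B2 & COUNTIF($B$2:B2, B2)

dragged down builds the Joe1, Joe2, Joe3 keys for the numbered-name lookup.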

There is probably an opportunity for optimisation here. Instead of performing this iterative search every time, you could build a key-value lookup table with a pointer to each file, and then do a direct lookup on that table.

The console logger has a handy time method, which can be useful for identifying bottlenecks and slow functions.
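For example, wrapping the suspect section in a timer (the function name here is just a placeholder):

console.time('find archive rows');
var rows = findArchiveRows(monthYear);   // placeholder for whichever step you want to measure
console.timeEnd('find archive rows');    // logs the elapsed time under the same label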

Also, when performing iterative loops over large datasets, you should always look for opportunities to “short-circuit” and exit the loop as early as possible (but you probably already know that :slight_smile:)
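For instance, if the archive rows are written in date order (an assumption), there’s no need to keep scanning once the loop has moved past the requested month:

var matches = [];
for (var i = 1; i < data.length; i++) {            // row 0 assumed to be a header
  if (data[i][0] === targetMonthYear) {            // assumed month-year column
    matches.push(data[i]);
  } else if (matches.length > 0) {
    break;                                         // past the requested block, so stop looping
  }
}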

Just a few ideas that might help.

FYI, Google Sheets has a hard limit of 5 million cells.

Sheets will likely stop working if you hit this threshold.

Best of luck


Very interesting! I’m not sure how to build a key-value lookup. Is this what you are talking about? https://www.youtube.com/watch?v=3dGQ4d7JF1U

Yes, a JavaScript object.
Imagine you create an object where the key is the username and the value is the ID of the Sheets file where that user’s data is archived. For example…

var file_lookup = {
  'user1@example.com' : 'xxxyyyzzz',
  'user2@example.com' : 'yyyzzzxxx',
  // etc....
};

With that in place, you no longer need to iterate through all the files to find the correct one, as you can do a direct lookup…

  var user = 'user1@example.com';
  var ss = SpreadsheetApp.openById(file_lookup[user]);

Make sense?


Oh wow, going to give it a try today!

I removed the code looping through files and replaced it with your method. That really helped cut down the loading time! It’s about 10 seconds vs 1 minute :slight_smile:

Here’s code (modified because the archived files are located in another folder):

var ssFolder = DriveApp.getFolderById("XYZ");
var ssFiles = ssFolder.getFilesByName(email);
var ssFileID = ssFiles.next().getId();
var ss = SpreadsheetApp.openById(ssFileID);

Also, the file names are the user’s email.

My only issue now is that half of this delay is Glide taking a while to update. Does that have to do with the Sync settings? I am doing this in a free app with the refresh setting on “Only on edit”. Would this be better using “While editing or using the app” in Pro?


The “While editing or using the app” setting may still have a delay of up to a few minutes, because what it does is force Glide to check the Google Sheet for updated data every few minutes. “Only on edit” does exactly what it says and will only check for new Google Sheet updates immediately after data changes via the app. On the other hand, sometimes Google will actually push updated data to Glide (which may be the case for you), and I believe that will update in the app, but it happens at Google’s discretion and can take up to a few minutes at times.

So “While editing or using the app” may or may not help with speed. It’s more to guarantee that data gets updated in the event that something changes in the sheet but Google does not send that updated data to Glide. I think what it really comes down to is whether the script runs fast enough for Glide to catch it at the time that it is pulling updates.
