Referencing lists with 100k+ rows

I’m building a quoting system that adds up the total government grant amount based on the quoted items.

Grant amounts depend on the model numbers and the combinations they’re used in.

I need the system to look up a few PDF docs available on government websites and compare them against the quote data to find the exact grant amounts.

Those PDF docs have 100k+ rows of data.

I tried:

  • Converting the docs to text and extracting the info with AI
    Result: String too long

  • Having the AI look up the doc link
    Result: Request too big

I’m looking for creative solutions. I tried AI with a shorter sample of the same docs and it works great, but I can’t import 100k+ rows into my Glide app just for lookup purposes.

Any ideas?

How do you get these files?

Do you mean pages?

You can try using an external workflow to split the PDFs into multiple chunks, then use those chunks in the AI action/column.

I think it might be possible with Make, but I’m not familiar with it. I’ve used Make maybe twice in my life :joy:
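For what it’s worth, here is a rough Python sketch of the chunking idea, in case someone wants to do it outside Make. It assumes the text of each PDF page has already been extracted into a list of strings (that step is not shown), and `chunk_pages` and `max_chars` are names I made up for illustration. The right size limit depends on whatever the AI action accepts, which I don’t know.

```python
from typing import Iterable, Iterator, List


def chunk_pages(pages: Iterable[str], max_chars: int) -> Iterator[List[str]]:
    """Group consecutive page texts into chunks of at most max_chars characters.

    A single page longer than max_chars is still emitted, as its own chunk.
    """
    chunk: List[str] = []
    size = 0
    for page in pages:
        # Start a new chunk when adding this page would exceed the budget.
        if chunk and size + len(page) > max_chars:
            yield chunk
            chunk, size = [], 0
        chunk.append(page)
        size += len(page)
    # Emit whatever is left over after the last page.
    if chunk:
        yield chunk
```

Each chunk could then be sent to the AI separately, so no single request carries the whole document.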

I get the links for each list here: Archive des listes des thermopompes efficaces admissibles | Hydro-Québec

The first one is 3,092 pages with ~40 rows per page (roughly 124k rows).

Is there an online table version of this data?

Not for now…

Did you try Google Vision?

I just tried it, and it still returns an error.

I think that the best solution would be to split the PDFs…

So split the PDF and use regex to find the relevant info, right?

Or just split the PDFs into chunks and use them all.

I could split the PDF using Make, but I’ve never used regex. I’ll look into that. Thanks for the info!
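For anyone else new to regex, here is a minimal Python sketch of the "find the relevant rows" step. It assumes each chunk is plain text with one list row per line, and that the model numbers from the quote appear verbatim in those rows. Neither has been checked against the real Hydro-Québec lists, and `find_model_rows` is a hypothetical helper name.

```python
import re
from typing import Iterable, List


def find_model_rows(text: str, model_numbers: Iterable[str]) -> List[str]:
    """Return the lines of text that mention any of the given model numbers."""
    # Escape each model number so characters like "-" or "." match literally.
    alternatives = "|".join(re.escape(m) for m in model_numbers)
    if not alternatives:
        return []
    # The lookarounds keep "HP-100" from matching inside "HP-1000" or "XHP-100".
    pattern = re.compile(
        rf"(?<![\w-])(?:{alternatives})(?![\w-])", re.IGNORECASE
    )
    return [line.strip() for line in text.splitlines() if pattern.search(line)]
```

Running this over each chunk with only the model numbers in the current quote should leave a handful of rows, which is small enough to pass to the AI or store for lookup.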
