Google Cloud Vision - Extract text from PDF files

I’m trying to use “Extract text from PDF files” with Google Cloud Vision. I have it working, but it doesn’t seem to scan the entire PDF (all pages). I set the pages input to “*” for all pages, as the documentation describes, but it only seems to bring in the first page of the document. I’ve tried changing the Model (Document vs. Text) and the pages input.

I can get a full extraction if I specify pages 1-4 for a 4-page PDF, but the problem is that I don’t know the number of pages in the PDF when someone uploads it.

Basically, I know it’s working, but the “*” (all pages) setting isn’t. Has anyone else had issues with this?

What happens if you leave the field blank?

Sorry, I tried that too. No dice.

I haven’t tried this one, but it does sound like a bug. I’d recommend submitting a Support Ticket with Glide. You can do this from the App builder via Settings → Support.

Awesome, will do! Thanks, Darren!

I’m experiencing the same issue. With the default settings, “Extract text from PDF files” with Google Cloud Vision only returns the first page of the PDF. The outcome is the same if you try * or 1-*.
If you try the workaround of specifying an artificially high number of pages as the upper limit (e.g. 1-100), Glide returns an error. Seems like a bug.

I submitted a ticket about this a few weeks ago and they just got back to me last night. They said they’re “using Google Vision’s Small Batch File annotation service, which is limited to 5 pages. Additionally, each row of a Glide table is limited to 1MB of data.” They said that, as of now, they have no plans to lift this limit, but they might consider it for future features.
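
To make the 5-page limit concrete, here is a minimal Python sketch that splits a known page count into ranges of at most 5 pages, in the same "1-4" style that worked above. The function name page_batches is made up for illustration, and it assumes you already know the page count from somewhere else. I haven’t confirmed that Glide accepts a range that starts after page 1, and the 1MB row limit may still apply, so treat it as a sketch rather than a fix.

```python
def page_batches(total_pages, batch_size=5):
    """Split pages 1..total_pages into range strings of at most batch_size pages.

    Hypothetical helper: batch_size defaults to 5 to match the limit
    quoted in the support reply above.
    """
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be at least 1")

    ranges = []
    for start in range(1, total_pages + 1, batch_size):
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size - 1, total_pages)
        # A single leftover page is written as "11", not "11-11".
        ranges.append(f"{start}-{end}" if end > start else str(start))
    return ranges


if __name__ == "__main__":
    print(page_batches(12))  # ['1-5', '6-10', '11-12']
```

For the 4-page PDF mentioned earlier, this gives a single range, 1-4, which is the input that already worked.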


