Hi, I wanted to ask for some advice.
I have a 50-page PDF document containing claim files. Glide’s document-to-text extraction does extract all the text, but OCR column drift jumbles some of it: text from one claim item spills into the next, and some of the Yes/No values end up mixed up between items.
The behaviour is not consistent because it depends on the structure and formatting of each PDF document.
I also tried setting up a Google Cloud Vision integration, but found that it extracts only 5 pages, which is not enough for a 50-page document.
Does anyone know of any integrations or workarounds that can resolve this issue?