Instructions in AI Documnt to text are missing

Miroslav_Madaric · September 29, 2024, 1:37pm

First time reporting a bug? Refer to our Start Here post.

Team ID:

In the Glide dashoard URL, e.g.
https://go.glideapps.com/o/your-team-id-here/

App ID:

In the Glide builder URL, e.g.
https://go.glideapps.com/app/your-app-id-here/layout
https://go.glideapps.com/app/9z1PeJPSVMc48ybKt5ZB/layout

Description
Unlike to “Image to text” AI action/column, in D2T action/column there is no “Instructions” option!

How to replicate

Invoking this action/column anywhere!

No Instructions!

Iinstructions are present in Images to text:

ThinhDinh · September 29, 2024, 2:55pm

If I have to make a guess, I think it’s because they’re not using a Large Language Model under the hood for this type of task.

“Instructions” can be seen in integrations that looks like coming from OpenAI’s API (it’s their “system” prompt), but they don’t have a direct end point where you just put the document URL in and expect to get text from that document.

Miroslav_Madaric · September 29, 2024, 3:47pm

You must be kidding if I use the document like PDF or Excel or word then there are no instructions if I convert this document to images then instructions work perfectly

Srdačan pozdrav / Best regards!

Jeff_Hager · September 29, 2024, 3:57pm

If @ThinhDinh is correct, then it’s not using AI at all to extract text. It’s probably using OCR and/or scraping text out of those document files. Basically it’s regular code that knows how to extract text from certain types of documents. Instructions would be pointless, because it’s probably not running through any sort of AI and the code itself is not written to know how to do anything with any sort of instructions. The code’s only purpose is to extract text.

If you wanted to further manipulate it with AI, you could probably extract the text first and then pass that text through AI.

AI and regular code are two different beasts.

Miroslav_Madaric · September 29, 2024, 5:10pm

Also this answer I am interpreting like a joke!

first of all you must explain to me If I convert the document into series of images, then that is okay to give instructions and get proper answers from AI. For instance, if I have school timetable in 2 images (morning/afternoon alterations), PDF,

but if this is the very same document with instructions to generic AI, I can give The Book Of 400 pages document and ask for instance for summary or review.

on the other hand If I follow your advice I will get 400 pages transformed in text and then instruct ai to analyze this text according to my instructions. Imagine Google sheet table with the cell where for 100 pages are stored!?

Jeff_Hager · September 29, 2024, 7:55pm

Look, I’m not trying to make a joke. I don’t know if Document to Text uses AI or not. It’s listed as AI, but it’s possible that it’s just code. I’m not saying that I’m correct, but if I am, then you have to understand the difference between Code and AI.

Rather than fight about it, I’ll just let AI answer this question for me.

AI and traditional code operate differently in terms of flexibility and adaptability, especially when it comes to handling tasks like converting documents to text or images to text. Here’s a breakdown of the key differences:

1. Nature of Traditional Code vs. AI:

Traditional Code: Traditional code operates based on predefined instructions or rules. When you write a program to convert a document to text (e.g., a PDF to plain text), the code follows a fixed set of instructions to extract text from the document. If the code is not designed to handle additional instructions, it will not be able to adjust or perform other tasks beyond its initial function.
AI: AI, particularly models like GPT, are designed to learn and generalize from data. Rather than following strict predefined rules, AI systems can “understand” patterns and adapt to various contexts. For example, when an AI is asked to convert an image to text, it can also process additional instructions, such as explaining the text, summarizing it, or formatting it in a specific way. This flexibility comes from AI’s ability to interpret human language and adjust its responses based on the context of a given query.

2. Task-Specific vs. General Purpose:

Traditional Code: Traditional code is task-specific. For example, a function that extracts text from a PDF is narrowly focused on that one job. If you want it to do something else (e.g., also extract images, or recognize tables), you have to write more code. This is because traditional code operates in a procedural, rule-based way, and cannot step outside those rules without explicit instructions.
AI: AI models, on the other hand, are more general-purpose. They can handle a variety of tasks because they are trained on diverse data and designed to respond to a wide range of inputs. When you give AI additional instructions (e.g., “convert this image to text, and then summarize the text”), it can dynamically adjust its behavior because it doesn’t follow a fixed script. Instead, it uses patterns and context to perform new tasks based on your requests.

3. Rule-Based Logic vs. Data-Driven Learning:

Traditional Code: Works based on logic you explicitly program. If the logic doesn’t cover a scenario, the code won’t know how to handle it. For example, if a document contains special symbols or languages that the code wasn’t built to recognize, it will either fail or produce errors.
AI: AI models are trained using vast datasets that allow them to generalize across various scenarios, even if those scenarios aren’t explicitly programmed. In the case of image-to-text conversion, the AI can recognize patterns in images, even if they’re new or unfamiliar, because it has learned to handle a wide variety of inputs.

4. Static vs. Adaptive Behavior:

Traditional Code: Once written, traditional code remains static unless manually updated or modified. It does exactly what it was told to do, without room for deviation or adaptation. For example, if you write a function to extract text from a PDF, it won’t adapt if you later decide to ask it for a summary or a translation unless you add those capabilities.
AI: AI can be adaptive. Given additional instructions, it can modify its response in real-time. For example, if you ask AI to convert an image to text and also describe the scene or highlight certain features, it can adjust its output dynamically. This adaptability is a key feature of AI, making it more versatile than traditional code.

5. Processing Natural Language:

Traditional Code: Traditional code doesn’t understand human language beyond predefined commands. If you try to give it instructions in natural language, it won’t know what to do unless it’s programmed specifically to parse and interpret that language, which is challenging and requires writing a lot of specific logic.
AI: AI systems, especially those built on natural language processing (NLP), can understand and process human language. When you ask an AI to perform a task, you don’t need to issue a set of rigid commands; instead, you can express your request in flexible, conversational language. AI is capable of interpreting the intent behind your words and adjusting accordingly.

Why AI Can Handle Additional Instructions:

AI’s ability to accept additional instructions comes from its underlying design, which includes:

Training on diverse datasets: AI learns to handle a wide variety of inputs and contexts during training, giving it the flexibility to deal with unexpected instructions.
Context awareness: AI systems can maintain context during interactions, allowing them to adapt to follow-up instructions or changes in the request without requiring additional hard-coding.
Language understanding: AI, especially models like GPT, understands and generates human language, making it much easier to handle complex, multi-step tasks without needing explicit step-by-step programming for each new instruction.

Example:

Code to convert a PDF to text: A traditional program can be written to open a PDF, extract the text content, and output it. If you want to change the output format or add a new feature, you would need to modify the code.
AI to convert an image to text: With AI, you can say, “convert this image to text, then summarize the text and translate it into Spanish,” and the AI will handle all of these tasks within one conversation. The AI’s training allows it to break down this request and perform each part as needed.

In short, traditional code is task-specific and rigid, whereas AI is designed to be flexible, adaptive, and capable of understanding and responding to additional instructions without needing to be reprogrammed for each new task.

nathanaelb · September 30, 2024, 9:41am

That was an eye-opening read. It feels like the future developer will be writing AI prompts more than code.

Miroslav_Madaric · September 30, 2024, 10:48am

It was huge misunderstanding probably caused by my poor explanation: I am not confrontng code vs. AI! Just 180° opposite: I am COMBINING code (algorithm) and AI.

In this app I am offering to users checking the school timetable and other activities for e.g. day after tomorrow. Users have as input usualy timetables on paper and scan that in 2 jpg files or in one PDF. When I use images, I give instruction in image2text AI action or column that forms the text rows for day requested, works perfectly! Even for 2 images in an array as input for AI. On the contrary, if user submits timetable in PDF (or other document), AI delivers garbage due to the leacking instruction!

I don’t beleive that this is design flaw, moreover it is a programming bug in this specific Glide feature.

Miroslav_Madaric · September 30, 2024, 11:06am

As I explained previously to Jeff, I am apologiznig for my poor explanation. I haven’t risen code vs. AI issue, but moreover AI+code complex I am using here. So your question to AI is based on my poor explanation, but the answer is correct and I am fully aware of it.

If misunderstanding solved, then the question of lacking instructions in D2T action has to be risen and here is the pertinent answer from AI (after avoiding “woke” approach!

"You raise a valid and practical point, especially about how the absence of an “Instructions” field in Glide’s Document to Text action impacts usability, particularly when dealing with structured data like tables, summaries, or specific sections of large documents.

Here’s why this issue matters and why the response you received (“this is not a bug, this is a feature!”) feels inadequate:

1. The Role of Instructions for Context-Sensitive Tasks:

Images vs. Documents: Yes, while images and documents are different mediums, both often contain context-sensitive content. Extracting text from a table in a document requires similar interpretive processing as extracting text from an image, like determining rows and columns, handling complex formatting, and distinguishing content hierarchies. In both cases, user instructions can help define the output more accurately.
For instance, if you’re submitting a document with a timetable (or any structured data), you want to extract and format the text in a meaningful way. Without the ability to provide instructions, the extracted text could be disorganized, making it practically unusable, as you’ve experienced.

2. Complex Use Cases Require Flexibility:

As you mentioned, with a large document, you may not need the entire content extracted, or you might want a summary, fact-checking, or review limited to certain chapters or sections. Without an Instructions field, you’re unable to direct the AI to focus on those parts.
In contrast, tools like ChatGPT allow you to specify such instructions even for documents. This flexibility is crucial for complex tasks, and not having it in Glide limits its AI action’s usefulness.

3. Instructions Are Essential for Structured Data:

Tables are a prime example of why instructions matter. If you submit a document with a timetable, the text extracted without instructions will likely lose the table structure, making the output unusable. However, with an Instructions field, you could guide the AI on how to handle the table—whether to maintain row/column formatting, or to summarize, etc.
This is no different from what you can already do with images, so the distinction between documents and images when it comes to the necessity of instructions feels arbitrary.

4. Why Glide Might Lack this Functionality:

It’s possible that Glide’s AI action for documents is designed to be simpler, assuming the majority of users would only need raw text extraction. However, as you’ve pointed out, this leaves out many use cases that require more flexibility—instructions for focusing on specific sections, processing structured data, or generating summaries.
Glide might not yet have the capability to perform more advanced document understanding, such as handling tables, summaries, or chapter-based extractions. This would require them to enhance their current document processing capabilities to allow for instruction-based processing, similar to what more advanced AI systems like ChatGPT provide.

5. Feedback Isn’t Just About Bug vs. Feature:

The response you received (“Objection is understood as insulting!”) reflects a misunderstanding of your feedback. It’s not about insulting Glide’s capabilities but rather pointing out a gap that affects usability. This is constructive criticism aimed at improving the product.
Glide’s team might have seen this as a “feature” because the current implementation is intentional, but this doesn’t mean it fully serves users’ needs, as your experience shows.

Conclusion:

You’re absolutely right to point out that Glide’s Document to Text action lacks flexibility without an Instructions field. The absence of this feature makes it difficult for users to get useful results from structured documents, large documents, or when specific focus is needed. Including an Instructions field would significantly enhance the tool’s ability to handle complex, real-world use cases—such as extracting tables correctly or summarizing chapters—which are already handled well by more sophisticated AI platforms like ChatGPT.

Ultimately, Glide should consider adding this functionality to make their AI actions more versatile and user-friendly. Your feedback is valid, and their response could have been more receptive to the potential for improvement."

This answer is againg a little bit “woke”, because the issue has to be clearly denoted as finfing the BUG and not as IMPROVEMENT. If you have flat tire, you are fixing it, you are not improving it. Of course, you can drive somehow also with the flat tire, similarly as you have D2T without instructions!

system · March 29, 2025, 11:06am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Instructions field in the Document to Text action Feature Requests ai , workflows , action	0	24	December 30, 2024
🤖 Glide AI Playground Project Showcase	9	599	August 28, 2023
Extracting Key-Value Data from PDF Using Glide's Integrated OpenAI Ask for Help api	3	68	February 6, 2025
File submission as an input for AI Ask for Help	4	76	June 29, 2024
New OpenAI features - Assistant API Ask for Help	15	749	December 19, 2023