
Teaching a machine to read: how LLMs comprehend text


In my earlier blog posts about how Python code translates into CPU instructions (part 1, part 2), we explored the fundamental concept that computers don’t understand human ideas like text or language. Instead, everything is broken down and transformed into machine-readable instructions. In this post, we’ll delve into how large language models (LLMs) like ChatGPT and Claude process your text-based questions and turn them into something they can interpret and respond to.
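As a quick preview of that transformation, here is a minimal sketch of the very first step, tokenization, using the open-source tiktoken library (my choice for illustration; the post doesn’t prescribe a particular tokenizer). The model never sees your characters directly, only integer token IDs:

```python
# Minimal tokenization sketch, assuming the `tiktoken` package is installed.
import tiktoken

# Load a byte-pair encoding used by several OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("How do LLMs read text?")
print(tokens)              # a list of integer token IDs
print(enc.decode(tokens))  # decoding the IDs round-trips back to the original string
```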

You might ask, “Why do I need to know this?” After all, most people simply interact with LLMs via a chat interface, and that works just fine. This knowledge might not make you a better prompt engineer, but it will give you insight into how we, as data engineers, can engage with unstructured data in entirely new ways.

Traditionally, working with text meant parsing it, extracting relevant details, and storing them in structured formats like rows and columns. Searching for information required keyword-based queries. This made working with large amounts of unstructured data either less useful or prohibitively expensive to process. Now, thanks to advancements in text comprehension driven by LLMs, we can directly leverage these models for tasks that were once complex and manual. While this post focuses on text, these techniques are also applicable to other data types like images and audio.
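To make the contrast concrete, the sketch below compares a keyword lookup with an embedding-based similarity search. The documents, the query, and the sentence-transformers model are placeholders I picked for illustration, not anything from the post:

```python
# Hypothetical documents: the same fact, phrased two different ways.
documents = [
    "Invoice #123 was paid on March 3rd.",
    "The customer settled their bill at the start of March.",
]

# Traditional approach: keyword matching only finds literal matches,
# so the second document is missed even though it describes the same event.
keyword_hits = [d for d in documents if "invoice" in d.lower()]
print(keyword_hits)

# LLM-era approach: embed the texts and compare meanings instead of words.
# Assumes the `sentence-transformers` package is installed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode("Which invoices were paid in March?", convert_to_tensor=True)

# Cosine similarity scores both documents as relevant, where the keyword search found only one.
print(util.cos_sim(query_embedding, doc_embeddings))
```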
