As Large Language Models (LLMs) have gotten more powerful, we’ve started thinking of them not just as text-in, text-out models, but as “agents”[1] that can take problems, perform actions, and arrive at solutions. Despite the significant advancements in LLM agentic capabilities in the last year (OpenAI o3, Anthropic Computer Use), it’s still a non-trivial challenge to plug agents effectively into existing institutions and enterprise products.
While LLM-based agents handle low-complexity automations with deceptive ease, anyone building real agentic products is likely running into a common set of challenges:
- 90% accuracy might work for something like ChatGPT, but it doesn’t cut it for products that aim to approach (or possibly replace) human-level capabilities.
- Their efficacy rapidly degrades as you introduce enterprise-specific complexity (e.g., every piece of product-specific context or constraint you prompt the agent with).