
OpenAI's Realtime API is a step towards outer-loop Agents

submitted by
Style Pass
2024-10-07 15:00:07

Last week, OpenAI launched a bunch of features (seems on-brand, I guess?). One of the interesting details was how the Realtime API works. While the websockets side is cool, one of the most interesting things is how function calling plays into the picture for agent-to-human communication.
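As a rough sketch of how function calling plugs into the Realtime API: the client registers tools on the session, and the model can then request a call mid-conversation. The field layout below follows OpenAI's published examples (a `session.update` event carrying a flat `tools` array), but exact names can differ across API versions, and the tool itself (`get_order_status`) is a made-up example.

```python
import json

def make_session_update(tool_name, description, parameters):
    """Build a session.update event that exposes one function tool
    to the model. Assumed shape, based on OpenAI's docs examples."""
    return {
        "type": "session.update",
        "session": {
            "tools": [{
                "type": "function",
                "name": tool_name,          # Realtime tools are flat, not
                "description": description, # nested under a "function" key
                "parameters": parameters,   # JSON Schema for the arguments
            }],
            "tool_choice": "auto",
        },
    }

# Hypothetical tool for illustration only.
event = make_session_update(
    "get_order_status",
    "Look up the status of a customer's order.",
    {"type": "object",
     "properties": {"order_id": {"type": "string"}},
     "required": ["order_id"]},
)

# In a real client this JSON would be sent over the open websocket;
# here we just serialize it to show the payload shape.
payload = json.dumps(event)
```

From there, when the model decides a tool is needed, the server streams back a function-call event with arguments; the client runs the function and sends the result back as a conversation item.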

Dynamic “Agents” that execute complex, multi-step tool-calling workflows in response to a human query, usually in a multi-modal chat interface.

For example, a human asks “I want to buy a blender”; the agent works through several steps, presents options, and ultimately completes the task and/or answers the question.
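The blender example above boils down to a tool-calling loop: the model either asks for a tool or produces a final answer for the human. Here is a minimal, self-contained sketch of that loop. The model is stubbed out so it runs standalone, and the tool names (`search_products`, `place_order`) are hypothetical.

```python
import json

def search_products(query):
    # Stub: a real implementation would query a product catalog.
    return [{"name": "BlendPro 500", "price": 79},
            {"name": "MixMax", "price": 49}]

def place_order(product_name):
    # Stub: a real implementation would hit an ordering API.
    return {"status": "ordered", "product": product_name}

TOOLS = {"search_products": search_products, "place_order": place_order}

def stub_model(messages):
    """Stand-in for the LLM: search first, then order, then answer."""
    tool_turns = sum(1 for m in messages if m["role"] == "tool")
    if tool_turns == 0:
        return {"tool_call": {"name": "search_products",
                              "arguments": {"query": "blender"}}}
    if tool_turns == 1:
        return {"tool_call": {"name": "place_order",
                              "arguments": {"product_name": "MixMax"}}}
    return {"content": "Ordered the MixMax blender for $49."}

def run_agent(user_query):
    messages = [{"role": "user", "content": user_query}]
    while True:
        reply = stub_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # final answer back to the human
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})

answer = run_agent("I want to buy a blender")
```

The loop structure is the point: each iteration is one model turn, and the agent only exits when the model stops requesting tools, which is what bounds these workflows to seconds or minutes of execution.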

For the first case, AI operations are initiated by deterministic software. For the second, they are initiated by human interactions. While these use cases may access information spanning long time periods (e.g. long context from prior conversations), the scope of execution for both is usually short. We’re used to a few seconds to maybe a minute or two of execution time between human interactions.

These are AI applications that are launched once by software or a human, but then execute some set of instructions/tasks for minutes, hours, days, weeks, or even longer. To be actually useful, these agents will need:
