Not long ago, we couldn’t reliably ask LLMs to provide a response using a specific format. Building tools that used LLM outputs was painful.
Eventually, first through function calling and then through structured outputs, we could instruct LLMs to respond in specific formats¹. So extracting information from LLM outputs reliably stopped being a problem.
But then I started noticing that structured outputs were not always the silver bullet people think they are. Defining response formats adds a sort of safety net, and people often forget that underneath, they’re still dealing with an LLM. Setting up a Pydantic model for your API calls is not the same as setting up a Pydantic model for your LLM outputs.
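To make that last point concrete, here is a minimal sketch of a Pydantic model used as an LLM response format (the model and field names are hypothetical, chosen for illustration). The key observation: validation guarantees the output's *shape*, not its *correctness*, which is exactly the safety net people over-trust.

```python
from pydantic import BaseModel


class TriangleAnswer(BaseModel):
    """Hypothetical response format for a reasoning puzzle."""
    reasoning: str
    answer: int


# Validation only checks structure: this parses without complaint
# even though the reasoning is empty and the answer is nonsense.
parsed = TriangleAnswer.model_validate({"reasoning": "", "answer": 99})
print(parsed.answer)
```

With an API response, a schema mismatch signals a real error. With an LLM, a schema match tells you nothing about whether the content inside is any good.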
To see this in practice, consider giving an LLM the following puzzle: Suppose I have a physical, solid, equilateral triangle, and I make two cuts. The two cuts are along two parallel lines, and both cuts pass through the interior of the triangle. How many pieces are there after the cuts? Think step by step, and then put your answer in bold as a single integer (for example, 0). If you don’t know, guess.
Do you think that there will be a difference in performance between ResponseFormatA and ResponseFormatB? If so, which one do you think will perform better?
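The definitions of ResponseFormatA and ResponseFormatB aren't shown above, so the sketch below is an assumption: two formats with identical fields that differ only in declaration order (reasoning before answer, versus answer before reasoning). In Pydantic v2, field declaration order carries through to the generated JSON Schema, so a model constrained by the answer-first format must commit to its answer before producing any reasoning.

```python
from pydantic import BaseModel


# Hypothetical reconstruction: reasoning field comes first.
class ResponseFormatA(BaseModel):
    reasoning: str
    answer: int


# Hypothetical reconstruction: answer field comes first.
class ResponseFormatB(BaseModel):
    answer: int
    reasoning: str


# Declaration order is preserved in the generated JSON Schema,
# which is what constrains the token order the LLM emits.
print(list(ResponseFormatA.model_json_schema()["properties"]))
print(list(ResponseFormatB.model_json_schema()["properties"]))
```

If the two formats really do differ this way, the puzzle's "think step by step" instruction interacts with them very differently: one format lets the model reason before committing to an integer, while the other forces the answer out first.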