
Compound AI, test-time compute, and wasting your users’ time

2024-10-31 19:00:15

Our post from last week about throwing more AI at your problems turned out to (unexpectedly!) be our second most popular blog post of all time with over 10k views. It generated more strong opinions than we expected — if you’re looking for a laugh, we recommend reading the HackerNews comments (though we wouldn’t recommend trying to make too much sense of them 😉).

Central to what we discussed was the idea of compound AI systems, which has become increasingly popular in the last few months. We measure the popularity of an idea by the most reliable metric, of course: how frequently it shows up in the headlines on startups’ websites. By that measure, compound AI has positively taken the world by storm of late.

Composing LLMs with business logic is the natural maturation path of AI applications — it’s critical to AI systems’ ability to perform real job functions. The productivity of these applications is also bolstered by the fact that LLMs are steadily getting better at reasoning over complex problems.
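To make the composition idea concrete, here’s a minimal sketch of an LLM call wrapped in retrieval and business logic. Everything here — `fake_llm`, `retrieve_context`, the refusal rule — is a hypothetical illustration of the pattern, not RunLLM’s actual pipeline.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    return f"Answer to: {prompt}"


def retrieve_context(query: str, docs: list[str]) -> list[str]:
    """Naive keyword retrieval standing in for a real retriever."""
    words = query.lower().split()
    return [d for d in docs if any(w in d.lower() for w in words)]


def answer(query: str, docs: list[str]) -> str:
    context = retrieve_context(query, docs)
    if not context:
        # Business logic around the model: refuse rather than
        # let the LLM hallucinate an unsupported answer.
        return "Sorry, I couldn't find anything relevant."
    prompt = f"Context: {' '.join(context)}\nQuestion: {query}"
    return fake_llm(prompt)
```

The point isn’t the toy retriever — it’s that the system’s behavior (when to answer, when to refuse) lives in ordinary code composed around the model, which is what makes these applications dependable.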

These reasoning capabilities have come as a result of significant investment in test-time compute. Models like OpenAI’s o1-preview have made it so that you can get significantly better answers — though not always better! — by waiting longer than we’ve come to expect from LLMs. In our experience at RunLLM, we’ve found that a pipeline customized to o1 can significantly improve reasoning for hard problems but adds up to 10 seconds of latency. Some customers are more than willing to sacrifice a few seconds for higher-quality answers — others are not interested in this tradeoff at all.
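Since customers differ on the latency/quality tradeoff, a natural way to handle it is per-customer routing between a fast path and a slower reasoning pipeline. The sketch below is an illustrative assumption — the threshold, the 10-second cost, and the names are ours, not RunLLM’s actual routing logic.

```python
from dataclasses import dataclass


@dataclass
class CustomerConfig:
    # How many seconds of added latency this customer will tolerate.
    max_extra_latency_s: float


# Rough added latency of an o1-style reasoning pipeline (assumed figure
# based on the "up to 10 seconds" observation above).
REASONING_PIPELINE_COST_S = 10.0


def choose_pipeline(cfg: CustomerConfig, is_hard_problem: bool) -> str:
    """Pick a pipeline: spend test-time compute only when the customer
    accepts the latency and the problem warrants it."""
    if is_hard_problem and cfg.max_extra_latency_s >= REASONING_PIPELINE_COST_S:
        return "reasoning"  # slower, higher-quality answers
    return "fast"  # standard low-latency path
```

In practice the “hard problem” signal would itself come from a classifier or heuristic, but the shape of the decision — a latency budget gating extra test-time compute — is the same.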
