A recent approach to improving the reasoning performance of large language models is to scale not only the compute put into training, but also the amount of compute that can be put to good use at inference time. This is advantageous because not all problems are equally difficult; some can be solved trivially or from memory, while others require orders of magnitude more effort. If we can’t scale the amount of compute we are able to put to good use at inference time, then we are stuck with a fixed maximum budget of total FLOPs for any one problem, however hard it is.
For one class of problems, design problems, this is especially important. Design problems are those that can be solved in many (usually infinitely many) ways, and for which the quality of a solution is highly multidimensional. Often it is not clear ahead of time what all the dimensions of quality are, and they may well change over time. For these problems, two candidate solutions can sit very close to one another in design space, yet one may satisfy only some of the quality dimensions, and only in a mediocre fashion, while the other surpasses every criterion under consideration and is simpler or less costly as well. This is not the case for many other types of problems. For example, solving a closed-form algebra problem is not a design problem, because all correct solutions work equally well and are equally costly. However, devising an algorithm for solving closed-form algebra problems IS a design problem. Examples of design problems include essays, poetry, art, software, user interfaces, and clothing. Anything that is used by a human, or that can be considered to have an interface, is subject to design (and to design principles).
Design problems take some minimum bar of compute, intelligence, or effort to solve at all, but their nature is that the more consideration, effort, experience, or expertise goes into the problem, the better the solutions that can be produced. For non-design problems, the correctness of a solution (and often its optimality) is easy to prove. Design problems, on the other hand, require taste. For a non-design problem that can be clearly articulated, it may be fine to put an inexperienced person to work on it, because the solution is easy to verify. But putting an inexperienced person on an important design problem is a bad idea: the problem may appear to be solved appropriately when, in fact, alternative solutions with major benefits have been left undiscovered.
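The contrast can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the function names, the two-dimensional "design", and especially the scalar `score`, which stands in for taste even though real design quality is multidimensional and hard to reduce to one number. The point is only that a verifiable problem stops benefiting from effort once a check passes, while a best-of-n search over designs can keep improving as the budget grows.

```python
import random


def solve_verifiable(candidates, is_correct):
    """Return the first candidate that passes the check, and how many were tried.

    Once the check passes, further effort adds nothing.
    """
    for attempts, candidate in enumerate(candidates, start=1):
        if is_correct(candidate):
            return candidate, attempts
    return None, len(candidates)


def best_of_n(propose, score, n, seed=0):
    """Sample n candidate designs and keep the highest-scoring one."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best, best_score = None, float("-inf")
    for _ in range(n):
        candidate = propose(rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Non-design problem: 3x + 4 = 19 has a single answer that is easy to check.
answer, attempts = solve_verifiable(list(range(10)), lambda x: 3 * x + 4 == 19)


# Design problem (hypothetical): a "design" is just two random quality numbers.
def propose(rng):
    return {"clarity": rng.random(), "cost": rng.random()}


def score(design):
    # Stand-in for taste: reward clarity, penalize cost.
    return design["clarity"] - 0.5 * design["cost"]


_, small_budget = best_of_n(propose, score, n=4)
_, large_budget = best_of_n(propose, score, n=64)
print(answer, attempts, small_budget, large_budget)
```

Because both searches use the same seed, the 4 samples are a prefix of the 64, so the larger budget can never score worse here. In a real setting there is no such guarantee, and the quality of the scoring function (the taste) is what limits how useful the extra compute is.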