A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance. The first LLM to reach or surpass the target (16–24 steps) wins outright; if several cross it in the same turn, the one with the highest total wins (ties share the victory).
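The rules above can be sketched in a few lines of Python. This is only an illustration, not the benchmark's actual implementation: the function names `resolve_round` and `winners`, the dictionary-based state, and the assumption that every player submits a move each turn are all hypothetical, and the target is passed in as a parameter because the text only gives a 16–24 range.

```python
from collections import Counter

ALLOWED_MOVES = (1, 3, 5)  # step sizes named in the description


def resolve_round(positions, moves):
    """Apply one round of secret moves (hypothetical helper).

    positions: dict of player name -> current total steps
    moves:     dict of player name -> chosen move (assumed: one per player)
    Players who picked the same number as anyone else do not advance.
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} chose an illegal move: {move}")
    counts = Counter(moves.values())
    return {
        player: pos + (moves[player] if counts[moves[player]] == 1 else 0)
        for player, pos in positions.items()
    }


def winners(positions, target):
    """Return the winning players, or [] if nobody has reached the target.

    Among players at or past the target, the highest total wins;
    equal highest totals share the victory.
    """
    finishers = {p: s for p, s in positions.items() if s >= target}
    if not finishers:
        return []
    best = max(finishers.values())
    return sorted(p for p, s in finishers.items() if s == best)


positions = {"A": 14, "B": 13, "C": 15}
moves = {"A": 5, "B": 5, "C": 3}  # A and B collide on 5
positions = resolve_round(positions, moves)
print(positions)               # {'A': 14, 'B': 13, 'C': 18}
print(winners(positions, 16))  # ['C']
```

In this example A and B both pick 5 and stay where they are, while C's uncontested 3 carries it past an assumed target of 16.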
This setup goes beyond static Q&A by focusing on social reasoning: models must decide whether to cooperate, negotiate, or deceive. Each turn’s conversation is publicly visible, but final choices remain private, so collisions occur whenever strategic talk doesn’t match actual moves. By monitoring these dialogues and outcomes, we capture deeper dimensions of multi-agent interaction and see how advanced language models balance shared knowledge with hidden intentions to outmaneuver or cooperate.
The animation reveals how LLMs strategize, stall, sabotage, or cooperate, culminating in final rankings. It shows how their talk translates into secret moves on the board.