The o1 series of models published by OpenAI last week is very impressive, especially in its reasoning ability. As we can see from their website: Simil

Recreating o1 at Home with Role-Play LLMs

submited by
Style Pass
2024-09-21 13:00:07

The o1 series of models published by OpenAI last week is very impressive, especially in its reasoning ability. As we can see from their website:

Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses.

According to OpenAI, they used Reinforcement Learning to make o1 ponder longer before giving an answer. This makes much sense that we may wonder: can we do the same on open-sourced LLMs?

Unfortunately, OpenAI deliberately stops anyone from obtaining the details of o1's chain-of-thought (CoT). RL or any kind of fine-tuning requires these texts as training data. However, based on limited clues, we can still get some insight into how o1 works to some extent or how to replicate its ability.

In fact, it's even possible to create our own version of o1 by using techniques like in-context learning, prompting, and roleplaying. The following figure shows that by instructing the model to think extra hard like o1, we can further advance reasoning ability even on SoTA LLMs like sonnet 3.5.

Leave a Comment