Submitted by
Style Pass
2024-10-18 14:00:15

Cobbe, Karl, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, et al. 2021. “Training Verifiers to Solve Math Word Problems.” arXiv [Cs.LG]. http://arxiv.org/abs/2110.14168.

Gandhi, Kanishk, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, and Noah D Goodman. 2024. “Stream of Search (SoS): Learning to Search in Language.” arXiv [Cs.LG]. http://arxiv.org/abs/2404.03683.

Kazemnejad, Amirhossein, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, and Nicolas Le Roux. 2024. “VinePPO: Unlocking RL Potential for LLM Reasoning Through Refined Credit Assignment.” arXiv [Cs.LG]. http://arxiv.org/abs/2410.01679.

Kirchner, Jan Hendrik, Yining Chen, Harri Edwards, Jan Leike, Nat McAleese, and Yuri Burda. 2024. “Prover-Verifier Games Improve Legibility of LLM Outputs.” arXiv [Cs.CL]. http://arxiv.org/abs/2407.13692.

Kumar, Aviral, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, et al. 2024. “Training Language Models to Self-Correct via Reinforcement Learning.” arXiv [Cs.LG]. http://arxiv.org/abs/2409.12917.