Submitted by Style Pass, 2025-07-29 11:30:05

This project builds upon the rLLM framework developed by UC Berkeley Sky Lab, extending it with custom environments and infrastructure specifically designed for terminal-based agent training.

This image shows my training code running at full throttle on 32 H100s, distributed across a cluster of four bare-metal nodes, training Qwen3-32B. Thank you, Hyperbolic, for such a streamlined experience! This was fun!

Due to the extreme cost of this level of compute, I was not able to run it forever! So I made sure it worked, then also ran the code on less extravagant hardware setups.

My longest training run used 2x A100s on a single VM instance, where I trained Qwen3-8B for over 60 steps. Note: I did not expect the 8B model to begin learning the complex behaviours required to solve the tasks in the dataset. However, it was great to run the training through the dataset and confirm that the code is stable.

Terminal-Bench is a brilliant benchmark created by Stanford and the Laude Institute to quantify agents' ability to complete complex tasks in the terminal.
