

Submitted by Style Pass, 2025-07-29 11:30:05

This project builds upon the rLLM framework developed by UC Berkeley Sky Lab, extending it with custom environments and infrastructure specifically designed for terminal-based agent training.
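The post does not show what one of these terminal environments looks like. As a rough illustration only, here is a minimal sketch of the usual reset/step loop for a terminal agent: the agent issues a shell command, sees the combined stdout/stderr, and receives a reward once a task checker passes. The `TerminalEnv` and `StepResult` classes, the `checker` callback, `max_steps`, and `timeout_s` are all hypothetical names. This is not the rLLM API and not the author's implementation.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    observation: str  # combined stdout/stderr shown back to the agent
    reward: float     # 1.0 once the (hypothetical) checker passes, else 0.0
    done: bool        # True when the task is solved or the step budget is spent


@dataclass
class TerminalEnv:
    """Hypothetical terminal environment sketch; not the rLLM interface."""
    task: str
    checker: Callable[[str], bool]  # hypothetical: inspects workdir, returns success
    workdir: str
    max_steps: int = 20
    timeout_s: int = 30
    steps: int = field(default=0, init=False)

    def reset(self) -> str:
        # Start a new episode and show the agent its task description.
        self.steps = 0
        return f"Task: {self.task}"

    def step(self, command: str) -> StepResult:
        # Run one agent command in the task's working directory.
        self.steps += 1
        try:
            proc = subprocess.run(
                command, shell=True, cwd=self.workdir,
                capture_output=True, text=True, timeout=self.timeout_s,
            )
            obs = proc.stdout + proc.stderr
        except subprocess.TimeoutExpired:
            obs = f"[command timed out after {self.timeout_s}s]"
        # Binary outcome reward: did the command leave the task solved?
        solved = self.checker(self.workdir)
        done = solved or self.steps >= self.max_steps
        return StepResult(obs, 1.0 if solved else 0.0, done)


if __name__ == "__main__":
    # Toy task: the agent must create hello.txt in its working directory.
    with tempfile.TemporaryDirectory() as d:
        env = TerminalEnv(
            task="create hello.txt",
            checker=lambda wd: os.path.exists(os.path.join(wd, "hello.txt")),
            workdir=d,
        )
        print(env.reset())
        result = env.step("echo hi > hello.txt")
        print(result.reward, result.done)
```

A sparse, checker-based reward like this is a common choice for terminal tasks because success can be verified from the final state of the working directory rather than by judging the agent's transcript.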

This image shows my training code running at full throttle on 32x H100s, distributed across a 4-node bare-metal cluster, training Qwen3-32B. Thank you, Hyperbolic, for such a streamlined experience! This was fun!
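Assuming the 32 GPUs are split evenly across the 4 nodes, each node contributes 8 GPUs. The sketch below shows how such a cluster is commonly indexed. The formula `global_rank = node_rank * gpus_per_node + local_rank` follows the convention torchrun uses for `RANK` and `LOCAL_RANK`. The post does not say how ranks were actually assigned in this run, and `rank_layout` is a hypothetical helper.

```python
def rank_layout(num_nodes: int, gpus_per_node: int) -> list[tuple[int, int, int]]:
    """Return (global_rank, node_rank, local_rank) for every GPU in the cluster."""
    return [
        (node * gpus_per_node + local, node, local)
        for node in range(num_nodes)
        for local in range(gpus_per_node)
    ]


# Assumed topology: 4 nodes x 8 GPUs = 32 GPUs (an even split is an assumption).
layout = rank_layout(num_nodes=4, gpus_per_node=8)
print(len(layout))  # 32
print(layout[9])    # (9, 1, 1): second GPU on the second node
```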

Due to the extreme cost of this level of compute, I was not able to run it indefinitely! So I made sure it worked, and I also ran the code on less extravagant hardware setups.

My longest training run used 2x A100s on a single VM instance, where I trained Qwen3-8B for over 60 steps. Note: I did not expect the 8B model to begin learning the complex behaviours required to solve the tasks in the dataset. However, it was great to run the training through the dataset and confirm that the code is stable.

Terminal-Bench is a brilliant benchmark created by Stanford and the Laude Institute to quantify agents' ability to complete complex tasks in the terminal.
