Open-sourcing CARBS: a cost-effective hyperparameter optimizer that helps scale small experiments to large language models

Today, we’re releasing our cost-effective hyperparameter optimizer, CARBS, which enables researchers to more easily scale small experiments to large models.

As part of our efforts to train a 70B-parameter language model, we conducted extensive experimentation at a smaller scale. These small-scale experiments had two goals: to choose optimal hyperparameters for the large-scale run, and to predict the performance of that run.

One challenge with small-scale experiments is that, due to noise in the training and language-modeling process, it is difficult to compare the performance of two models on benchmarks that remain relevant at a larger scale. This motivated us to develop a metric that is both sensitive (giving meaningful results even for models with fewer than 300M parameters) and repeatable (so that the same training procedure gives the same result).
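To make "sensitive" and "repeatable" concrete: one simple way to check these properties for any candidate metric (this is an illustrative sketch, not the metric we actually used) is to compare how much the metric varies across different model configurations versus how much it varies across reruns of the same configuration with different seeds. The function and numbers below are hypothetical.

```python
import numpy as np

def sensitivity_and_repeatability(scores_by_config: dict[str, list[float]]) -> tuple[float, float]:
    """Hypothetical sanity check for an evaluation metric.

    scores_by_config maps a model configuration to metric scores from several
    training runs that differ only in random seed.
    - seed_noise: typical spread across reruns of one config (lower = more repeatable)
    - signal_to_noise: spread between configs relative to that noise (higher = more sensitive)
    """
    per_config_means = np.array([np.mean(v) for v in scores_by_config.values()])
    per_config_stds = np.array([np.std(v, ddof=1) for v in scores_by_config.values()])

    seed_noise = per_config_stds.mean()
    config_spread = per_config_means.std(ddof=1)
    signal_to_noise = config_spread / seed_noise

    return seed_noise, signal_to_noise

# Example with made-up numbers: two small models, three seeds each.
scores = {
    "150M_baseline": [0.512, 0.509, 0.511],
    "300M_wider_mlp": [0.547, 0.545, 0.549],
}
print(sensitivity_and_repeatability(scores))
```

A metric with a signal-to-noise ratio well above 1 under this kind of check can rank small models reliably enough to drive hyperparameter search.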

Next, we used this metric with CARBS, our cost-aware hyperparameter tuning algorithm, to tune dozens of hyperparameters over thousands of experiments at small scale. While some hyperparameters, such as the data mix, could be kept fixed while scaling from small experiments to large ones, this was not possible with others, such as the learning rate, the number of attention heads, or the multi-layer perceptron (MLP) width. Because CARBS generates optimal values for a range of costs, we could extrapolate from the Pareto set of observations to find how we should scale each parameter as we scale up our training. This allowed us to accurately predict the performance of our final model.
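To illustrate the extrapolation step, here is a minimal sketch (not the CARBS implementation, and with invented observation values) of fitting a power law to the Pareto-optimal (cost, hyperparameter) pairs that a cost-aware tuner produces, then predicting the hyperparameter value at the much larger cost of the final run.

```python
import numpy as np

def fit_scaling_law(costs, values):
    """Fit value ≈ a * cost^b by linear regression in log-log space.

    costs, values: Pareto-optimal (training cost, hyperparameter value) pairs
    from a cost-aware tuner such as CARBS.
    """
    log_c, log_v = np.log(costs), np.log(values)
    b, log_a = np.polyfit(log_c, log_v, deg=1)
    return np.exp(log_a), b

# Made-up Pareto-front observations: learning rate vs. training cost (FLOPs).
costs = np.array([1e18, 4e18, 2e19, 8e19])
learning_rates = np.array([6.0e-3, 3.8e-3, 2.2e-3, 1.4e-3])

a, b = fit_scaling_law(costs, learning_rates)
target_cost = 1e24  # cost of the large-scale run
print(f"predicted learning rate at {target_cost:.0e} FLOPs: {a * target_cost ** b:.2e}")
```

The same kind of fit can be repeated for each hyperparameter that must change with scale (learning rate, number of attention heads, MLP width), and the extrapolated values used to configure the large run.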
