Meta AI recently released a new language model called LLaMA. And by “released a model”, I mean “didn’t really release a model”. They released a really, really nice form instead, which you can fill out, and then Meta will get back to you after snooping on you just to make sure you haven’t been naughty recently (did I mention the form is really nice and it’s public: EVERYBODY can fill out the form). Presumably, no weights for you (or just random weights for you) if they find out you have been a bit too naughty for their liking.
Anyway. So, these LLaMAs come in four different sizes: from 6.7B parameters (smol) to 65.2B parameters (chonky). The largest two models are trained on 1.4T tokens, whereas the smaller ones are trained on 1T tokens (not really sure why). That’s roughly one (effective) epoch over the training data. The largest model roughly follows the Chinchilla compute-optimal recipe. There’s nothing the least bit remarkable about the models or the training setup. It’s just the standard GPT model trained in the standard way. The training data is said to be all public, although I didn’t check this carefully for myself (one hopes that it’s not public in the Meta sense of public. Just kidding, but not really).
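To see what “roughly follows the Chinchilla compute-optimal recipe” means in numbers, here’s a quick back-of-the-envelope sketch. It assumes the common ~20 tokens-per-parameter rule of thumb distilled from the Chinchilla paper (an approximation, not an exact prescription), applied to the four LLaMA sizes:

```python
# Back-of-the-envelope check of the ~20 tokens-per-parameter Chinchilla
# rule of thumb against the four LLaMA models. The 20x ratio is a rough
# approximation of the compute-optimal scaling result, not an exact figure.

llama_models = {
    # name: (parameters in billions, training tokens in billions)
    "LLaMA-7B":  (6.7,  1000),
    "LLaMA-13B": (13.0, 1000),
    "LLaMA-33B": (32.5, 1400),
    "LLaMA-65B": (65.2, 1400),
}

TOKENS_PER_PARAM = 20  # rough Chinchilla-optimal ratio (assumption)

for name, (params_b, tokens_b) in llama_models.items():
    optimal_b = params_b * TOKENS_PER_PARAM
    ratio = tokens_b / optimal_b
    print(f"{name}: trained on {tokens_b}B tokens, "
          f"Chinchilla-optimal would be ~{optimal_b:.0f}B "
          f"({ratio:.1f}x the optimal amount)")
```

For the 65B model this gives ~1.3T compute-optimal tokens versus the 1.4T it was actually trained on, which is why “roughly follows” is fair. The smaller models, by contrast, are trained well past their compute-optimal token budget.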
The money figure in the LLaMA paper (for me) is the one showing the training loss curves for all four models (Figure 1):