
Why You Shouldn’t Train Your LLM from Scratch


Or at least, you’re interested in knowing what it takes to create one from the ground up. That’s completely understandable — who wouldn’t?

However, you probably already know you can’t, yet want to know regardless. To be blunt, training an LLM from scratch is impractical for most individuals and organisations.

Let’s use GPT-4 as an example, since it’s the model with the most public information about its training costs. Training reportedly took 25,000 Nvidia A100 GPUs running non-stop for 90–100 days. At around $15K per A100, the GPU hardware alone comes to about $375M.

If buying the hardware seems too steep, renting might appear more accessible. However, renting A100 GPUs on cloud platforms like AWS costs about $3 per GPU-hour, which puts GPT-4’s training at roughly $180M (25,000 GPUs running for ~2,400 hours at $3 each). That is cheaper than buying the training hardware, but not cheap either.
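If you want to sanity-check these figures yourself, here is a minimal back-of-envelope sketch in Python using the numbers quoted above. The GPU count, unit price, rental rate and training duration are all rough public estimates, not confirmed figures, so treat the output as an order-of-magnitude check rather than a real budget.

```python
# Back-of-envelope estimate using the figures quoted above.
# Assumptions: 25,000 A100s, ~$15K per card, ~100 days of training,
# and ~$3 per GPU-hour for cloud rental (all approximate).

NUM_GPUS = 25_000
A100_PRICE_USD = 15_000
TRAINING_DAYS = 100            # upper end of the 90-100 day estimate
RENTAL_RATE_PER_GPU_HOUR = 3.0

# Cost of buying the hardware outright
hardware_cost = NUM_GPUS * A100_PRICE_USD

# Cost of renting the same capacity for the full training run
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
rental_cost = gpu_hours * RENTAL_RATE_PER_GPU_HOUR

print(f"Buy:  ${hardware_cost / 1e6:.0f}M")   # ~ $375M
print(f"Rent: ${rental_cost / 1e6:.0f}M")     # ~ $180M
```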

Similarly, Llama 3 was trained on 24,000 Nvidia H100 GPUs, putting the estimated GPU cost at around $720M. These two examples give a good idea of where the main cost of training lies.
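The Llama 3 figure follows the same arithmetic. Note that the per-GPU price below is an assumption: roughly $30K per H100 is simply what the quoted $720M estimate implies for 24,000 cards, not a number stated in the article.

```python
# Same arithmetic for Llama 3. The per-GPU price is an assumption:
# ~$30K per H100 is what the quoted $720M estimate implies.
H100_COUNT = 24_000
H100_PRICE_USD = 30_000  # assumed, not stated in the article

llama3_gpu_cost = H100_COUNT * H100_PRICE_USD
print(f"Llama 3 GPUs: ${llama3_gpu_cost / 1e6:.0f}M")  # ~ $720M
```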
