
Lobotomizing some models


I have been reading about how current LLMs are too big. The current trend shows that Small Language Models (SLMs) are just getting better and better. Look at Moondream, or, in a not-so-small format, Llama 3.3 70B, which is about as good as the 405B model.

Some layers are more important than others¹. We don't need all of them, so we could remove some layers and still end up with a good-enough model, using a benchmark as a proxy to decide which pruned models are good and which are bad. But I am, as the kids say, GPU poor, so I had to approach this problem without a lot of compute.
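To make this concrete, here is a minimal sketch of what "removing layers" can look like, assuming a Llama-style checkpoint loaded with Hugging Face transformers, where the decoder blocks live in model.model.layers. The model id and the dropped indices are placeholders for illustration, not the ones I actually used:

```python
import torch
from torch import nn
from transformers import AutoModelForCausalLM

def drop_layers(model, layers_to_drop):
    """Remove the decoder blocks at the given indices and keep the config in sync."""
    drop = set(layers_to_drop)
    kept = [block for i, block in enumerate(model.model.layers) if i not in drop]
    # Re-number the attention layer indices so the KV cache still lines up.
    for new_idx, block in enumerate(kept):
        if hasattr(block, "self_attn") and hasattr(block.self_attn, "layer_idx"):
            block.self_attn.layer_idx = new_idx
    model.model.layers = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)
    return model

# Placeholder model id and layer choice; swap in whatever you are experimenting with.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B",
                                             torch_dtype=torch.bfloat16)
pruned = drop_layers(model, layers_to_drop=[10, 11, 12])
print(pruned.config.num_hidden_layers)
```

The pruned model can then be run through whatever benchmark you are using as the quality proxy.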

The paper from Sakana AI about Evolutionary Model Merging inspired this idea. The main concept behind the paper is that, by using evolutionary algorithms, you can merge two different models and get new, high-quality LLMs. The authors did not know in advance how much of each model mattered most, but evolutionary algorithms are a good way to search through the possible combinations and find a good-enough new model. The important thing here is that you can arrive at a good new model purely by evaluating candidates. Through evolution alone, they merged two fine-tuned models, one for Japanese and one for math, into a single model that excelled at both tasks.
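The paper uses a more sophisticated optimizer over merging recipes, but the evaluate-only idea it relies on fits in a few lines. Below is a toy mutation-and-selection loop over binary "keep this layer" masks, which is how I think about applying it to pruning. evaluate() is a stand-in for whatever benchmark proxy you would plug in, not anything from the paper:

```python
import random

NUM_LAYERS = 32      # depth of an 8B-class model; adjust to your checkpoint
POPULATION = 8
GENERATIONS = 10
MUTATION_RATE = 0.05

def evaluate(mask):
    # Stand-in scorer so the loop runs end to end. In the real setup this would
    # build the pruned model for `mask` and return its benchmark score.
    return sum(mask) + random.random()

def mutate(mask):
    # Flip each bit with a small probability.
    return [bit ^ (random.random() < MUTATION_RATE) for bit in mask]

# Start from the full model (all layers kept) and evolve.
population = [[1] * NUM_LAYERS for _ in range(POPULATION)]
for _ in range(GENERATIONS):
    ranked = sorted(population, key=evaluate, reverse=True)
    parents = ranked[: POPULATION // 2]        # keep the best half
    children = [mutate(random.choice(parents))
                for _ in range(POPULATION - len(parents))]
    population = parents + children

best = max(population, key=evaluate)
print("best mask:", best)
```

No gradients and no training: every step only needs the ability to score a candidate, which is exactly what makes this appealing when you are GPU poor.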

Depending on the model size the search space can be small, but it can quickly get out of hand. If we want to test variants of a model with different sets of layers activated, how should we even do that in the first place?
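As a rough sense of scale (my own back-of-the-envelope, not a number from any paper): if every decoder block can be independently kept or dropped, an N-block model has 2^N possible configurations.

```python
# Each block is either kept or dropped, so an N-block model has 2**N subsets.
for n in (16, 32, 80):   # roughly: a small model, an 8B-class model, a 70B-class model
    print(f"{n} layers -> {2**n:,} possible configurations")
# 16 layers -> 65,536
# 32 layers -> 4,294,967,296
# 80 layers -> 1,208,925,819,614,629,174,706,176
```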
