I blog about the latest machine learning research topics that have an immediate impact on the work we data scientists and machine learning engineers do every day. Share the newsletter with your friends so that we all grow together.
Outlier parameters are a small number of parameters that are disproportionately important to an LLM's performance. A billion-parameter LLM will have a minuscule fraction of outlier parameters, say 0.01% of the total parameter count. But even this translates to hundreds of thousands of parameters.
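As a quick sanity check on that 0.01% figure, here is the back-of-the-envelope arithmetic for a few illustrative model sizes (the sizes themselves are just examples, not from the paper):

```python
# outlier parameter count at 0.01% of various (illustrative) model sizes
for total in (1_000_000_000, 7_000_000_000, 70_000_000_000):
    outliers = int(total * 0.0001)  # 0.01% expressed as a fraction
    print(f"{total:>14,} params -> {outliers:>9,} outliers")
```

So even at one billion parameters, 0.01% already means on the order of a hundred thousand individual weights.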
The authors [1] point to the presence of “super weights”, a tiny subset of outlier parameters. Pruning as few as a single super weight can “destroy an LLM’s ability to generate text – increasing perplexity by 3 orders of magnitude and reducing zero-shot accuracy to guessing”.
The authors [1] also report that removing non-super-weight outliers, some of which have larger magnitudes than the super weight itself, affects the LLM's performance by no more than a few percentage points. Strikingly, removing a single super weight causes an accuracy drop far greater than the effect of removing all the other non-super-weight outliers combined.
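Mechanically, "pruning a single weight" just means zeroing one coordinate of a weight matrix. The toy sketch below (a minimal illustration, not the paper's detection method: the matrix, its size, and the planted weight value are all made up) plants one artificially large weight, locates the largest-magnitude entry as a stand-in for super-weight detection, and zeroes it:

```python
import numpy as np

def prune_single_weight(W, row, col):
    """Return a copy of W with a single coordinate zeroed out (pruned)."""
    W_pruned = W.copy()
    W_pruned[row, col] = 0.0
    return W_pruned

# toy weight matrix with one planted "super weight" (hypothetical values)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(8, 8))
W[3, 5] = 2.0  # far larger in magnitude than its neighbors

# locate the largest-magnitude entry, a stand-in for super-weight detection
row, col = np.unravel_index(np.argmax(np.abs(W)), W.shape)
W_pruned = prune_single_weight(W, row, col)

# the pruned matrix differs from the original in exactly one coordinate,
# yet that one change perturbs every output that reads column `col`
x = rng.normal(size=8)
print("pruned coordinate:", (int(row), int(col)))
print("output perturbation:", np.abs(W @ x - W_pruned @ x).max())
```

In the actual paper the super weight is found inside specific transformer layers and its removal is measured via perplexity and zero-shot benchmarks; this sketch only shows the pruning operation itself.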