
The Super Weight in Large Language Models


Authors: Mengxia Yu, De Wang, Qi Shan, Colorado Reed, Alvin Wan
Paper: https://arxiv.org/abs/2411.07191
Code: https://github.com/mengxiayu/LLMSuperWeight

A fascinating study reveals that zeroing out a single weight inside an LLM can catastrophically degrade its performance. The authors call these critical parameters "super weights" and propose a method to find them in just one forward pass.
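
The summary above mentions a single-forward-pass detection method. As a rough illustration of how such a search could work, here is a minimal sketch that hooks the MLP down-projection of every layer and records per-channel magnitude maxima during one forward pass; a layer where both the input and output maxima spike points to a super-weight candidate. It assumes a Llama-style Hugging Face checkpoint (the model name and the mlp.down_proj module path are illustrative assumptions, not the authors' code).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative; any Llama-style checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # set dtype/device as needed
model.eval()

records = []  # (layer, max |input|, input channel, max |output|, output channel)

def make_hook(layer_idx):
    def hook(module, inputs, output):
        x = inputs[0].detach().float()    # down_proj input,  [batch, seq, d_ff]
        y = output.detach().float()       # down_proj output, [batch, seq, d_model]
        x_max = x.abs().amax(dim=(0, 1))  # per-channel maxima over batch and sequence
        y_max = y.abs().amax(dim=(0, 1))
        records.append((layer_idx,
                        x_max.max().item(), int(x_max.argmax()),
                        y_max.max().item(), int(y_max.argmax())))
    return hook

handles = [layer.mlp.down_proj.register_forward_hook(make_hook(i))
           for i, layer in enumerate(model.model.layers)]

with torch.no_grad():
    enc = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
    model(**enc)

for h in handles:
    h.remove()

# Layers where both the input and output maxima spike identify the candidate:
# row = output channel, column = input channel of that layer's down_proj weight.
for rec in sorted(records, key=lambda r: -r[3])[:3]:
    print(rec)
```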

Trained LLMs contain a group of outlier weights with unusually large magnitudes, comprising about 0.01% of all model weights, which is still hundreds of thousands of weights in a billion-parameter model. This was known before. The current work shows that within this group there exists a single weight (the super weight, SW), not necessarily the largest, whose importance exceeds the combined importance of thousands of other outliers. This weight is essential for quality: without it, the LLM cannot generate coherent text, perplexity increases by several orders of magnitude, and zero-shot task accuracy drops to chance.
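
To make the zero-out claim concrete, here is a hedged sketch of the experiment: measure perplexity on a snippet, set one coordinate of one down_proj weight matrix to zero, and measure again. The layer/row/column indices are placeholders, not coordinates reported in the paper; real coordinates would come from a detection pass like the one above.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(text: str) -> float:
    # Perplexity of a single snippet from the causal-LM loss.
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

text = "Large language models store a surprising amount of knowledge in their parameters."
print("perplexity before:", perplexity(text))

# Placeholder coordinates; substitute the layer/row/column found by detection.
layer_idx, row, col = 2, 0, 0
with torch.no_grad():
    model.model.layers[layer_idx].mlp.down_proj.weight[row, col] = 0.0

print("perplexity after zeroing one weight:", perplexity(text))
```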

Previously (https://arxiv.org/abs/2402.17762), researchers discovered super activations (SA): activations critical to model quality that appear in various layers, have nearly constant magnitude, and are consistently found at the same position regardless of the input. The current work finds that the super activation's channel aligns with that of the super weight and that the activation first appears immediately after the layer containing the super weight. Pruning the super weight sharply reduces the super activation, suggesting that the super weight causes the activation rather than being merely correlated with it.
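
One way to observe the "same position, near-constant magnitude" behavior for yourself is to compare the largest hidden-state activation across unrelated prompts. The sketch below uses output_hidden_states from Hugging Face transformers; the prompts, model name, and magnitude threshold are arbitrary choices for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

THRESHOLD = 100.0  # crude, arbitrary cut-off for "super" magnitude

for prompt in ["The capital of France is", "def fibonacci(n):", "Once upon a time"]:
    enc = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    # out.hidden_states: tuple of [batch, seq, d_model], embeddings plus one per layer
    for layer_idx, h in enumerate(out.hidden_states):
        flat = h[0].abs()          # [seq, d_model]
        val = flat.max().item()
        if val > THRESHOLD:
            tok_idx, chan = divmod(int(flat.argmax()), flat.shape[-1])
            print(f"{prompt!r}: layer {layer_idx}, token {tok_idx}, "
                  f"channel {chan}, |activation| = {val:.0f}")
```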
