Aligner : Achieving Efficient Alignment through Weak-to-Strong Correction

submitted by
Style Pass
2024-04-04 09:30:05

Efforts to align Large Language Models (LLMs) are mainly conducted via Reinforcement Learning from Human Feedback (RLHF) methods. However, RLHF encounters major challenges, including the need to train reward models, the engineering complexity of actor-critic training, and, importantly, the requirement of access to the LLM's parameters. Here we introduce Aligner, a new, efficient alignment paradigm that bypasses the whole RLHF process by learning the correctional residuals between aligned and unaligned answers. Aligner offers several key advantages. First, it is an autoregressive seq2seq model trained on a query-answer-correction dataset via supervised learning, which makes it a parameter-efficient alignment solution requiring minimal resources. Second, Aligner facilitates weak-to-strong generalization: finetuning large pretrained models with Aligner's supervisory signals yields a strong performance boost. Third, Aligner functions as a model-agnostic plug-and-play module that can be applied directly to different open-source and API-based models. Remarkably, Aligner-7B improves 11 different LLMs by 21.9% in helpfulness and 23.8% in harmlessness on average (GPT-4 by 17.5% and 26.9%). When finetuning the (strong) Llama2-70B with the (weak) Aligner-13B's supervision, we can improve Llama2 by 8.2% in helpfulness and 61.6% in harmlessness.
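The abstract describes Aligner as a seq2seq corrector stacked on top of an arbitrary upstream model at inference time: the corrector sees the query together with the upstream model's draft answer and generates a corrected answer. The sketch below illustrates that plug-and-play flow with Hugging Face transformers; the model identifiers, prompt template, and generation settings are illustrative assumptions, not the authors' released checkpoints or exact format.

```python
# Minimal sketch of the plug-and-play correction flow described above.
# Model names and the prompt template are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

UPSTREAM_NAME = "meta-llama/Llama-2-7b-chat-hf"  # any open-source or API-served LLM
ALIGNER_NAME = "path/to/aligner-7b"              # hypothetical local corrector checkpoint

def generate(model, tokenizer, prompt, max_new_tokens=512):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# 1) The upstream model produces an initial (possibly unaligned) answer.
upstream_tok = AutoTokenizer.from_pretrained(UPSTREAM_NAME)
upstream = AutoModelForCausalLM.from_pretrained(UPSTREAM_NAME, device_map="auto")
query = "How do I safely dispose of old medication?"
draft_answer = generate(upstream, upstream_tok, query)

# 2) The Aligner-style corrector conditions on the query plus the draft answer
#    and emits a corrected answer, i.e. it only has to learn the residual edit.
aligner_tok = AutoTokenizer.from_pretrained(ALIGNER_NAME)
aligner = AutoModelForCausalLM.from_pretrained(ALIGNER_NAME, device_map="auto")
correction_prompt = (
    f"Question: {query}\n"
    f"Original answer: {draft_answer}\n"
    f"Corrected answer:"
)
corrected_answer = generate(aligner, aligner_tok, correction_prompt)
print(corrected_answer)
```

Because the corrector conditions only on the query and the draft answer, the same checkpoint can, in principle, sit in front of any open-source or API-based model without touching its parameters, which is what makes the approach model-agnostic.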

1. Aligner achieves noticeable alignment with far less training data. For instance, with only 50K training examples, a 7B Aligner model enhances GPT-4's helpfulness by 19% and safety by 26%, and boosts Vicuna-33B's helpfulness and safety by 51% and 56%, respectively.
