This post is heavily informed by prior work, most notably that of Owain Evans, Owen Cotton-Barratt and others (Truthful AI), Beth Barnes (Risks from A

Truthful LMs as a warm-up for aligned AGI

submited by

Style Pass

2022-01-20 14:00:10

This post is heavily informed by prior work, most notably that of Owain Evans, Owen Cotton-Barratt and others (Truthful AI), Beth Barnes (Risks from AI persuasion), Paul Christiano (unpublished) and Dario Amodei (unpublished), but was written by me and is not necessarily endorsed by those people. I am also very grateful to Paul Christiano, Leo Gao, Beth Barnes, William Saunders, Owain Evans, Owen Cotton-Barratt, Holly Mandel and Daniel Ziegler for invaluable feedback.

In this post I propose to work on building competitive, truthful language models or truthful LMs for short. These are AI systems that are:

WebGPT is an early attempt in this direction. The purpose of this post is to explain some of the motivation for building WebGPT, and to seek feedback on this direction.

Truthful LMs are intended as a warm-up for aligned AGI. This term is used in a specific way in this post to refer to an empirical ML research direction with the following properties: