
Building safer dialogue agents

Submitted by Style Pass on 2022-09-22 14:30:09

In recent years, large language models (LLMs) have achieved success at a range of tasks such as question answering, summarisation, and dialogue. Dialogue is a particularly interesting task because it features flexible and interactive communication. However, dialogue agents powered by LLMs can express inaccurate or invented information, use discriminatory language, or encourage unsafe behaviour.

To create safer dialogue agents, we need them to learn from human feedback. Using reinforcement learning based on input from research participants, we explore new methods for training dialogue agents that show promise for a safer system.

In our latest paper, we introduce Sparrow – a dialogue agent that’s useful and reduces the risk of unsafe and inappropriate answers. Our agent is designed to talk with a user, answer questions, and search the internet using Google when it’s helpful to look up evidence to inform its responses.

Sparrow is a research model and proof of concept, designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful – and ultimately, to help build safer and more useful artificial general intelligence (AGI).
