Improving Language Model Behavior by Training on a Curated Dataset

We've found we can improve language model behavior with respect to specific behavioral values by fine-tuning on a curated dataset of fewer than 100 examples reflecting those values. We also found that this process becomes more effective as models get larger. While the technique is still nascent, we're looking for OpenAI API users who would like to try it out, and we're excited to find ways to use this and other techniques in production use cases.
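
As a rough sketch of what this could look like in practice (not the exact dataset, format, or settings used here), the snippet below assembles a few hand-written examples into a JSONL file and starts a fine-tune through the OpenAI API. The example text, file name, and base model are illustrative assumptions.

    # Minimal sketch: curate a small set of examples that reflect a target
    # value, write them to JSONL, and kick off a fine-tune via the API.
    # Example content, file name, and base model are placeholders; the exact
    # training file format depends on the model being fine-tuned.
    import json
    from openai import OpenAI

    curated_examples = [
        {
            "messages": [
                {"role": "user", "content": "What makes a person beautiful?"},
                {"role": "assistant", "content": "Standards of beauty are subjective and vary across people and cultures; no single look defines it."},
            ]
        },
        # ... a few dozen more hand-written examples, well under 100 in total
    ]

    # Write one JSON object per line (JSONL), the format the fine-tuning API expects.
    with open("values_targeted.jsonl", "w") as f:
        for example in curated_examples:
            f.write(json.dumps(example) + "\n")

    client = OpenAI()
    training_file = client.files.create(
        file=open("values_targeted.jsonl", "rb"), purpose="fine-tune"
    )
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id, model="gpt-3.5-turbo"
    )
    print(job.id, job.status)
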

Language models can output almost any kind of text, in any tone or personality, depending on the user's input. Our approach aims to give language model operators the tools to narrow this universal set of behaviors to a constrained set of values. While OpenAI provides guardrails and monitoring to ensure that model use cases are compatible with our Charter, we view selecting the exact set of Charter-compatible values for the model as a choice that our users must make for their specific applications.

One example of a topic category and the desired behavior it targets: Human Characteristics and Behavior: Oppose unhealthy beauty or likeability standards; support the view that goodness, attractiveness, and likeability in humans are subjective.
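
Purely as an illustration (with placeholder model names and a made-up probe question, not our evaluation setup), one way to check whether a fine-tuned model reflects a value like the one above is to sample the base and fine-tuned models on the same question and compare the answers:

    # Illustrative probe: compare how the base model and a (hypothetical)
    # fine-tuned model answer a question touching on the targeted value.
    # Model names and the probe question are placeholders.
    from openai import OpenAI

    client = OpenAI()
    probe = "Are some people just inherently more likeable than others?"

    for model in ["gpt-3.5-turbo", "ft:gpt-3.5-turbo:my-org:values:abc123"]:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": probe}],
            max_tokens=150,
        )
        print(model, "->", reply.choices[0].message.content, "\n")
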
