
Building Datasets to Enable Safer AI Responses


Introducing Gretel’s Synthetic Safety Dataset, a resource designed to align large language models (LLMs) to produce safe and ethical responses. Developed using Gretel Navigator's Data Designer toolkit, the dataset features 8,361 triplets of “prompt”, “response”, and “safe response” spanning significant risk categories. Our goal was to create a transparent and modular dataset that the AI community can utilize to align models for secure and public-interest-focused interactions.

👉 Available on HuggingFace @ gretelai/gretel-safety-alignment-en-v1
👉 6k train / 1.2k validation / 1.2k test split
👉 Generated with Apache 2.0 licensed models

Disclaimer: This dataset may include content that is offensive or inappropriate. It covers topics such as discrimination, harassment, propaganda, religious intolerance, gender bias, and more, to help AI systems learn to prioritize appropriate content. We urge you to approach the data with caution.
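For a quick look at the data, the splits can be pulled straight from the Hugging Face Hub with the `datasets` library. This is a minimal sketch, not part of the original release notes; the exact column names should be checked against the published dataset card.

```python
# Minimal sketch: load the dataset from the Hugging Face Hub and inspect it.
# The repository id comes from the post above; column names may differ from
# the "prompt" / "response" / "safe response" wording used in the description.
from datasets import load_dataset

dataset = load_dataset("gretelai/gretel-safety-alignment-en-v1")

print(dataset)              # expect train / validation / test splits (6k / 1.2k / 1.2k)
print(dataset["train"][0])  # one prompt / response / safe-response triplet
```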

Language models often undergo instruction tuning before being released for public use. This process typically includes safety alignment techniques such as Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), Direct Preference Optimization (DPO), Relative Preference Optimization (RPO), and sometimes simply Supervised Fine-Tuning (SFT). Creating datasets for alignment can be time-consuming and requires creativity to generate or source prompts. Prompt generation benefits tremendously from human expertise in jailbreaking (attempts to bypass model restrictions) and red teaming (simulated attacks to test system security). Additionally, each prompt typically needs to be paired with two responses, marked as more desirable vs. less desirable (or chosen vs. rejected) by crowdworkers or AI.
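Because each triplet already contains a less safe response and a safer one, it maps naturally onto the (prompt, chosen, rejected) format that preference-optimization methods like DPO expect. The sketch below is one possible way to build such pairs; the field names "prompt", "response", and "safe_response" are assumptions based on the triplet description above and should be adjusted to the dataset's actual schema.

```python
# Hypothetical sketch: convert each triplet into a preference pair for
# DPO-style training, treating the safer answer as "chosen" and the original
# answer as "rejected". Field names are assumed, not taken from the dataset card.
from datasets import load_dataset

def to_preference_pair(example):
    return {
        "prompt": example["prompt"],
        "chosen": example["safe_response"],  # safer answer is preferred
        "rejected": example["response"],     # original, less safe answer
    }

raw = load_dataset("gretelai/gretel-safety-alignment-en-v1", split="train")
pairs = raw.map(to_preference_pair, remove_columns=raw.column_names)
print(pairs[0])
```

A dataset in this shape can then be handed to a preference-optimization trainer (for example, TRL's DPOTrainer) or used to build chosen/rejected pairs for other alignment recipes.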
