Mistral launches a moderation API

submitted by Style Pass

2024-11-09 02:00:03

The API, the same one that powers moderation in Mistral’s Le Chat chatbot platform, can be tailored to specific applications and safety standards, Mistral says. It’s powered by a fine-tuned model (Ministral 8B) trained to classify text in a range of languages, including English, French, and German, into one of nine categories: sexual, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health, financial, law, and personally identifiable information.
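To make "tailored to specific applications" concrete, here is a minimal Python sketch of how an application might apply its own per-category thresholds to classifier scores. It does not call Mistral's API. The category identifiers, the score dictionary, the 0.5 default, and the `flag_categories` helper are all assumptions made for illustration and are not taken from Mistral's documentation.

```python
# Hypothetical identifiers for the nine categories named in the article.
# The real API may use different names.
CATEGORIES = (
    "sexual",
    "hate_and_discrimination",
    "violence_and_threats",
    "dangerous_and_criminal_content",
    "self_harm",
    "health",
    "financial",
    "law",
    "pii",
)

DEFAULT_THRESHOLD = 0.5  # assumed default, not a value published by Mistral


def flag_categories(scores, thresholds=None, default=DEFAULT_THRESHOLD):
    """Return the categories whose score meets or exceeds its threshold.

    `scores` maps category name -> score in [0, 1]; missing categories
    are treated as 0.0. `thresholds` overrides the default per category.
    """
    thresholds = thresholds or {}
    unknown = set(scores) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [
        name
        for name in CATEGORIES
        if scores.get(name, 0.0) >= thresholds.get(name, default)
    ]


# Made-up scores for illustration only.
example_scores = {"health": 0.62, "pii": 0.30, "violence_and_threats": 0.05}

# A hypothetical medical forum: lenient on health talk, strict on PII.
medical_forum = {"health": 0.9, "pii": 0.2}

print(flag_categories(example_scores))                 # ['health']
print(flag_categories(example_scores, medical_forum))  # ['pii']
```

The same scores produce different decisions under different policies, which is the kind of per-application tuning the article describes.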

“Over the past few months, we’ve seen growing enthusiasm across the industry and research community for new AI-based moderation systems, which can help make moderation more scalable and robust across applications,” Mistral wrote in a blog post. “Our content moderation classifier leverages the most relevant policy categories for effective guardrails and introduces a pragmatic approach to model safety by addressing model-generated harms such as unqualified advice and PII.”

AI-powered moderation systems are useful in theory. But they’re also susceptible to the same biases and technical flaws that plague other AI systems.