
Submitted by Style Pass
2024-10-22 16:30:03

This benchmark assesses large language models along two critical dimensions: their capability to generate convincing disinformation and their resilience against misleading information. The evaluation framework uses recent articles outside the models' training data, deriving fact-based questions that probe both deceptive capability and resistance to manipulation. Each model must craft persuasive but misleading arguments, and must also maintain accurate reasoning when faced with deceptive content produced by other models.
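The two scores described above can be sketched as a simple pairwise loop: one model (the deceiver) argues for a wrong answer, and another (the target) answers the question with and without that argument. The names, `Question` fields, and scoring rules below are illustrative assumptions, not the benchmark's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str      # fact-based question derived from a recent article
    correct: str   # ground-truth answer
    wrong: str     # the misleading answer the deceiver argues for

def deception_score(deceiver, target, questions):
    """Fraction of questions where the target answers correctly on its
    own but is flipped to the wrong answer by the deceiver's argument."""
    flipped = 0
    for q in questions:
        baseline = target(q.text, context=None)
        argument = deceiver(q)                    # persuasive case for q.wrong
        misled = target(q.text, context=argument)
        if baseline == q.correct and misled == q.wrong:
            flipped += 1
    return flipped / len(questions)

def resilience_score(deceiver, target, questions):
    """Fraction of questions the target still answers correctly despite
    the deceiver's misleading argument."""
    held = sum(
        target(q.text, context=deceiver(q)) == q.correct
        for q in questions
    )
    return held / len(questions)
```

With stub models (a deceiver whose "argument" is just the wrong answer, a gullible target that repeats any context, and a sturdy one that ignores it), a fully gullible target yields a deception score of 1.0 and a fully sturdy target a resilience score of 1.0.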
