Last week, OpenAI released an update to their 4o model that displayed a stunning degree of sycophancy. I had considered the tendency of LLMs to praise

Syco-bench: A Simple Benchmark of LLM Sycophancy

submited by

Style Pass

2025-07-28 17:30:13

Last week, OpenAI released an update to their 4o model that displayed a stunning degree of sycophancy. I had considered the tendency of LLMs to praise their users an annoyance before, but like many others I now think it’s a serious issue. One way to get AI companies to take something seriously is to make a benchmark for it, so that’s what I decided to do. It’s a bit rough around the edges for now, but if it proves useful I’d be happy to put a lot more work in.

So far, the benchmark consists of three tests, described in the charts for them below. A higher score is worse across all benchmarks:

I’m not sure how system prompts for web chat versions of these models compare to the ones used in the API, so this may not reflect things like system prompt changes in ChatGPT.

The prompts for each of the tests were made by Gemini 2.5 Pro, which may bias results for that model. The data is also not very good.

Syco-bench: A Simple Benchmark of LLM Sycophancy

Leave a Comment

Related Posts

Recent Posts

Stop Writing Prompts From Scratch

Image to Video AI Generation && Ai Image To Video

Study May Undercut Idea That Cash Payments to Poor Families Help Child Development

C.D.C. Ties 85 Cases of THC-Related Symptoms to Wisconsin Restaurant

`whoami`

Export passwords to another password manager on iPhone

Pluralistic: How twiddling enshittifies your brain (28 Jul 2025)

Get started with Sequoia

Harrowing photos show one B-17 bombing another

Fluid: How we built serverless servers

1. Prominent features

Fluid: How we built serverless servers

Search code, repositories, users, issues, pull requests...

The mitotic chromosome periphery modulates chromosome mechanics

Large Danish Study: No link between vaccines and autism or 49 other health conditions

Nadia Odunayo & Scaling Rails for Millions of Users as a Solo Dev

Intercepting 3I/ATLAS at Its Closest Approach to Jupiter with the Rejuvenated Juno Spacecraft

When LLMs autonomously attack

Ship FastAPI10x Faster

I Built KPIs for 20+ Startups, and Still Nearly Lost My Mind