Foreign Language LLM Jailbreak

submited by

Style Pass

2025-07-30 08:30:06

During the final finetuning phase, many LLMs are adjusted for “compliance”. This is actively used to avoid offensive content, but also to prevent criminal use of LLMs (“How do I build a bomb?”). LLMs from China either refuse to answer political questions or replicate the Chinese official rhetoric. Using e.g. Kimi K2, this is obvious:

Kimi even suggests further questions like Where did the protests start? which is surprising. Asking the question leads to protests in Taiwan, but even after answering that question the model deletes the answer (which is done on the frontend).

Is there a way to get around this? Does the model know more? Asking the same question in German leads to completely different answers:

The same is true for the other question. This even leads to results which blame China for not allowing further investigations!

What is happening here? It seems like the final (alignment) phase of finetuning is language-specific. Potentially, this has only been done in English and Chinese and does not map to other languages. Obviously, all of this knowledge is still contained in the model itself. It will be interesting to see if this method can also be applied to recover other information which has been actively hidden by the LLM creators.

Foreign Language LLM Jailbreak

Leave a Comment

Related Posts

Recent Posts

The upstart company that wants to build the world's largest aircraft

Tracking source locations

The leaked email that blows apart the BBC’s impartiality claims over Gaza

Search code, repositories, users, issues, pull requests...

Use DNSBL to block AI crawlers in Caddy

Money by Vile Means | Compact

Oulipo - Wikipedia

PlayReady DRM Leak Triggers Microsoft Takedown and Amazon Account Suspensions

Improve PostgreSQL performance: Diagnose and mitigate lock manager contention

'Communities' of strange, extreme life seen for first time in deep ocean

Debug WebSocket Like a Pro

SUSPENDING DUTY-FREE DE MINIMIS TREATMENT FOR ALL COUNTRIES

The Sunday Morning Post: ‘Science Is the Belief in the Ignorance of Experts’

Amazon Promotes Malphas to Senior Vice President of Bad Decisions, Unveils 17th Leadership Principle

Computer Science > Information Retrieval

Roundup of August 2025 Bootstrapper Events

Climate | Department of Energy

Friction and not being touched

Scientists May Have Found Humanity’s Sixth Sense—In Our Gut

There’s A Little Guy Hidden On The Side Of Most U-Haul Trucks