
An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

Submitted by
Style Pass
2024-06-09 13:00:06

All models have biases, and most Instruct/Chat models are aligned for "safety," carrying Western moral biases and so on. There's spirited debate about when and where those lines should be drawn, but for the purposes of this discussion, the most important thing is to recognize that these biases exist. The second important thing, and the topic of the rest of this analysis, is that while alignment varies greatly across open models, Chinese models come with their own unique, Chinese-government-mandated alignment and biases.

I've long had a Tiananmen Square Massacre question (and, as a counterpoint, a corresponding LA Riots question) in my personal vibe-check tests. Chinese models invariably lose a point on this, but in general it's not a big deal: if a model is strong or useful, it'll score well regardless, and most models don't perform well enough for this to really matter anyway. That isn't to say Chinese LLM/research groups haven't been doing great work; their technical reports and contributions to the open LLM community have been amazing, and lately often better than those of Western frontier labs. Still, for me personally, Chinese-language alignment just hasn't been much of a practical concern.
