In MySQL, never use “utf8”. Use “utf8mb4”.

Submitted by
Style Pass
2022-01-12 15:30:06

This is a UTF-8 client and a UTF-8 server, in a UTF-8 database with a UTF-8 collation. The string, “😃 <…”, is valid UTF-8.

MySQL’s “utf8” encoding supports at most three bytes per character. The real UTF-8 encoding, which everybody uses, including you, needs up to four bytes per character.
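To see the gap for yourself, here is a minimal Python sketch that counts the bytes real UTF-8 needs for a few sample characters. The helper names `utf8_length` and `fits_in_mysql_utf8` are made up for this illustration, and the second one only approximates MySQL’s rule by checking the three-byte limit described above.

```python
# Sample characters written as escapes so the code points are unambiguous.
SAMPLES = ["C", "\u00e9", "\u20ac", "\U0001F603"]  # C, e-acute, euro sign, smiley


def utf8_length(ch: str) -> int:
    """Return how many bytes the real UTF-8 encoding uses for one character."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """Hypothetical check: True if no character needs more than three bytes."""
    return all(utf8_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")
    print(fits_in_mysql_utf8("caf\u00e9"))        # True
    print(fits_in_mysql_utf8("\U0001F603 hi"))    # False
```

The smiley is the only sample that needs four bytes, which is exactly the kind of character a three-byte “utf8” column cannot hold.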

Of course, the MySQL developers never advertised this (probably because the bug is so embarrassing). Now, guides across the Web suggest that users use “utf8”. All those guides are wrong.

I’ll make a sweeping statement here: all MySQL and MariaDB users who are currently using “utf8” should actually use “utf8mb4”. Nobody should ever use “utf8”.
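If you want a starting point for the switch, the sketch below builds the `ALTER` statements MySQL accepts for changing a database default and converting existing tables. Everything specific in it is an assumption for illustration: the function name `utf8mb4_statements`, the database and table names you pass in, and the choice of `utf8mb4_unicode_ci` as the collation. It only generates SQL text and does not connect to anything, so review the output and try it on a backup first.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def utf8mb4_statements(
    database: str,
    tables: List[str],
    collation: str = "utf8mb4_unicode_ci",  # assumed collation; pick your own
) -> List[str]:
    """Build (but do not run) statements that move a schema to utf8mb4."""
    statements = [
        # Changes the default for tables created later in this database.
        f"ALTER DATABASE {quote_identifier(database)} "
        f"CHARACTER SET = utf8mb4 COLLATE = {collation};"
    ]
    for table in tables:
        # Converts the table default and its existing text columns.
        statements.append(
            f"ALTER TABLE {quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # "shop", "users", and "orders" are placeholder names.
    for line in utf8mb4_statements("shop", ["users", "orders"]):
        print(line)
```

Converting the tables is only part of the job: the client connection also has to talk utf8mb4, or four-byte characters can still be rejected or mangled on the way in.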

Computers store text as ones and zeroes. The first letter in this paragraph was stored as “01000011” and your computer drew “C”. Your computer chose “C” in two steps:
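Here is a rough Python sketch of those two steps: first turn the bits into a number, then look that number up in the Unicode character set. The variable names are just for illustration.

```python
bits = "01000011"

# Step 1: read the ones and zeroes as a binary number.
number = int(bits, 2)

# Step 2: look up that number in the Unicode character set.
letter = chr(number)

print(bits, "->", number, "->", letter)  # 01000011 -> 67 -> C

# Going the other way: from the letter back to its eight bits.
round_trip = format(ord(letter), "08b")
print(round_trip == bits)  # True
```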

Character sets are a solved problem. Almost every program on the Internet uses the Unicode character set, because there’s no incentive to use another.
