You can't just assume UTF-8

Submitted by Style Pass, 2024-04-29 06:30:04

Humans speak countless different languages. Not only are these languages incompatible, but runtime transpilation is a real pain. Sadly, every standardisation initiative has failed.

At least there is someone to blame for this state of affairs: God. It was him, after all, who cursed humanity to speak different languages, in an early dispute over a controversial property development.

Let's take the character "A" for example. It was assigned the number 65 in the American Standard Code for Information Interchange, or ASCII. This numbering was grandfathered into Unicode, except that the Unicode people write the number 65 in hexadecimal, as U+0041. And they call it a "codepoint".
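If you want to check that numbering yourself, here is a minimal Python sketch. It uses only the built-in `ord` and `chr`; the helper name `codepoint_label` is made up for illustration and is not part of any library.

```python
def codepoint_label(ch: str) -> str:
    """Format a single character's codepoint in the U+XXXX style (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


print(ord("A"))              # 65
print(codepoint_label("A"))  # U+0041
print(chr(0x41))             # A
```

The decimal 65 and the hexadecimal 41 are the same number written two ways, which is why `chr(0x41)` gives back "A".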

Easy enough - at least the number for "A" enjoys wide consensus. But computers can't just store decimal numbers; they can only store binary. Written as a single byte, 65 is 01000001.

Only the second and final bits are 1, or "on". The second bit is worth 64 and the last bit is worth 1. Those two sum up to 65. Easy peasy.
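The same arithmetic can be sketched in a few lines of Python. This assumes an 8-bit width for display, and `to_bits` is a hypothetical helper name, not a standard function.

```python
def to_bits(n: int, width: int = 8) -> str:
    """Render n as a zero-padded binary string (hypothetical helper)."""
    return format(n, f"0{width}b")


bits = to_bits(65)
print(bits)  # 01000001

# Collect the place value of every bit that is "on", reading left to right.
place_values = [2 ** (len(bits) - 1 - i) for i, b in enumerate(bits) if b == "1"]
print(place_values)       # [64, 1]
print(sum(place_values))  # 65
```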
