Unicode shenanigans: Martine écrit en UTF-8

submitted by
Style Pass
2024-10-05 19:00:04

Tracing it back to its source, that title was already broken when Planet Haskell, itself a feed aggregator for blogs, sent it out. The original blog produces the correct, unbroken title, so the blame lies with Planet Haskell. It’s probably a misconfigured locale. Maybe someone will fix it. It seems to be running archaic software on an old machine, the kind of thing I wouldn’t want to deal with myself, so I won’t ask someone else to.

In any case, this mistake can be fixed after the fact. Mis-encoded text is such a ubiquitous issue that there are nicely packaged solutions out there, like ftfy.

But my hobby site is written in OCaml and I would rather have fun solving this encoding problem than figure out how to install a Python program and call it from OCaml.

Humans read and write sequences of characters, while computers talk to each other using sequences of bytes. If Alice writes a blog, and Bob wants to read it from across the world, the characters that Alice writes must be encoded into bytes so her computer can send them over the internet to Bob’s computer, and Bob’s computer must decode those bytes to display the characters on his screen. The mapping between sequences of characters and sequences of bytes is called an encoding.
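To make that concrete, here is a minimal OCaml sketch of encoding and decoding. It is not the code behind this site. It assumes OCaml 4.14 or later (for `String.get_utf_8_uchar`), and it assumes the breakage is the classic case of UTF-8 bytes being decoded as Latin-1 and then re-encoded as UTF-8, which is only a guess at what Planet Haskell does. The function names are made up for illustration.

```ocaml
(* A "character" here is a Unicode code point, represented as Uchar.t. *)

(* Encode a list of code points as UTF-8 bytes. *)
let encode_utf_8 (chars : Uchar.t list) : string =
  let buf = Buffer.create 16 in
  List.iter (Buffer.add_utf_8_uchar buf) chars;
  Buffer.contents buf

(* Decode UTF-8 bytes back into code points (OCaml >= 4.14).
   Invalid byte sequences decode to U+FFFD, the replacement character. *)
let decode_utf_8 (bytes : string) : Uchar.t list =
  let rec go i acc =
    if i >= String.length bytes then List.rev acc
    else
      let d = String.get_utf_8_uchar bytes i in
      go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* Decode bytes as Latin-1 (ISO-8859-1): every byte is one code point. *)
let decode_latin_1 (bytes : string) : Uchar.t list =
  List.map
    (fun c -> Uchar.of_int (Char.code c))
    (List.of_seq (String.to_seq bytes))

(* Encode code points as Latin-1; None if any code point is above U+00FF. *)
let encode_latin_1 (chars : Uchar.t list) : string option =
  if List.for_all (fun u -> Uchar.to_int u <= 0xFF) chars then
    Some
      (String.of_seq
         (List.to_seq (List.map (fun u -> Char.chr (Uchar.to_int u)) chars)))
  else None

(* Render bytes as space-separated hex, for display. *)
let hex (bytes : string) : string =
  String.concat " "
    (List.map
       (fun c -> Printf.sprintf "%02x" (Char.code c))
       (List.of_seq (String.to_seq bytes)))

let () =
  let e_acute = [ Uchar.of_int 0xE9 ] in (* the character "é", U+00E9 *)
  let utf8 = encode_utf_8 e_acute in
  Printf.printf "UTF-8:  %s\n" (hex utf8); (* c3 a9 *)
  (* Assumed breakage: UTF-8 bytes read as Latin-1, re-encoded as UTF-8. *)
  let broken = encode_utf_8 (decode_latin_1 utf8) in
  Printf.printf "broken: %s\n" (hex broken); (* c3 83 c2 a9, shown as "Ã©" *)
  (* Repair by running the same steps in reverse. *)
  match encode_latin_1 (decode_utf_8 broken) with
  | Some fixed -> Printf.printf "fixed:  %s\n" (hex fixed) (* c3 a9 *)
  | None -> print_endline "not Latin-1 mojibake"
```

The same character “é” is one byte (`e9`) in Latin-1 and two bytes (`c3 a9`) in UTF-8, which is why decoding with the wrong encoding turns one character into two. The repair only works if the guess about the wrong encoding is right. A real fixer like ftfy has to consider several candidates, including Windows-1252.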
