Problems with iconv on macOS

Submitted by
Style Pass
2025-01-22 04:00:05

To convert strings from one character encoding to another, R uses iconv, a function defined by POSIX. On Linux and macOS, iconv is provided by the operating system. On Windows, R ships with a slightly customized version of win_iconv, which implements the same functionality on top of the Windows API.

The differences between iconv implementations, partially allowed by the rather permissive definition of the interface in POSIX, make R harder to maintain and cause user-visible differences between platforms.
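As a rough illustration of how much the interface leaves open, the following C sketch calls the POSIX iconv function directly and classifies what happened. The names convert and conv_status are made up for this example; they are not part of R or of any iconv library. For a character that is valid but not representable in the target encoding, POSIX only asks for an implementation-defined conversion that is counted in the return value, while some implementations fail with EILSEQ instead, and an implementation may also replace the character without reporting anything.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Outcome of one conversion attempt (illustrative names, not a real API). */
enum conv_status {
    CONV_OK = 0,       /* iconv succeeded and reported no irreversible conversions */
    CONV_SUBSTITUTED,  /* iconv succeeded but counted irreversible conversions */
    CONV_REJECTED,     /* iconv failed with EILSEQ */
    CONV_ERROR         /* unknown encoding name, output buffer too small, ... */
};

/* Convert the NUL-terminated string `in` from encoding `from` to encoding
 * `to`, writing a NUL-terminated result into `out` (capacity `outsz`).
 * Stateful target encodings would also need a final flush call, which is
 * left out of this sketch. */
enum conv_status convert(const char *from, const char *to,
                         const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    iconv_t cd = iconv_open(to, from);  /* note the order: target first */
    if (cd == (iconv_t)-1)
        return CONV_ERROR;

    char *inp = (char *)in;             /* iconv takes char **, not const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsz - 1;         /* keep room for the terminator */

    size_t n = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;                    /* save before iconv_close can change it */
    iconv_close(cd);
    *outp = '\0';

    if (n == (size_t)-1)
        return err == EILSEQ ? CONV_REJECTED : CONV_ERROR;
    return n > 0 ? CONV_SUBSTITUTED : CONV_OK;
}
```

Converting plain ASCII text from "UTF-8" to "ASCII" should give CONV_OK everywhere. Converting a string such as "café" is where implementations diverge, and CONV_OK by itself does not prove that the output is faithful.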

A recent significant challenge has been the new iconv implementation that came with macOS 14.0. It not only changed the behavior with characters not representable in the target encoding, but also caused crashes and incorrect conversions. This post focuses on work-arounds in R, some of which were already in R 4.4 but have been extended and improved in R-devel, the development version of R that will become R 4.5.0. The work-arounds were part of a bigger effort to deal with libiconv changes on macOS, the rest of which was done mostly by Brian Ripley.

This text includes technical details. The higher-level message to users and package authors is that converting characters to an encoding in which they are not representable is platform-dependent, and the outcomes can change over time. R documents what it does with such characters, but that documented behavior does not apply when the system silently transliterates the characters without telling R. It is best to avoid such conversions, for example by using only characters in plot labels that are representable in the given encoding (more in ?pdf). It is good to use UTF-8 whenever possible. A message specific to R package authors and developers on macOS: when R is built from source, the system libiconv is used by default, and it may behave strangely and change its behavior with any system update. This is currently not the case with the CRAN builds of R.
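One possible way to check in advance whether a label is representable in a target encoding is a round trip: convert from UTF-8 to the target and back, then compare with the original. The C sketch below shows the idea. It is not what R does internally, the function names convert_once and round_trips are hypothetical, and it assumes short strings and a target encoding without embedded NUL bytes (so not UTF-16 or UTF-32).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated string `in`; returns 0 on success and -1 on
 * any failure (unknown encoding, rejected character, buffer too small). */
int convert_once(const char *from, const char *to,
                 const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);

    iconv_t cd = iconv_open(to, from);  /* target encoding comes first */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;             /* iconv takes char **, not const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsz - 1;         /* keep room for the terminator */

    size_t n = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    *outp = '\0';
    return n == (size_t)-1 ? -1 : 0;
}

/* Returns 1 if `utf8_text` survives UTF-8 -> target -> UTF-8 unchanged,
 * 0 otherwise. A silent transliteration (for example an accented letter
 * turned into a plain one) shows up as a mismatch after the round trip. */
int round_trips(const char *target, const char *utf8_text)
{
    char encoded[256], decoded[256];    /* enough for short plot labels */

    if (convert_once("UTF-8", target, utf8_text, encoded, sizeof encoded) != 0)
        return 0;
    if (convert_once(target, "UTF-8", encoded, decoded, sizeof decoded) != 0)
        return 0;
    return strcmp(decoded, utf8_text) == 0;
}
```

A caller could test each label with round_trips before a conversion and fall back to UTF-8 output when the check fails. The check is conservative: it also reports failure for strings that are too long for the fixed buffers.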
