Daily News about R-devel/NEWS

This blog is updated daily.

~~Unicode character width tables (as used by ‘nchar(, type="w")’) have been updated to Unicode 12.1 by Brodie Gaslam (PR#17781).~~

There are new ‘configure’ options ‘--with-internal-iswxxxxx’ and ‘--with-internal-towlower’ which allow the system wide-character classification and case-switching routines to be replaced by internal ones. The first has long been used on macOS, AIX (and Windows), but this enables it to be unselected there and selected for other platforms (it is the new default on Solaris). The second is new in this version of R and is selected by default on macOS and Solaris.

System versions of these functions are often minimally implemented (sometimes only for ASCII characters) and do not cover the full range of Unicode code points: for example, Solaris (and Windows) only cover the Basic Multilingual Plane.
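To make the "Basic Multilingual Plane" limit concrete, here is a minimal R sketch that checks whether a character's code point lies at or below U+FFFF. The helper name `in_bmp` and the sample characters are illustrative choices, not part of the NEWS entry.

```r
# Hypothetical helper: TRUE when a single character lies in the
# Basic Multilingual Plane, i.e. its code point is at most U+FFFF.
in_bmp <- function(x) utf8ToInt(x) <= 0xFFFF

# Illustrative sample: two BMP characters and two beyond the BMP.
chars <- c(
  e_acute = "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
  euro    = "\u20ac",      # EURO SIGN
  deseret = "\U00010428",  # DESERET SMALL LETTER LONG I
  emoji   = "\U0001F600"   # GRINNING FACE
)

bmp <- vapply(chars, in_bmp, logical(1))
print(bmp)
```

Characters for which this returns FALSE are the ones a BMP-only system routine cannot classify or case-convert.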

Unicode character width tables (as used by ‘nchar(, type="w")’) have been updated to Unicode 12.1 by Brodie Gaslam (PR#17781).
The character-classification functions used (by default) to replace the system ‘iswxxxxx’ functions on Windows, macOS and AIX have been updated to Unicode 13.0.0: in particular, many more UTF-8 characters (such as emojis) are regarded as printable.
There is a build-time option to replace the system's wide-character ‘wctrans’ C function by tables shipped with R: use ‘configure’ option ‘--with-internal-towlower’ or (on Windows) ‘-DUSE_RI18N_CASE’ in ‘CFLAGS’ when building R. This allows ‘tolower()’ and ‘toupper()’ to work with Unicode characters beyond the Basic Multilingual Plane where not supported by system functions (e.g. on Solaris where it is the new default).
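One way to see how these three items affect a given build is to probe it from R. The sketch below assumes a UTF-8 locale and uses a Deseret letter pair and an emoji chosen purely for illustration. The results depend on the platform and on the options R was built with, so no particular output is claimed.

```r
emoji   <- "\U0001F600"  # GRINNING FACE, outside the BMP
small   <- "\U00010428"  # DESERET SMALL LETTER LONG I
capital <- "\U00010400"  # DESERET CAPITAL LETTER LONG I

# Display width as reported by the character width tables.
width <- nchar(emoji, type = "width")

# encodeString() escapes characters that are not regarded as printable,
# so an unchanged string suggests the emoji is classified as printable.
printable <- identical(encodeString(emoji), emoji)

# Case conversion beyond the BMP works only if the case-mapping
# routines in use cover these code points.
maps_case <- identical(toupper(small), capital) &&
  identical(tolower(capital), small)

cat("width of emoji:        ", width, "\n")
cat("emoji printable:       ", printable, "\n")
cat("non-BMP case mapping:  ", maps_case, "\n")
```

Running this on builds with and without the internal routines is a quick way to compare their behaviour.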