This blog is updated daily.
A general description is here.
The parser now treats ‘\Unnnnnnnn’ escapes larger than the upper limit for Unicode points (‘\U10FFFF’) as an error as they cannot be represented by valid UTF-8.
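The limit mentioned above comes from the Unicode code space itself, which ends at U+10FFFF. The following is a minimal sketch in Python (not R, and not the parser's actual code) of the range check involved; the helper name `in_unicode_range` is hypothetical and used only for illustration.

```python
# Illustration only (Python, not R): the Unicode code space ends at U+10FFFF.
MAX_CODE_POINT = 0x10FFFF


def in_unicode_range(cp: int) -> bool:
    """Return True if cp lies inside the Unicode code space (hypothetical helper)."""
    return 0 <= cp <= MAX_CODE_POINT


print(in_unicode_range(0x10FFFF))  # True: the largest code point
print(in_unicode_range(0x110000))  # False: one past the limit

# The largest code point still fits in a 4-byte UTF-8 sequence.
print(len(chr(MAX_CODE_POINT).encode("utf-8")))  # 4

# Python itself refuses to build a character beyond the limit.
try:
    chr(0x110000)
except ValueError as exc:
    print("rejected:", exc)
```

Anything above the limit has no well-formed UTF-8 encoding, which is why rejecting such an escape at parse time is the only consistent option.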
Where such escapes are used for outputting non-printable characters, 6 (not 8) hex digits are used (as it was decided by Unicode that the first two would always be zero).
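The arithmetic behind the 6-digit form can be checked directly: since no code point exceeds 0x10FFFF, an 8-digit hexadecimal rendering always starts with two zeros. The sketch below uses Python (not R) string formatting; it does not reproduce the exact escape text R prints, and the helper names `hex8` and `hex6` are assumptions made for illustration.

```python
# Illustration only (Python, not R): why 6 hex digits are always enough.
MAX_CODE_POINT = 0x10FFFF


def hex8(cp: int) -> str:
    """Zero-padded 8-digit hex form of a code point (hypothetical helper)."""
    return f"{cp:08X}"


def hex6(cp: int) -> str:
    """Zero-padded 6-digit hex form of a code point (hypothetical helper)."""
    return f"{cp:06X}"


print(hex8(MAX_CODE_POINT))  # 0010FFFF
print(hex6(MAX_CODE_POINT))  # 10FFFF
print(hex6(0x1F600))         # 01F600

# For sampled valid code points the two leading digits of the 8-digit form are zero.
print(all(hex8(cp).startswith("00") for cp in range(0, MAX_CODE_POINT + 1, 0x1000)))  # True
```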
The parser now looks for non-ASCII spaces on Solaris, in addition to Windows, macOS, FreeBSD and OSes such as Linux that declare that ‘wchar_t’ is encoded as Unicode.
There are warnings (including from the parser) on the use of unpaired surrogate Unicode points such as ‘\uD834’ (which cannot be converted to valid UTF-8).
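A surrogate such as U+D834 is only meaningful as half of a UTF-16 pair, so on its own it has no valid UTF-8 form. The sketch below, in Python rather than R, shows the range test and the standard pairing arithmetic; the helper names are hypothetical, and it illustrates the Unicode rule rather than R's implementation.

```python
# Illustration only (Python, not R): surrogates and UTF-8.

def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def combine_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into one code point (UTF-16 rule)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def can_encode_utf8(text: str) -> bool:
    """True if text can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(is_surrogate(0xD834))                     # True
print(can_encode_utf8("\ud834"))                # False: unpaired surrogate
print(hex(combine_surrogates(0xD834, 0xDD1E)))  # 0x1d11e
print(can_encode_utf8(chr(0x1D11E)))            # True: the combined code point
```

Paired with U+DD1E, the high surrogate U+D834 yields U+1D11E, which encodes without trouble; the unpaired half does not.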
‘tolower()’, ‘toupper()’ and ‘chartr()’ have more support for inputs with a marked encoding (UTF-8 or Latin-1) in a single-byte locale.
The code for evaluating default (extended) regular expressions now uses the same character-classification functions as the rest of R; in some cases (Windows, AIX, macOS) these replace limited system ones, the differences being in non-Latin characters.
Code converting UTF-8 strings (e.g., ‘tolower()’ and some ‘printing’)
now uses internal routines rather than system functions: these detect
more uses of invalid UTF-8 strings.
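To make "invalid UTF-8" concrete, the sketch below uses Python's strict decoder (not R's internal routines, whose details are not shown here) to classify a few byte sequences. The helper name `is_valid_utf8` is an assumption made for illustration.

```python
# Illustration only (Python, not R): kinds of byte sequences a strict
# UTF-8 validator rejects.

def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "plain ASCII": b"abc",
    "two-byte character (U+00E9)": b"\xc3\xa9",
    "bad continuation byte": b"\xc3\x28",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate U+D834": b"\xed\xa0\xb4",
    "beyond U+10FFFF": b"\xf4\x90\x80\x80",
}

for label, data in samples.items():
    print(f"{label}: {is_valid_utf8(data)}")
```

Only the first two samples are well formed; the rest fail for different reasons, which is the sort of distinction a permissive system function may not draw.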