This blog is updated daily.

A general description is here.

There is support for vectors longer than 2^31 - 1 elements on 64-bit platforms. This applies to raw, logical, integer, double, complex and character vectors, as well as lists. (Elements of character vectors remain limited to 2^31 - 1 bytes.)

Support for such vectors is still a work in progress.

Most operations which can sensibly be done with long vectors now work: others may return the error ‘long vectors not supported yet’. Some of these fail because they explicitly work with integer indices (e.g. ‘anyDuplicated()’ and ‘match()’), because other limits (e.g. of character strings or matrix dimensions) would be exceeded, or because the operations would be extremely slow.

‘length()’ returns a double for long vectors, and lengths can be set to 2^31 or more by the replacement function with a double value.

Most aspects of indexing are available. Generally double-valued indices can be used to access elements beyond 2^31 - 1.
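
As a rough illustration of the two points above (a sketch, not taken from the R sources): 2^31 does not fit in R's integer type, which is why lengths and indices beyond that point have to be doubles. The helper ‘demo_long_raw()’ is a hypothetical name introduced here. It is defined but deliberately not called, because running it assumes a 64-bit build of R 3.0.0 or later and several GiB of free memory.

```r
# Largest value of R's integer type, i.e. 2^31 - 1.
int_max <- .Machine$integer.max

# 2^31 is computed in double precision and has no integer representation:
# coercing it overflows to NA (with a warning, suppressed here).
big <- 2^31
big_as_int <- suppressWarnings(as.integer(big))

# Ordinary (non-long) vectors still report an integer length.
short_len_type <- typeof(length(raw(10)))

# Hypothetical helper: builds a raw vector one element past the old limit.
# Not called here; the first allocation alone needs about 2 GiB.
demo_long_raw <- function() {
  x <- raw(2^31)                  # length given as a double
  x[2^31] <- as.raw(1L)           # double-valued index reaches the last element
  len_type <- typeof(length(x))   # the text above says this is a double
  length(x) <- 2^31 + 1           # replacement function with a double value
  list(len_type = len_type, last = x[2^31], new_length = length(x))
}
```

Raw vectors are used in the helper because they need one byte per element; a double vector of the same length would need about 16 GiB.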

There is some support for matrices and arrays with each dimension less than 2^31 but total number of elements more than that. Only some aspects of matrix algebra work for such matrices, often taking a very long time. In other cases the underlying Fortran code has an unstated restriction (as was found for complex ‘svd()’).

‘dist()’ can produce dissimilarity objects for more than 65536 rows (but for example ‘hclust()’ cannot process such objects).
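
The figure of 65536 rows can be checked with a little arithmetic. The sketch below assumes, as documented for ‘dist()’, that the result stores only the lower triangle of the distance matrix, that is n(n-1)/2 values for n rows. ‘n_pairs()’ is a hypothetical helper introduced for this illustration.

```r
# Number of pairwise distances stored for n rows.
# n is passed as a double, so the product cannot overflow the integer type.
n_pairs <- function(n) n * (n - 1) / 2

int_max <- .Machine$integer.max           # 2^31 - 1

fits_65536 <- n_pairs(65536) <= int_max   # largest row count that still fits
fits_65537 <- n_pairs(65537) <= int_max   # one more row needs a long vector
```

With these definitions ‘fits_65536’ is TRUE and ‘fits_65537’ is FALSE, so 65536 rows is exactly the point at which the result stops fitting in an ordinary vector.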

‘serialize()’ to a raw vector is no longer limited in size (except by resources) on 64-bit platforms.

The C-level function ‘R_alloc’ can now allocate 2^35 or more bytes on 64-bit platforms.

‘agrep()’ and ‘grep()’ will return double vectors of indices for long vector inputs.

Many calls to ‘.C()’ have been replaced by ‘.Call()’ to allow long vectors to be supported (now or in the future). Regrettably several packages had copied the non-API ‘.C()’ calls and so failed.

‘.C()’ and ‘.Fortran()’ do not accept long vector inputs. This is a precaution as it is very unlikely that existing code will have been written to handle long vectors (and the R wrappers often assume that ‘length(x)’ is an integer).
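
To see why wrappers are at risk, consider the common pattern of passing ‘as.integer(length(x))’ on to compiled code: for a long vector the length is a double that does not fit in an integer. The following is a minimal sketch of a defensive check a wrapper might use. ‘checked_length()’ is a hypothetical helper, not part of R, and its error message simply reuses the wording quoted earlier.

```r
# Hypothetical helper: return length(x) as an integer, or fail loudly
# if x is a long vector whose length cannot be represented as one.
checked_length <- function(x) {
  n <- length(x)                  # integer, or double for a long vector
  if (n > .Machine$integer.max)
    stop("long vectors not supported yet")
  as.integer(n)                   # safe: n is known to fit
}

len <- checked_length(letters)    # an ordinary vector passes straight through
```

Failing early in R code like this is only one possible design; the point is that the coercion should not be left unguarded.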

Most of the methods for ‘sort()’ work for long vectors.

‘rank()’, ‘sort.list()’ and ‘order()’ support long vectors (slowly except for radix sorting).

‘sample()’ can do uniform sampling from a long vector.
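
A small sketch of what this permits, assuming R 3.0.0 or later: ‘sample.int()’ accepts a population size too large for the integer type, so candidate indices into a long vector can be drawn uniformly without first creating the vector. The size 2^40, the sample size of 5 and the seed are arbitrary choices made for illustration.

```r
set.seed(1)                # arbitrary seed, only for reproducibility

n <- 2^40                  # population size well beyond 2^31 - 1
idx <- sample.int(n, 5)    # uniform sampling without replacement

is.double(idx)             # indices this large cannot be stored as integers
range(idx)                 # all values lie between 1 and n
```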