As of r65048 R-devel can be compiled to use reference counting instead
NAMED mechanism to determine when objects can be modified in
place or need to be copied. For now
NAMED is the default. To switch
to using reference counting compile with
or uncomment the line
The motivation for this is to allow us to be able to decrement reference counts that go above 1 to help reduce copying and maybe even get to a point where efficient replacement functions can be written in R.
Reference counting is also much easier to understand and think about than the NAMED mechanism. Given that reference counts are managed correctly by default and the exceptions are explicit it is fairly easy to review the whole mechanism. In contrast, with NAMED one has to find the paces where NAMED adjustments might be needed but are missing, which is much harder.
A small number of packages contain C functions called via the
interface that internally check
NAMED on an argument and modify the
argument directly if the
NAMED value permits this. These packages
may need to be more careful or more conservative: In
x <- list(1 + 2) y <- x .Call("foo", x[])
the value received by
foo would have
NAMED = 2, since the
x[] propagates the
NAMED value of the container
to the extracted element. Reference counting only reflects the
immediate references, so the reference count on the value received by
foo will be one. This means that, except in very special and well
understood circumstances, an argument passed down to C code should not
be modified if it has a positive reference count, even if that count is
equal to one.
Only a few packages on CRAN and BIOC call
screening these for possible issues should be fairly easy.
Switching to reference counting seems to have a rather negligible performance impact. On one Linux box I see a 5% hit for tight scalar loop code, on another I see no measurable difference. The table below shows some results for the the first Linux box. The impact on vectorized code is of course even less. Even an across the board 5% hit seems worth while to me for the simplification this change would bring.
The main place reference counts are manipulated is in
exactly the same places where the write barrier is managed. When a
new value is placed in a vector or
CONS cell field the reference
count on the new value is incremented and the reference count on the
previous value is decremented.
In addition, there are a few places that need to not increment reference counts, like
To accomplish this every object has a
TRACKREFS flag; references to
the object's fields are only counted if
TRACKREFS is true. The
CONS_NR produces a
CONS cell that does not track
references; these are used for internal argument lists and the like.
Many of the uses of
CONS_NR could be eliminated by using a stack for
There are also a few places where reference counts can safely be decremented, like
forloop at the end of the loop
xis copied to
xis discarded (growing/shrinking vectors)
Other opportunities for decrementing reference counts exist as well but have not yet been addressed.
The core macros implementing reference counting are
As a rule the values of the
TRACKREFS fields should
only be changed using the higher level macros
To support the transition a number of standard use idioms for
SET_NAMED have been replaced by macros that can also be defined
in terms of reference counts:
NAMED(x) == 2or
NAMED(x) > 1becomes
NAMED(x) == 0becomes
NAMED(x) != 0becomes
SET_NAMED(x, 2)can be replaced by
With these changes in place the initial implementation of reference
counting treats the remaining calls to
NAMED as equivalent to calls
SET_NAMED becomes a no-op. Some of the
SET_NAMED(x, 2) calls could be replaced by
but this does not appear to be necessary.
All remaining uses of
SET_NAMED will need to be
thoroughly reviewed before completing the transition.
The same binary format is used as for the
NAMED mechanism, with the
two bits of the
named field used for the reference count. That means
reference counts stick at the maximal value of 3. Some rearranging of
the header would allow us to use a larger maximal reference count.
include/Rinternals.h: macro definitions
CONS_NR uses in argument lists; changes in the
complex assignment code; decrementing of reference counts in some
main/memory.c: core implementation of reference count adjustment,
along side the write barrier maintenance.
REF values instead of
CONS_NR to build up argument
lists that are only used internally and so should not increment
main/names.c: Mark .Last.value symbol to not increment reference
main/subassign.c: In local function
decrement reference count on object after dispatch. Also, when
extending vectors do not increment reference counts when copying
objects from the old vector into the new one since the old one will
main/subset.c: In local function
reference count on object after dispatch.
NAMED approach R level component extraction function
NAMED value of the container to the extracted
element. So duplicating of LHS values can be deferred until the
replacement functions are called. With reference counting duplicating
is necessary if an object itself or any of its containing LHS values
has a reference count greater than one. The simplest way to deal with
this is to duplicate as needed as new intermediate LHS values are
extracted. This is done in
evalseq for interpreted code. For byte
compiled code it is done in the
SWAP instruction for now, since this
is only used between LHS extractions. This instruction should be
renamed to reflect this usage.
Since reference counts are applied to all objects, including environments and promises, a check after applying a closure can show which bindings are no longer needed, and this allows reference counts on arguments to be decremented again. Thus, for example, after the call
this mechanism can restore the reference count of
x to its value
prior to the call (no modification of the
mean source is
needed). There are still a number of rough edges and interactions with
the complex assignment process that need to be resolved, so this is
not yet committed to the subversion sources. But it is likely these can
be resolved and the result committed before too long.