Steps for Making C code in R Thread-safe


Duncan Temple Lang, Robert Gentleman

Table of Contents

  1. Making do_allnames() thread-safe
    1. Removing the Final Globals
    2. The final step
  2. Identifying Global Variables
  3. Footnotes

Abstract

Future versions of R will (optionally) support internal, and probably user-level, threads, so it is desirable that C code accessed from R also be thread-safe. By this we essentially mean that two different streams of execution can run concurrently, executing the same code but with different local variables. This document is intended to provide guidelines for those writing C code to be used with R to make it thread-safe. It is the first of a number of documents discussing threads that we will attempt to provide.

1. Making do_allnames() thread-safe

The idea here is to take a small, specific segment of code within the R tree as it currently is and to re-organize it so as to make it thread-safe. In very simple terms, this means that we avoid global variables [1]. The code that we focus on is the C routine do_allnames() and the associated routine namewalk(), both in list.c. We will resist any temptation to rewrite this code except for the purpose of removing the use of global/static variables.

There are 6 static variables defined in list.c:
static SEXP	ans;
static int	UniqueNames;
static int	IncludeFunctions;
static int	StoreValues;
static int	ItemCounts;
static int	MaxCount;
These naturally form a group of related variables that are to be used in association with each other. In many respects, a class would be an obvious way to group them. We will use C's equivalent of this, a struct, and gather these variables into a single variable.
typedef struct {
 SEXP	ans;
 int	UniqueNames;
 int	IncludeFunctions;
 int	StoreValues;
 int	ItemCounts;
 int	MaxCount;
} NameWalkerData;

We can start our changes by declaring a global variable which is an instance of this structure.

  static NameWalkerData GlobalNameData;
  static NameWalkerData *nameData = &GlobalNameData;

GlobalNameData is an instance of this structure. For reasons that will become clearer later on, we will want to refer to the fields in this instance of the structure via a pointer. Hence we define nameData as a pointer to a NameWalkerData instance and set it to point to GlobalNameData.

Of course, GlobalNameData is a global/static variable and so will not be thread-safe. We have simply reduced the number of globals from 6 to 1 (or 2 because of the use of a pointer). We will remove this global variable later, but will use it to focus on the changes to the code that uses the original 6 variables.

We should note that this (i.e. initializing a static variable with the address of another static variable) may not work with all compilers, but we will remove this code and are using it only for purposes of explanation.

If we recompile with these two changes (defining the structure and declaring an instance of it), we will obviously get numerous error messages about the original 6 variables not being defined. We can use these errors to step through the code (i.e. using something like emacs' navigation facilities for jumping to the point of compilation errors).

We have several different ways to go about changing the code, and the approach one chooses depends on how much time one wants to put in [2].
  • Update variable references
    This is the obvious approach where we replace all the references (e.g. MaxCount) with a reference to the corresponding field in nameData. Therefore, code such as

        switch(TYPEOF(s)) {
        case SYMSXP:
            if(ItemCounts < MaxCount) {

    becomes

        switch(TYPEOF(s)) {
        case SYMSXP:
            if(nameData->ItemCounts < nameData->MaxCount) {

  • Use Macros

We have used the first approach and the resulting code can be seen in step1.html
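
The macro option is not spelled out in the text. One plausible reading, offered here only as an assumption, is to define a pre-processor macro for each of the original names so that the existing references expand to field accesses through nameData. The sketch below uses a cut-down NameWalkerData (the SEXP field is omitted so the fragment compiles without R's headers) and a hypothetical helper, under_limit(), that stands in for the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the structure in the text; the SEXP field is
 * omitted so that this sketch compiles without R's headers. */
typedef struct {
    int ItemCounts;
    int MaxCount;
} NameWalkerData;

static NameWalkerData GlobalNameData;
static NameWalkerData *nameData = &GlobalNameData;

/* Assumed macro approach: each original variable name expands to the
 * matching field, so code written against the old names compiles
 * unchanged. The macros must come after the struct definition,
 * otherwise the field declarations themselves would be expanded. */
#define ItemCounts (nameData->ItemCounts)
#define MaxCount   (nameData->MaxCount)

/* Hypothetical helper written exactly as the original code would be,
 * using the bare names. */
int under_limit(void)
{
    assert(nameData != NULL);
    return ItemCounts < MaxCount;
}
```

Once these macros are in effect, text such as nameData->MaxCount is itself expanded and no longer compiles, which is one example of the pre-processor details that can make the quick approach costly later.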

    1. Removing the Final Globals

    The next thing to consider is that we must reset (that is, initialize) the contents of GlobalNameData for each top-level call to do_allnames(). The simplest way to do this is to remove GlobalNameData altogether and simply have a local instance of NameWalkerData. We can then set nameData to point to this and all will be well.
    SEXP do_allnames(SEXP call, SEXP op, SEXP args, SEXP env)
    {
        SEXP expr;
        int i, savecount;
    
        NameWalkerData localData;
        nameData = &localData;
    
        checkArity(op, args);
    
        expr = CAR(args);
    
    
    The code at the end of this step is in step2.html

    2. The final step

    The final step is to remove the global nameData. To be thread-safe, this is essential. Why? Consider two threads, in each of which there is a call to do_allnames(). Obviously, if these calls are concurrent, they will both see the same nameData object and each will access and write over the fields in that structure. Clearly the results are highly order-dependent and unlikely to be reproducible.

    So how can we get rid of this global variable? At this point in the process, it is easy to see that we can simply pass the instance of NameWalkerData as an argument to namewalk() from do_allnames() and all recursive calls to namewalk(). We remove the global variable nameData and make it local to do_allnames() and point to localData within that routine. Then, we modify the declaration for namewalk() to take an additional argument of type NameWalkerData * and we make certain to call this parameter nameData. This means we do not have to change any references to the different fields that we introduced in step 1. Recompiling at this point will identify all the places that we call namewalk() without the new argument and we can change these calls to include the local variable nameData.

    The code at the end of this step is in step3.html
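
The shape of the final version can be sketched on a toy example. The code below is not the code from list.c: it replaces SEXP with a hypothetical Node tree, keeps only two fields of NameWalkerData, and uses the made-up names toy_namewalk() and toy_allnames(). It only illustrates the technique described above, in which the state lives in a local variable of the top-level routine and reaches the recursive walker through an extra parameter named nameData.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an R expression tree; the real code walks SEXP objects. */
typedef struct Node {
    const char  *name;   /* non-NULL for a "symbol" node */
    struct Node *left;
    struct Node *right;
} Node;

/* Cut-down version of the structure from the text. */
typedef struct {
    int ItemCounts;
    int MaxCount;
} NameWalkerData;

/* The walker receives its state through the extra nameData parameter,
 * so it neither reads nor writes any file-level variable. Keeping the
 * parameter name nameData means field references need no changes. */
static void toy_namewalk(const Node *s, NameWalkerData *nameData)
{
    assert(nameData != NULL);
    if (s == NULL)
        return;
    if (s->name != NULL && nameData->ItemCounts < nameData->MaxCount)
        nameData->ItemCounts++;
    toy_namewalk(s->left, nameData);
    toy_namewalk(s->right, nameData);
}

/* Hypothetical top-level routine: every call owns its own state. */
int toy_allnames(const Node *expr, int max_count)
{
    NameWalkerData localData = { 0, max_count };
    NameWalkerData *nameData = &localData;

    toy_namewalk(expr, nameData);
    return nameData->ItemCounts;
}
```

Because localData lives on the stack of each call to toy_allnames(), two threads calling it at the same time work on separate structures and cannot overwrite each other's counts.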

    2. Identifying Global Variables

    The do_allnames() example was reasonably straightforward, and we also managed to ignore an important point: how did we know that there were global variables in that section of the code? In other words, how did we know it might not be thread-safe?

    The simplest mechanism for finding global (i.e. non-local) variables is to compile the C code and to run the nm utility on the resulting object file [3].
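
As an illustration, the hypothetical file below declares one variable of each kind. Assuming it is saved as globals_demo.c, compiled with cc -c globals_demo.c, and inspected with nm globals_demo.o, GNU nm would typically tag each symbol with the letter noted in the comments. The exact letters and symbol names vary between toolchains, so treat the comments as a guide rather than as guaranteed output.

```c
#include <assert.h>

/* Symbol-type letters in the comments are those typically shown by GNU nm:
 * D/d = initialized data, B/b = uninitialized data (BSS), T/t = code.
 * Uppercase means externally visible, lowercase means local to the file. */

int shared_limit = 10;      /* initialized, external linkage: typically D */
int shared_total;           /* uninitialized, external linkage: typically B
                               (C on toolchains that use common symbols) */
static int file_step = 1;   /* initialized, file-scope static: typically d */
static int file_counter;    /* uninitialized, file-scope static: typically b */

int bump(void)              /* a function: typically T */
{
    static int calls;       /* static local: also b, often under a
                               compiler-decorated name; shared by all callers */
    int step = file_step;   /* automatic variable: never listed by nm */

    assert(step > 0);
    calls++;
    file_counter += step;
    if (file_counter <= shared_limit)
        shared_total = file_counter;
    return calls;
}
```

Every symbol reported in a data section (D, d, B, b, or C) is a candidate for the treatment applied to do_allnames(); only the automatic variable step is private to each call.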

    Footnotes

    1. Being thread-safe does not require avoiding global variables, only synchronizing access to them. But that would take us away from the primary goal of this document, which is to describe how one can remove global variables.
    2. That is, at this point in time. The quick approach may come back to haunt you and eat up much more time debugging problems caused by pre-processor details.
    3. There are different variants of nm but they all provide the same type of information.