Old- and New-Style Classes and Methods

Beginning with version 1.4, R will provide formal class and method definitions, assuming the current efforts are not backed out. These are at the moment in the package methods; they may be moved, partly or entirely, into the base package.

The formal classes and methods implement most of the API defined in Programming with Data (Springer, 1998). See the companion page for a list of exceptions.

R also has an implementation of the informal classes and methods, similar to that described in Statistical Models in S and other books. The present page outlines some of the issues in dealing with the two models.

The discussion will use the short forms ``S3 classes'' and ``S4 classes'', since that terminology has become fairly common. But what we mean throughout is the existing R mechanisms and the new (and future) more formal mechanisms.

Class, the "class" Attribute, and is.object

In the S4 class model, every object has a class, and the class is always a single character string. In the S3 class model, class(x) is the attribute named "class", which may be anything. Typically, it is a character vector, informally interpreted as giving the ``primary'' class of the object in the first string and, possibly, other classes from which the object inherits in the remaining elements. The attribute can also be NULL; that is, not there.
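As a small illustration of the S3 side, the sketch below builds one object whose class attribute has two elements and one object with no class attribute at all. The class names "child" and "parent" are invented for the example; attr(x, "class") is used rather than class(x) so that the result does not depend on how class() treats a missing attribute.

```r
# Hypothetical S3 object: the first string plays the role of the "primary"
# class, the second is a class the object informally inherits from.
x <- structure(list(), class = c("child", "parent"))
attr(x, "class")
inherits(x, "parent")

# A plain vector carries no class attribute.
y <- 1:3
is.null(attr(y, "class"))
```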

In R, extra efficiency is obtained in the S3 class dispatch by allocating a bit in the sxpinfo field at the C level. The R function is.object tests that bit, which is asserted to be 1 iff there is a class attribute of length greater than zero.
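The behavior of is.object can be observed from R code without looking at the C level. The following is a minimal sketch; the class name "myclass" is hypothetical, and the variable names exist only to record the three results.

```r
x <- 1:3
before <- is.object(x)        # no class attribute yet

class(x) <- "myclass"         # hypothetical class name; sets the attribute
after_set <- is.object(x)

x <- unclass(x)               # removes the class attribute again
after_unclass <- is.object(x)

c(before, after_set, after_unclass)
```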

The current R implementation of S4 classes uses the class attribute to hold the class, but never allows NULL values for class(x); if the attribute is not there, the value returned follows the logic of data.class; that is, arrays and matrices are recognized by examining the dim attribute, if any, and the default is essentially typeof(x).
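A short sketch of this behavior, using only base R functions. The exact strings returned by class() for an object without the attribute may vary between R versions, so the comments avoid stating them.

```r
v <- 1:3
m <- matrix(1:6, nrow = 2)

# Neither object has a class attribute ...
is.null(attr(v, "class"))
is.null(attr(m, "class"))

# ... yet class() still returns a non-NULL value for both.
class(v)
class(m)

# data.class recognizes the matrix from its dim attribute.
data.class(m)

# For the plain vector, the value matches typeof().
typeof(v)
```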

The implementation does not require the class to be a single string, but it will be one if the class is formally defined.
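The difference can be seen by comparing a formally defined class with an object given a class attribute by hand. In this sketch the class name "Track", its slots, and the strings "a" and "b" are all hypothetical.

```r
library(methods)

# A formally defined class: class() of an instance is a single string.
setClass("Track", representation(x = "numeric", y = "numeric"))
tr <- new("Track", x = c(1, 2, 3), y = c(2, 4, 6))
length(class(tr))

# Nothing prevents an S3-style class attribute with several strings.
z <- 1:3
class(z) <- c("a", "b")
length(class(z))
```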

For compatibility, the source of most difficulty will be code written for S3 classes that uses, explicitly or implicitly, the test:

As Luke Tierney points out, the preferred alternative for efficiency (and now, importantly, for correctness) is:
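A sketch of the contrast, assuming from the surrounding discussion that the test in question is is.null(class(x)) and that the preferred alternative is based on is.object(x); neither expression is quoted from the text above, and the helper names are hypothetical.

```r
# Hypothetical helper in the older style: asks whether class(x) is NULL.
has_class_old <- function(x) !is.null(class(x))

# Hypothetical helper in the preferred style: relies on is.object().
has_class_new <- function(x) is.object(x)

x <- 1:3                   # no class attribute
f <- factor(c("a", "b"))   # has a class attribute

# When class() never returns NULL, the older test cannot tell x from f.
c(has_class_old(x), has_class_old(f))

# is.object() still distinguishes the two.
c(has_class_new(x), has_class_new(f))
```

Code that branches on the older test would treat every object as classed once class(x) stops returning NULL, which is why the change matters for correctness and not only for speed.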

For future consideration, what should happen to the obj bit in the internal representation? Should it be retained? Should there be a bit instead or in addition for formally defined classes?

The noquote Example

One of the best examples of inconsistency in the underlying models is the noquote function and ``class''. The function returns its argument but with "noquote" either added to the class attribute or forming the class attribute by itself.

The purpose is to force use of the print.noquote method for printing the object: The method strips off the added string from the class and recalls print with a quote=FALSE argument, to suppress quotes on strings.
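The effect on printing can be checked directly. This sketch captures the printed text with capture.output so the two results can be compared; the variable names are only for illustration.

```r
s <- c("a", "b")

# Default printing of a character vector includes the quotes.
plain <- capture.output(print(s))

# After noquote(), printing dispatches to print.noquote and drops them.
nq <- capture.output(print(noquote(s)))

plain
nq
```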

There is no class "noquote" in the formal sense; any object whatsoever can have that string in its class attribute. Conversely, two objects ``from'' another class can have different class attributes if one has passed through the noquote function.
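Both points can be illustrated with a few lines. The class name "myclass" is hypothetical, and the sketch deliberately avoids assuming where in the class vector "noquote" is placed.

```r
# Any object can acquire the string, here an integer vector.
n <- noquote(1:3)
inherits(n, "noquote")

# Two objects "from" the same (hypothetical) S3 class ...
a <- structure(list(), class = "myclass")
b <- noquote(a)

# ... both still inherit from it, but their class attributes differ.
inherits(a, "myclass")
inherits(b, "myclass")
identical(class(a), class(b))
setdiff(class(b), class(a))
```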

It's not that this procedure is invalid, from the definition of S3 classes. But it has no possible mapping directly into S4 classes.

It's interesting to note, though, how the same intent would be implemented there. One could define a class "noquote". It would be a virtual class, existing only so that actual classes could extend it. Print methods could be designed somewhat as before. It's probably more likely, however, that few classes would extend "noquote"; instead, something like "noquoteCharacter" would be a class of character vectors that printed without quotes.
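One possible shape for this, as a minimal sketch and not a proposed design. The virtual class is named "NoQuote" here (a hypothetical name, chosen to keep it apart from the S3 string "noquote"), and attaching the display method to "noquoteCharacter" via show is an assumption about how the print method would be written.

```r
library(methods)

# Hypothetical virtual class: it has no instances of its own and exists
# only so that actual classes can extend it.
setClass("NoQuote", representation("VIRTUAL"))

# A class of character vectors that also extends the virtual class.
setClass("noquoteCharacter", contains = c("character", "NoQuote"))

# Display method: print the underlying character data without quotes.
setMethod("show", "noquoteCharacter", function(object) {
  print(object@.Data, quote = FALSE)
})

obj <- new("noquoteCharacter", c("a", "b"))
is(obj, "NoQuote")
shown <- capture.output(show(obj))
shown
```

Unlike the S3 version, membership here is part of the class definition: an object is a "noquoteCharacter" because it was created as one, not because a string was added to its class attribute afterwards.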

John Chambers <jmc@research.bell-labs.com>
Last modified: Mon Aug 13 14:52:30 EDT 2001