Martin Maechler: When you think `class(.) == *`, think again!



Historical relict: R matrix is not an array

In a recent discussion on the R-devel mailing list, in a thread started on July 8, “head.matrix can return 1000s of columns – limit to n or add new argument?”, Michael Chirico and then Gabe Becker were proposing to generalize the head() and tail() utility functions, and Gabe noted that current (pre R-4.x.y) head() would not treat array specially. I replied, noting that R currently typically needs both a matrix and an array method:

Note however the following historical quirk:

sapply(setNames(,1:5),
       function(K) inherits(array(7, dim=1:K), "array"))

(As I hope this will change, I explicitly put the current R 3.x.y result rather than evaluating the above R chunk:)

     1     2     3     4     5
  TRUE FALSE  TRUE  TRUE  TRUE

Note that matrix objects are not arrays in that (inheritance) sense, even though (as many useRs may not be aware)

identical(
    matrix(47, 2,3), # NB  " n, n+1 " is slightly special
    array (47, 2:3))
## [1] TRUE

all matrices can equivalently be constructed by array(.), though slightly more clumsily in the case of matrix(*, byrow=TRUE).
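As a minimal sketch of the byrow=TRUE case (the variable names are illustrative and not from the original post), the row-wise fill can be reproduced by filling an array column-wise with swapped dimensions and then transposing it:

```r
# Row-wise filled 2 x 3 matrix, the usual way
m_byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

# Same object via array(): fill a 3 x 2 array column-wise, then transpose
a_byrow <- t(array(1:6, dim = c(3, 2)))

identical(m_byrow, a_byrow)
## [1] TRUE

# is.array() looks at the dim attribute, not at class(.),
# so it is TRUE for every matrix in any R version
is.array(m_byrow)
## [1] TRUE
```

This is only one way to get there; aperm() would serve equally well in place of t().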

Because of that, base R itself has three functions where the matrix and the array methods are identical, as I wrote in the post: The consequence of that is that currently, “often” foo.matrix is just a copy of foo.array in the case the latter exists, with base examples of foo in {unique, duplicated, anyDuplicated}.

for(e in expression(unique, duplicated, anyDuplicated)) { # `e` is a `symbol`
    f.m <- get(paste(e, "matrix", sep="."))
    f.a <- get(paste(e, "array",  sep="."))
    stopifnot(is.function(f.m),
              identical(f.m, f.a))
}

In R 4.0.0, will a matrix() be an "array"?

In that same post, I’ve also asked

Is this something we should consider changing for R 4.0.0 – to have it TRUE also for 2d-arrays aka matrix objects ??

In the meantime, I’ve tentatively answered “yes” to my own question, and started investigating some of the consequences. From what I found in overly eager (unit) tests, some even written by myself, I was reminded that I had wanted to teach more people about an underlying related issue, where we’ve seen many useRs use R unsafely:

If you think class(.) == *, think again: Rather inherits(., *) … or is(., *)

Most non-beginning R users are aware of inheritance between classes, and even more generally that R objects, at least conceptually, are of more than one “kind”. E.g., pi is both "numeric" and "double", and 1:2 is both "integer" and "numeric". They may know that date-time objects come in two forms: The ?DateTimeClasses (or ?POSIXt) help page describes POSIXct and POSIXlt and says

"POSIXct" is more convenient for including in data frames, and "POSIXlt" is closer to human-readable forms. A virtual class "POSIXt" exists from which both of the classes inherit …

and for example

(tm <- Sys.time())
## [1] "2019-12-05 11:47:54 CET"
class(tm)
## [1] "POSIXct" "POSIXt"

shows that class(.) is of length two here, something that breaks an if (class(x) == "....") .. call.
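A small sketch of why the comparison is problematic, and how inherits() avoids the problem (the variable cmp is only illustrative):

```r
tm <- Sys.time()

# `==` is vectorized over class(tm), so the comparison yields
# one logical value per class name, not a single TRUE or FALSE
cmp <- class(tm) == "POSIXct"
length(cmp)
## [1] 2

# inherits() always returns a single logical, suitable for if ()
inherits(tm, "POSIXct")
## [1] TRUE
inherits(tm, "POSIXt")
## [1] TRUE

# The POSIXlt form shares the virtual parent class "POSIXt"
inherits(as.POSIXlt(tm), "POSIXt")
## [1] TRUE
```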

Formal Classes: S4

R’s formal class system, called S4 (implemented mainly in the standard R package methods), provides functionality and tools to implement rich class inheritance structures, made use of heavily in package Matrix, or in the Bioconductor project with its 1800+ R “software” packages. Bioconductor even builds on core packages providing much used S4 classes, e.g., Biostrings, S4Vectors, XVector, IRanges, and GenomicRanges. See also Common Bioconductor Methods and Classes.

Within the formal S4 class system, where extension and inheritance are important and often widely used, an expression such as

if (class(obj) == "matrix")  { ..... }   # *bad* - do not copy !

is particularly unhelpful, as obj could well be of a class that extends matrix, and S4-using programmeRs learn early to rather use

if (is(obj, "matrix"))  { ..... }        # *good* !!!

Note that the Bioconductor guidelines for package developers have warned about the misuse of class(.) == *; see the section R Code and Best Practices.
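To make the S4 case concrete, here is a minimal sketch with two hypothetical classes, "Base" and "Derived" (these names and slots are invented for illustration and do not come from any package):

```r
library(methods)

# Hypothetical classes, for illustration only
setClass("Base", slots = c(id = "numeric"))
setClass("Derived", contains = "Base", slots = c(label = "character"))

obj <- new("Derived", id = 1, label = "demo")

# Comparing the class name misses the inheritance relation
exact_match <- any(class(obj) == "Base")

# is() follows the S4 class hierarchy
via_is <- is(obj, "Base")

c(exact_match = exact_match, via_is = via_is)
## exact_match      via_is
##       FALSE        TRUE
```

Code that tests class(obj) == "Base" would silently skip every object of a class extending "Base", which is exactly the failure mode described above.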

Informal “Classical” Classes: S3

R was created as a dialect or implementation of S, see Wikipedia’s R History, and for S, the “White Book” (Chambers & Hastie, 1992) introduced a convenient, relatively simple object orientation (OO), later coined S3 because the white book introduced S version 3 (where the blue book described S version 2, and the green book S version 4, i.e., S4).

The white book also introduced formulas, data frames, etc., and in some cases also the idea that some S objects could be particular cases of a given class, and in that sense extend that class. Examples, in R too, are multivariate time series ("mts") extending (simple) time series ("ts"), or multivariate or generalized linear models ("mlm" or "glm") extending normal linear models ("lm").
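These S3 extensions can be checked directly in base R. The following sketch uses the built-in mtcars data and an arbitrary two-column series; the particular model and data are chosen only for illustration:

```r
# A generalized linear model is also a linear model in the S3 sense
fit <- glm(mpg ~ wt, data = mtcars)   # gaussian family by default
inherits(fit, "glm")
## [1] TRUE
inherits(fit, "lm")
## [1] TRUE

# A multivariate time series is also a time series
x <- ts(matrix(1:20, ncol = 2))
inherits(x, "mts")
## [1] TRUE
inherits(x, "ts")
## [1] TRUE
```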

The “Workaround”: class(.)[1]

So, some more experienced and careful programmers have been replacing class(x) by class(x)[1] (or class(x)[1L]) in such comparisons, e.g., in a good and widely lauded useR! 2018 talk.
In some cases, this is good enough, and it is also what R’s data.class(.) function does (among other things), or the (user-hidden) methods:::.class1(.).

However, programmeRs should be aware that this is just a workaround and leads to code working incorrectly in cases where typical S3 inheritance is used: In some situations it is very natural to slightly modify or extend a function fitme() whose result is of class "fitme", typically by writing fitmeMore(), say, whose value would be of class c("fMore", "fitme") such that almost all “fitme” methods would continue to work, but the author of fitmeMore() would additionally provide a print() method, i.e., provide method function print.fMore().

But if other users work with class(.)[1] and have provided code for the case class(.)[1] == "fitme", that code would wrongly not apply to the new "fMore" objects.
The only correct solution is to work with inherits(., "fitme") as that would apply to all objects it should.
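The scenario can be sketched as follows. The function and class names follow the text, but the bodies are invented here purely for illustration:

```r
# Hypothetical constructors; only the names come from the text above
fitme <- function(x) {
  structure(list(mean = mean(x)), class = "fitme")
}
fitmeMore <- function(x) {
  obj <- fitme(x)
  obj$sd <- sd(x)
  class(obj) <- c("fMore", class(obj))   # extend, keeping "fitme"
  obj
}
# Extra print method supplied by the author of fitmeMore()
print.fMore <- function(x, ...) {
  cat("fMore fit: mean =", x$mean, ", sd =", x$sd, "\n")
  invisible(x)
}

f1 <- fitme(c(1, 2, 3))
f2 <- fitmeMore(c(1, 2, 3))

# The class(.)[1] workaround recognizes only the original class
class(f1)[1] == "fitme"
## [1] TRUE
class(f2)[1] == "fitme"
## [1] FALSE

# inherits() recognizes both, as it should
inherits(f1, "fitme")
## [1] TRUE
inherits(f2, "fitme")
## [1] TRUE
```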

In a much depended-upon CRAN package, the following line (slightly obfuscated), which should efficiently determine list entries of a certain class,

isC <- vapply(args, class, "") == "__my_class__"

was found (and notified to the package maintainer) to need correction to

isC <- vapply(args, inherits, TRUE, what = "__my_class__")
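A further reason to prefer the corrected form is that vapply() with the template "" fails outright as soon as one list entry has a class vector longer than one. The sketch below uses an illustrative list and "POSIXct" in place of the obfuscated class name:

```r
# Illustrative list; the third entry has a class vector of length two
args <- list(1, "a", Sys.time())

# vapply() requires each result to match the length-1 template "",
# so class() returning two strings triggers an error
res_class <- tryCatch(vapply(args, class, ""),
                      error = function(e) "error")
res_class
## [1] "error"

# inherits() returns exactly one logical per entry
isC <- vapply(args, inherits, TRUE, what = "POSIXct")
isC
## [1] FALSE FALSE  TRUE
```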

Summary:

Instead of class(x) == "foo", you should use inherits(x, "foo")
    or maybe alternatively is(x, "foo")

Corollary:

switch(class(x)[1],
       "class_1" = { ..... },
       "class_2" = { ..... },
       .......,
       .......,
       "class_10" = { ..... },
       stop(" ... invalid class:", class(x)))

may look clean, but is almost always not good enough, as it is (typically) wrong, e.g., when class(x) is c("class_7", "class_2").
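One possible replacement, sketched here with a hypothetical helper handle() and placeholder return values, is a chain of inherits() tests that looks at the whole class vector:

```r
x <- structure(list(), class = c("class_7", "class_2"))

# switch() on the first class only: no branch matches "class_7"
by_switch <- tryCatch(
  switch(class(x)[1],
         "class_1" = "handled as class_1",
         "class_2" = "handled as class_2",
         stop("invalid class: ", paste(class(x), collapse = ", "))),
  error = function(e) "error")
by_switch
## [1] "error"

# inherits() checks every element of class(x)
handle <- function(x) {
  if (inherits(x, "class_1")) {
    "handled as class_1"
  } else if (inherits(x, "class_2")) {
    "handled as class_2"
  } else {
    stop("invalid class: ", paste(class(x), collapse = ", "))
  }
}
by_inherits <- handle(x)
by_inherits
## [1] "handled as class_2"
```

Where the branches correspond to real S3 classes, writing a generic with UseMethod() and one method per class is another option, since S3 dispatch also walks the full class vector.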

References

  • R Core Team (2019). R Help pages:

  • Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language (the blue book, introducing S version 2 (S2)); Wadsworth & Brooks/Cole.

  • Chambers, J. M. and Hastie, T. J. eds (1992) Statistical Models in S (the white book, introducing S version 3 (S3)); Chapman & Hall, London.

  • Chambers, John M. (1998) Programming with Data (the green book, for S4 original); Springer.

  • Chambers, John M. (2008) Software for Data Analysis: Programming with R (S4 etc for R); Springer.