The Definition of Generic Functions and Methods

The following notes review the model for generic functions and the methods defined for them, in the S language generally and in the R implementation in particular. The description in the book Programming with Data (Springer, 1998) is the general background, with some of the ideas below being revisions or extensions of the programming model described there.

Methods in the S Language

Methods play a different role in the S language than they do in other major programming languages. Analogies with other languages can be useful, but only if they are re-interpreted keeping the different underlying philosophy in mind. Two distinctions are particularly important: S is a functional language, and its use covers a spectrum from casual interaction to large-scale programming.

OOP and Functional Languages

S is a functional language, and methods in the language are function-based, not class-based.

The most commonly encountered languages emphasizing methods follow what is generally called the Object-Oriented Programming (OOP) model; more precisely, the languages follow a class-based organization of software. For the most part, programming is organized around the definition of classes of objects. Methods are invoked on an object. In S-style notation,


        x$plot(y)

invokes the plot method on the object x, passing it an argument y.

The languages are class-based in that a method is determined based only on the class of the object; generally, no other properties of x are relevant in determining the method to be invoked. (There have been some instance-based languages in the past, but as far as I know none of these are in serious use now.) The essential organizing category of these languages is the class. In Java, for example, essentially all programming is done by defining classes.

In particular, aside from possible inheritance there is no necessary connection between the method called plot defined for two different classes.

The OOP model is useful in many contexts. It can be added to the S language model as a specialized package, with many useful applications (see the Omegahat OOP package for an experiment in this direction). However, OOP computations must be a specialized addition to the basic model.

The basic organizing principle of programming in the S language is the function definition. Programmers spend most of their time writing function definitions, and the S evaluator spends most of its time evaluating calls to functions. Other program-organizing concepts have evolved as important adjuncts or extensions to function definitions. Packages or libraries allow sets of functions to be grouped together meaningfully. Formal classes and methods organize information about objects and functions in a more explicit and distributed way.

But the model (or at least my model) is that most programming in the S language evolves from a user/programmer wanting to do something with data. The something to be done is expressed as one or more functions, which typically start out simple and then evolve. Methods arise as part of the evolution: The definition of functions very often depends on the properties of the objects passed as arguments, and method and class definitions are often the best way to encapsulate such properties.

The distinction between function-based and class-based methods has implications for many aspects of the language. For example, function-based methods need to be integrated with the user's understanding of the function they specialize, which has implications for argument definition (see below).
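
As a minimal sketch of what function-based means in practice, the following defines a generic function and two methods for it. The class names Circle and Square and the generic area are hypothetical, chosen only for illustration; setClass, setGeneric and setMethod are the functions of the R methods package.

```r
library(methods)

# Hypothetical classes, used only for illustration.
setClass("Circle", representation(radius = "numeric"))
setClass("Square", representation(side = "numeric"))

# The generic function is the organizing unit: users call area(),
# and the methods specialize it according to the class of the argument.
setGeneric("area", function(shape, ...) standardGeneric("area"))

setMethod("area", "Circle", function(shape, ...) pi * shape@radius^2)
setMethod("area", "Square", function(shape, ...) shape@side^2)

circleArea <- area(new("Circle", radius = 1))
squareArea <- area(new("Square", side = 2))
```

The user only ever sees one function, area, with one argument list; the methods are attached to that function rather than to either class.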

Interaction and Programming

The S language is used interactively to a much greater extent than other languages supporting formal classes and methods. The other languages supporting function-based methods (chiefly Common Lisp and more recent languages such as Dylan influenced by Common Lisp) deal with programming in the sense of constructing complete program units, typically as files, which are then used in some way.

On the other hand, nearly all discussions of the S language, even those emphasizing programming, are written against the assumed background of S expressions being evaluated interactively, typically in response to something a user types. Programming in the S language aims to extend the computations available for such interaction. In contrast, for example, to Dylan, S does not have a concept of a "complete program". Instances of the S evaluator are processes that go on indefinitely, waiting for expressions to evaluate.

A corollary to the importance of interaction is that it comes as a continuum from the user/programmer's perspective. At one end are expressions so simple that they are typed straight-off without pause (or, perhaps, hidden behind a graphical interface). But even a fairly simple user interface supplements this with the ability to recall expressions, cut-and-paste editing changes, and navigate around the text of the expression before evaluating it. The recent history of the interactive session becomes an informal programming environment.

Moving from this stage to defining functions is the major leap from interaction to programming. But simple function definitions can be entered directly from the command line, and editing a small source file that will be parsed immediately is only a little less interactive.

Many of the innovations throughout the history of the language have tried to help the user's evolution from simple interaction to increasingly extensive programming. Both the early, informal method definitions and the formal class/method mechanisms are best seen from this perspective. An implication for the design of the language, and in particular for that of the method mechanism, is that we should avoid making the user do a large amount of programming in order to add a conceptually simple extension to what exists.

Formal Arguments and Method Dispatch

This section discusses questions related to the formal arguments for generic functions and how these can be coordinated with the design of methods.

As the term method suggests, a method is a definition of how a function call should be evaluated, as determined by the classes of the actual arguments. It is part of the current API for formal methods that argument matching takes place at the call to the (generic) function: Arguments are not re-matched after the method is selected. Therefore, the arguments of the method are treated as identical to those of the generic function. Not re-matching arguments is moderately important for efficiency of method dispatch, but more fundamentally, any general departure from this requirement would confuse the semantics of method dispatch. (If an argument that was used to select the method is then re-matched to a different formal argument, is the method selection still valid and meaningful?)
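
The order of events described above (match the arguments to the generic, then select the method from their classes) can be sketched as follows. The generic describePair and its methods are hypothetical names introduced only for this illustration.

```r
library(methods)

# Hypothetical generic with two arguments eligible for dispatch.
setGeneric("describePair", function(x, y, ...) standardGeneric("describePair"))

setMethod("describePair", signature(x = "numeric", y = "character"),
    function(x, y, ...) "numeric x, character y")

setMethod("describePair", signature(x = "character", y = "numeric"),
    function(x, y, ...) "character x, numeric y")

# Arguments are matched to the formal arguments of the generic first;
# the method is then selected from the classes of the matched arguments.
byPosition <- describePair(1, "a")
byName     <- describePair(y = "a", x = 1)   # same matching, same method
swapped    <- describePair("a", 1)           # different classes, other method
```

Because matching happens once, at the call to the generic, the methods here simply repeat the generic's argument list.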

This model for method dispatch has implications for choosing formal arguments for generic functions, and for the design of methods. With a little care, the designer of methods can have full flexibility in dealing with arguments.

From the viewpoint of someone designing a method for a particular generic function, possible formal arguments may fall into three categories:

1. Formal arguments in the generic function that are meaningful and, in particular, may be involved in selecting methods (that is, the class of the object corresponding to the argument might be part of the signature of the method);
2. Formal arguments that appear in the generic (and quite possibly might be part of the signature for some other method), but which are meaningless for this method;
3. Arguments that are meaningful for this method but not generally, and not currently formal arguments to the generic function.

The first category raises no problems. The second and third can be handled in a reasonably convenient way as well, but need some consideration. And just which arguments are important enough to be included in the generic (i.e., should they fall in the second or third category) will always be open to discussion.

Two examples will illustrate the issues. The function plot is defined in the R methods package to have arguments:


  plot(x, y, ...)

The arguments x and y represent the datasets providing values to plot on the x- and y-axis respectively. Both arguments are included in the generic, since it may well be useful to define methods based on either dataset. Some methods, however, will be defined for only a single object. At the same time, the definition of the generic function implies that additional arguments are not relevant in dispatching methods for plot. Individual methods can have specific needs for additional arguments, however.

As a second example, consider the "[" operator. The formal arguments for this operator in the R methods package are:


  x[i, j, ..., drop]

Here, there are three arguments included in the generic that might reasonably be part of a method signature: the object x for which a subset is extracted or replaced, and the first and second subscripts, i and j. (Including these enables methods to be defined separately, for example, for text-based and numerical subsetting, or for other more specialized subsetting situations.) Once again, for many methods the second subscript will not be meaningful, but matrix and matrix-like objects are so central to applications of the S language that we need to enable methods for subsetting such objects. The drop argument, on the other hand, is also specialized to matrix- or array-like objects and is not likely to be useful in method selection. If we were starting over, this argument would more naturally fall into our third category, but it's there (for now) as part of the traditional S language definition.

The suggested approaches to handling conceptual differences in function and method arguments are as follows.

Arguments Not Meaningful in Methods

The formal method-dispatch model for the S language provides directly for arguments that are not meaningful in a particular method. These arguments should appear in the signature of the method with class "missing". The corresponding method will then never be selected if the call includes the meaningless argument. The point for the method designer to keep in mind is the distinction between including the argument with class "missing" and omitting the argument from the signature. The latter implies that any object may appear as this argument (it corresponds formally to class "ANY" in the signature), which is not correct if the argument is not meaningful.
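
A small sketch may make the distinction concrete. The class Track and the generic display are hypothetical; display is given the same argument list as plot(x, y, ...) only so that the example stays self-contained and does no actual plotting.

```r
library(methods)

# Hypothetical class and generic, shaped like plot(x, y, ...).
setClass("Track", representation(x = "numeric", y = "numeric"))
setGeneric("display", function(x, y, ...) standardGeneric("display"))

# y is not meaningful here: a Track object already holds both sets of values.
# Class "missing" in the signature says that y must not be supplied.
setMethod("display", signature(x = "Track", y = "missing"),
    function(x, y, ...) paste(length(x@x), "points"))

tr <- new("Track", x = c(1, 2, 3), y = c(2, 4, 6))
ok <- display(tr)                 # y is missing: the method is selected

# Supplying y finds no applicable method, so the call fails instead of
# silently ignoring the argument.
bad <- tryCatch(display(tr, 1:3), error = function(e) "no method")
```

Had the signature been written with only x = "Track", the second call would have selected the method and quietly discarded the extra argument.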

The R implementation of methods provides an equivalent way of specifying that an argument should not be included. If the method definition is a function whose formal arguments are a subset of those of the generic (in the same order), then the setMethod function infers that the formal arguments omitted from the method definition are to have class "missing" in the signature. While this corresponds to a notion of conforming arguments in the Dylan language, it is pretty much just syntactic sugar in S.

Arguments that are Meaningful for Some Methods

The suggestion for handling such arguments is that they be made formal arguments to a function that is called as the corresponding method, and that the generic function include ... as a formal argument to allow the arguments to be passed down.

Again, this is not an extension to the language, just making use of existing features. For example, suppose a method is defined for subsetting objects maintained in some particular remote database, and suppose we want an argument copy to the method that says whether to copy the subset or create a remote reference for it. There is no copy argument to the generic, and it's reasonable to say that the concept isn't sufficiently general to justify redefining the generic.

The suggestion is to define a function, say subsetRemote, with the appropriate argument list, and to call that function as the method:


 subsetRemote <- function(x, i, copy = TRUE)
  { .... }

 setMethod("[", "remoteObject",
    function(x, i, ...) subsetRemote(x, i, ...)
 )

With this definition, an expression like


 newObj <- myObj[sample(length(myObj), 1000), copy = FALSE]

would invoke the method, assuming myObj had a suitable class extending "remoteObject".

A few points of detail are relevant. First, notice that while the formal method uses the ... argument from the generic, the actual function defining the method does not, with the result that invalid argument names will be detected. On the other hand, the requirement that there be no argument re-matching means that the special arguments must pass through ..., which in turn means that copy must be supplied by name (otherwise it would match j). As a third point, notice that we've used the R feature of omitting the j argument from the method definition, forcing that argument to be missing.
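
These points can be checked with a toy version of the example. This is only a sketch under stated assumptions: the "remote" object here simply holds its data in a slot, and the body of subsetRemote is invented for illustration, since the real computation is not shown in the text.

```r
library(methods)

# Toy stand-in for a remote object: the data are simply held in a slot.
setClass("remoteObject", representation(data = "numeric"))

# The function doing the work has exactly the arguments it needs, and no "...".
# Its body is an assumption made for this sketch.
subsetRemote <- function(x, i, copy = TRUE) {
    values <- x@data[i]
    if (copy) values else new("remoteObject", data = values)
}

# The method omits j and passes any extra arguments down through "...".
setMethod("[", "remoteObject",
    function(x, i, ...) subsetRemote(x, i, ...)
)

myObj <- new("remoteObject", data = c(10, 20, 30, 40))
copied <- myObj[1:2]                  # copy defaults to TRUE
remote <- myObj[1:2, copy = FALSE]    # copy must be supplied by name

# A misspelled argument name reaches subsetRemote and is rejected there.
typo <- tryCatch(myObj[1:2, cpy = FALSE], error = function(e) "rejected")
```

The misspelled name is caught only because subsetRemote itself has no ... argument; if it did, the typo would be absorbed silently.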

There is an objection to this mechanism, in that as written it clutters up the name space with the additional function (subsetRemote in this case). The proposals for namespaces in R would be helpful here. It is possible to embed the new function in the method itself, at the cost of less readable code and some (trivial) extra computation:



 setMethod("[", "remoteObject",
    function(x, i, ...) {
       subsetRemote <- function(x, i, copy = TRUE)
        { .... }
      subsetRemote(x, i, ...)
    }
 )

We could provide a convenience mechanism for this feature as we did for the missing arguments. For example, setMethod could interpret arguments not found in the generic to create a function in the form shown. (The current Splus implementation does something similar.) On the whole, such a mechanism seems a little dangerous.

John Chambers <jmc@research.bell-labs.com>

Last modified: Wed Jan 2 11:40:39 EST 2002