Functions get, assign, etc.

Problems and Proposed Solutions for Functions that Access Objects

The functions get, exists, and assign all take a second argument that refers, generally, to where the action of the function should take place. The functions objects and rm also take similar arguments.

From the user's point of view, the where argument can be a workspace image, an attached package, or an environment (perhaps that of a currently active call, perhaps a special environment). In the future we will likely want it to represent other things as well, for example when we're interfacing to database software.

The functions should accept any meaningful object for this argument and should then behave consistently, apart from the differences inherent in what each of them does.

Right now, that's not entirely the case. For example, to supply an environment as an argument, you must use a separate envir= argument.

The following list suggests some changes for this and related problems. Items 1 and 2 are being pushed for immediate action; the remainder seem desirable, but are either not quite as back-compatible or else depend on other changes.

The proposed changes for 1 and 2 have been added to the development branch.

  1. Allow environments to be supplied directly.

    There's really quite a clean concept operating here: whatever is specified as the argument is essentially coerced to an R environment. Unfortunately, the coercion is done differently depending on what the user supplies and on which function is called. Numbers are passed to pos.to.env. Character strings are first explicitly matched against the search list (by code that relies on lazy evaluation to work). And the simplest case, when the user supplies an environment, requires the call itself to be different.

    That last problem causes messy, inefficient, and error-prone programming in higher-level functions that try to treat environments and other databases uniformly.

    The proposed solution is to replace the calls to pos.to.env by calls to a new function, as.environment, which would work uniformly for the various cases.

    At the same time, this eliminates the ad hoc code dealing with character string arguments, reducing the bodies of get, exists, and assign to just .Internal calls. The claim is that this change is back-compatible with existing code.

    For efficiency, the current as.environment implementation should be replaced by a version in C.
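    For illustration only, here is a minimal R-level sketch of that uniform coercion. The names to_env and get_where are hypothetical; the proposal itself is a C-level as.environment. Environments pass through unchanged, character strings are matched against search(), and numbers go to pos.to.env, so a higher-level caller such as get_where needs only one code path and no separate envir= argument.

    ```r
    # Hypothetical sketch: coerce any "where" value to an environment.
    to_env <- function(where) {
        if (is.environment(where))
            return(where)                          # already an environment
        if (is.character(where)) {
            pos <- match(where, search())          # a name on the search list
            if (is.na(pos))
                stop("no item called \"", where, "\" on the search list")
            where <- pos
        }
        if (is.numeric(where))
            return(pos.to.env(as.integer(where)))  # a search-list position
        stop("cannot coerce 'where' to an environment")
    }

    # Hypothetical higher-level caller: one code path for every kind of 'where'.
    get_where <- function(x, where = 1, mode = "any", inherits = FALSE)
        get(x, envir = to_env(where), mode = mode, inherits = inherits)
    ```

    With this, get_where("pi", "package:base"), get_where("pi", length(search())) and get_where("pi", baseenv()) all go through the same call to get.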

  2. The arguments to the objects function.

    In principle, the first argument to objects is just another where argument. The current implementation has 3 related arguments, name, pos and envir, and the treatment of the first one is, well, strange.

    
        if (!missing(name)) {
            if (!is.numeric(name) || name != (pos <- as.integer(name))) {
                name <- substitute(name)
                if (!is.character(name)) 
                    name <- deparse(name)
                pos <- match(name, search())
            }
            envir <- pos.to.env(pos)
        }
    
    I _think_ the intention here is to allow the names on the search list to be supplied with or without quotes, so "package:base" or package:base would both work.

    If so: alas, it won't fly. The culprit is the expression is.numeric(name), which has to evaluate name. Lots of luck with package:base! In general, you can't make a substitute(x) call conditional on the value of x and not expect to go down in flames.

    Because the substitute is used unless the evaluation produces a number, one can't supply an expression that evaluates to a string or an environment.

    The proposed modified version retains the basic idea, but evaluates name in a try() expression and performs the substitute only if the try fails. This should (usually!) even work for package:base.
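    A minimal sketch of that evaluate-first ordering follows; where_env is a hypothetical name, and the function stands in only for the handling of objects' first argument. The argument is evaluated inside try(), and substitute() is consulted only when that fails, so an expression that evaluates to a string, number, or environment is used as a value, while an unquoted package:base falls back to its deparsed text.

    ```r
    # Hypothetical sketch of the proposed handling of objects' first argument.
    where_env <- function(name) {
        value <- try(name, silent = TRUE)           # evaluate first ...
        if (inherits(value, "try-error"))           # ... substitute only on failure
            value <- deparse(substitute(name))      # e.g. "package:base"
        if (is.environment(value))
            return(value)
        if (is.character(value)) {
            pos <- match(value, search())
            if (is.na(pos))
                stop("no item called \"", value, "\" on the search list")
            value <- pos
        }
        pos.to.env(as.integer(value))
    }
    ```

    The "(usually!)" caveat is real: if objects named package and base happened to be visible, package:base would evaluate successfully and the substitute() branch would never run.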

  3. Inconsistent arguments.

    The argument lists of get, exists, and assign aren't consistent with each other, or with S-Plus. (The S-Plus arguments aren't entirely consistent either, but closer.)

    The two issues are whether the argument is called where (as in exists and in S-Plus) or pos (as in the other functions), and whether a separate frame argument is allowed (yes in exists and in S-Plus, no elsewhere). Where frame is allowed, it's equivalent to supplying the environment sys.frame(frame), but the semantics are a bit confusing.

    3.(a) For the first issue, it would be nice to use where throughout, but there is clearly a serious compatibility issue.

    3.(b) For the second, one could add a frame argument and treat it consistently by making the default expression for the environment be (if(missing(frame)) as.environment(where) else sys.frame(frame)). This is what the revised version of exists does.
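    A sketch of 3(b) as a wrapper, under two assumptions: the name exists_where is hypothetical, and its default where = 1 is chosen only to sidestep the -1 subtleties discussed in item 6. Because frame and where feed the same default expression for envir, the body never needs to know which one was supplied.

    ```r
    # Hypothetical sketch of item 3(b): 'frame' and 'where' feed one default.
    exists_where <- function(x, where = 1, frame, mode = "any", inherits = TRUE,
        envir = if (missing(frame)) as.environment(where) else sys.frame(frame))
        exists(x, envir = envir, mode = mode, inherits = inherits)

    f <- function() {
        local_only <- 1                     # exists only in f's frame
        # sys.nframe() is f's frame number, so frame= points exists_where at f
        exists_where("local_only", frame = sys.nframe(), inherits = FALSE)
    }
    ```

    Calling f() finds local_only through frame=, while exists_where("local_only") with the default where = 1 looks in the global environment and the search list and does not.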

  4. rm and remove

    Right now, these are identical. That doesn't seem too useful, and it would be mildly more convenient if remove expected a character vector, as it does in S-Plus. It would just let one say remove(objects(....)) rather than remove(list = objects(....)). (On the other hand, at least both rm and remove in R take a position argument, which is more consistent than S-Plus, where remove does but rm doesn't.)
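    The S-Plus-style calling convention can be sketched as a thin wrapper. The name remove_names is hypothetical, and it simply forwards a character vector to rm's list= argument; the usage below clears a scratch environment in one call.

    ```r
    # Hypothetical S-Plus-style remove: the first argument is a character vector.
    remove_names <- function(object_names, pos = 1, envir = as.environment(pos))
        rm(list = object_names, envir = envir)

    # Usage: fill a scratch environment, then clear it in one call.
    scratch <- new.env()
    assign("a", 1, envir = scratch)
    assign("b", 2, envir = scratch)
    remove_names(objects(envir = scratch), envir = scratch)
    ```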

  5. Methods

    What's really going on with the where argument is that we want to make the various functions behave correctly for the database defined by where.

    In other words, we would like get and the other functions to have methods based on the where argument.

    We can't do that directly until S4-style methods are introduced, since for most of the functions the corresponding argument is not the first one.

    As an interim solution, we could make the as.environment function mentioned in item 1 into an S3-style generic. If it became a primitive (as pos.to.env is now), we could dispatch it from the C code, if efficiency is a big deal here (which I doubt).

    The interim solution is not as good for future work, though, because in some cases (imagine various database interfaces that let you attach a database table) you don't want to create an intermediate R environment object, but rather to go off to the appropriate interface directly for each of the functions. For that, you really do want methods for get, exists, etc.
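    A sketch of the interim approach, with hypothetical names throughout: to_environment is an S3 generic (kept separate from the real as.environment so nothing built in is redefined), and toy_db stands in for a database class. The toy_db method makes the drawback just described concrete, since every lookup copies the data into a fresh intermediate environment.

    ```r
    # Hypothetical S3 generic standing in for a generic as.environment.
    to_environment <- function(where) UseMethod("to_environment")

    # Default: numbers, search-list names, and environments.
    to_environment.default <- function(where) as.environment(where)

    # Hypothetical "database" class: a named list of tables with class "toy_db".
    # Each call builds a new intermediate environment from the whole list.
    to_environment.toy_db <- function(where)
        list2env(unclass(where), parent = emptyenv())

    # A get-like function that only ever sees environments.
    get_from <- function(x, where)
        get(x, envir = to_environment(where), inherits = FALSE)

    db <- structure(list(sales = c(10, 20, 30)), class = "toy_db")
    ```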

  6. The -1 value for pos and the inherits argument.

    In terms of the current implementation, this amounts to what pos.to.env(-1) means. From the implementation and the error message, it means "the environment of the parent of the call to pos.to.env". The documentation of pos.to.env, on the other hand, claims you have to give a positive integer as an argument.
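    A small experiment makes the -1 meaning visible. The helper env_at_minus_one is hypothetical; it relies on pos.to.env being a primitive, so that -1 resolves to the environment from which the enclosing function (here, the helper itself) was called.

    ```r
    # Hypothetical helper: returns whatever pos.to.env(-1) means inside a function.
    env_at_minus_one <- function() pos.to.env(-1)

    g <- function() {
        marker <- TRUE                      # lives only in g's frame
        # -1 resolves to g's frame, the caller of env_at_minus_one
        exists("marker", envir = env_at_minus_one(), inherits = FALSE)
    }
    g()
    ```

    The same -1 therefore names a different environment depending on which function evaluates it, which is the root of the subtlety discussed next.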

    The actual semantics are mostly OK, but the situation is a bit subtle. When no where or pos argument is given to exists or get, the intended semantics are "search for the name in this function call and then in the search list". What seems confusing, if not downright inconsistent, is that the -1 argument behaves rather differently depending on the function.

    For example, the functions rm and assign also have optional inherits arguments. If inherits is TRUE, both functions search back through the parent environments to match names. Since both functions are "destructive" to the environments they work in, I don't think it's a great idea to have them swinging back into, for example, the global environment from inside a function call. For rm this is inconsistent with S-Plus, and rather nervous-making. For assign, it seems bizarre: why should the existence of an object with the same name affect the behavior of assign? (At least inherits is FALSE by default in these functions.)
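    The hazard can be reproduced in a few lines. The object and function names below are hypothetical; each function works on its own frame by default, but inherits = TRUE lets it reach the same-named object in the enclosing environment.

    ```r
    # With inherits = TRUE, assign() and rm() act outside the function's own frame.
    counter <- 0
    bump <- function() assign("counter", 1, inherits = TRUE)  # overwrites the outer counter

    victim <- "keep me"
    tidy <- function() rm("victim", inherits = TRUE)           # removes the outer victim

    bump()
    tidy()
    ```

    Afterwards the outer counter is 1 and victim is gone. With the default inherits = FALSE, bump would instead create a local counter, and tidy would only warn that no local victim exists.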


John Chambers <jmc@research.bell-labs.com>
Last modified: Mon Aug 13 14:57:05 EDT 2001