Luke Tierney: Managing Search Path Conflicts



Starting with R 3.6.0 the library() and require() functions allow more control over handling search path conflicts when packages are attached. The policy is controlled by the new conflicts.policy option. This post provides some background and details on this new feature.

Background

When loading a package and attaching it to the search path, conflicts can occur between objects defined in the new package and ones already provided by other packages on the search path. Sometimes these conflicts are intentional on the part of a package author; for example, dplyr provides its own versions of several functions from base and stats:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

In other cases the conflict is not intended and can cause problems. Students in my classes often load the MASS package after dplyr and then get

library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select

If select is used at this point, the one from MASS will be used; if the intent was to use the one from dplyr, then this results in error messages that are difficult to understand.

By default, library() and require() do show messages about the conflicts, but they are easy to miss. And they will not be visible at all in an Rmarkdown document if the calls appear in code chunks that do not show their results at all or use the message = FALSE chunk option.

The new mechanism is intended to provide more control over how conflicts are handled, in particular allowing for stricter checks if desired. Different levels of strictness may be appropriate in different contexts. Some contexts, ordered by the level of strictness that might make sense:

  1. Interactive work in the console.
  2. Interactive work in a notebook.
  3. Code in an Rmarkdown or Sweave document.
  4. Scripts.
  5. Packages.

Conflicts in packages are already handled by the NAMESPACE mechanism. The others could use some help. Scripts in particular might benefit from being able to specify exactly which objects should be made available from the packages used, much like specifying explicit imports in a package NAMESPACE file.

Another useful distinction is between anticipated and unanticipated conflicts:

  • Anticipated conflicts occur when a single package is attached that in turn might cause other packages to be attached. The package author will see the messages, and should address any ones that are not as intended. These conflicts should usually not require user intervention.

  • Unanticipated conflicts arise when a user requests that two packages be attached that may not have been designed to be used together. Conflicts in these cases will typically require the user to make an appropriate choice.

In the initial examples, the conflicts caused by loading dplyr would be anticipated conflicts, but the one caused by then loading MASS would be an unanticipated conflict. For interactive work, as well as code in Rmarkdown documents, it would be useful to be able to use a setting that allows anticipated conflicts, but signals an error for unanticipated conflicts that are not explicitly resolved.

Specifying a Policy

A number of aspects of the conflict handling process can be customized via the conflicts.policy option. The value of this option can be a string naming a standard policy, or a list with named elements specifying policy details. Supported named policies are "strict" and "depends.ok". Policy elements that are supported:

  • error: If TRUE, then conflicts produce errors.
  • generics.ok: If set, this determines whether masking a function with an S4 generic version is considered a conflict. The default is FALSE for strict checking and TRUE otherwise.
  • can.mask: A character vector of names of packages that are allowed to be masked without producing an error. Specifying the base packages can reduce the number of explicit masking approvals needed.
  • depends.ok: If TRUE, then allow all conflicts produced within a single package load.
  • warn: Sets the default for the warn.conflicts argument to library and require.

A specification that may work for most users who want protection against unanticipated conflicts:

options(conflicts.policy =
            list(error = TRUE,
                 generics.ok = TRUE,
                 can.mask = c("base", "methods", "utils",
                              "grDevices", "graphics",
                              "stats"),
                 depends.ok = TRUE))

This specification assumes that package authors know what they are doing, and all conflicts from loading a package by itself are intentional and OK. With this specification all CRAN and BIOC packages should individually load without error. A shortcut provided for specifying this policy is the name "depends.ok":

options(conflicts.policy = "depends.ok")

A strict policy specification would be

options(conflicts.policy = list(error = TRUE, warn = FALSE))

This can also be specified as the named policy "strict":

options(conflicts.policy = "strict")

Resolving Conflicts

With a "strict" policy, attempting to load a conflicting package produces an error:

options(conflicts.policy = "strict")
library(dplyr)
## Error: Conflicts attaching package ‘dplyr’:
##
## The following objects are masked from ‘package:stats’:
##
##     filter, lag
##
## The following objects are masked from ‘package:base’:
##
##     intersect, setdiff, setequal, union

The error signaled with strict checking is of class packageConflictsError with fields package and conflicts.

Explicitly specifying names that are allowed to mask variables already on the search path avoids the error:

library(dplyr,
        mask.ok = c("filter", "lag",
                    "intersect", "setdiff", "setequal",
                    "union"))

With the "depends.ok" policy, calling library(dplyr) will succeed and produce the usual message about conflicts.

With both "strict" and "depends.ok" policies, loading MASS after dplyr will produce an error:

library(MASS)
## Error: Conflicts attaching package ‘MASS’:
##
## The following object is masked from ‘package:dplyr’:
##
##     select

One way to eliminate this error is to exclude select when attaching the MASS exports:

library(MASS, exclude = "select")

The select function from MASS can still be used as MASS::select.

To make it easier to use a strict policy for commonly used packages, and for implicitly attached packages, conflictRules provides a way to specify default mask.ok and exclude values for packages, e.g. with

conflictRules("dplyr",
              mask.ok = c("filter", "lag",
                          "intersect", "setdiff",
                          "setequal", "union"))
conflictRules("MASS", exclude = "select")

To allow masking only variables in specific packages:

library(dplyr, mask.ok = list(stats = c("filter", "lag"),
                              base = c("intersect", "setdiff",
                                       "setequal", "union")))

To allow masking all variables in specific packages:

library(dplyr, mask.ok = list(base = TRUE, stats = TRUE))

library() has some heuristics to avoid warning about S4 overrides. These heuristics mask the overrides in Matrix, but not those in BiocGenerics. These heuristics are disabled by default when strict conflict checking is in force, so they have to be handled explicitly in some way.

This specification would allow Matrix to be loaded with strict checking:

conflictRules("Matrix",
              mask.ok = list(stats = TRUE,
                             graphics = "image",
                             utils = c("head", "tail"),
                             base = TRUE))

Similarly, this will work for BiocGenerics:

conflictRules("BiocGenerics",
              mask.ok = list(base = TRUE,
                             stats = TRUE,
                             parallel = TRUE,
                             graphics = c("image", "boxplot"),
                             utils = "relist"))

In the future it might be worth considering whether this information could be encoded in a package’s DESCRIPTION file.

Additional Strictness

The arguments include.only and attach.required have been added to library() and require() to provide additional control in a script using a "strict" policy:

  • If include.only is supplied as a character vector, then only variables named there are included in the attached frame.
  • Packages specified in the Depends field of the DESCRIPTION file are only attached when attach.required is TRUE. The default value of attach.required is TRUE if include.only is missing, and FALSE if include.only is supplied.

Other Approaches

Other approaches to handling conflicts are provided by packages conflicted and import; there are also several packages by the name modules.

The conflicted package arranges to signal an error when evaluating a global variable that is defined by multiple packages on the search path, unless an explicit resolution of the conflict has been specified.

Being able to ask for stricter handling of conflicts is definitely a good idea, but it seems it would more naturally belong in base R than in a package. There are several downsides to the package approach, at least as currently implemented in conflicted, including being fairly heavy-weight, confusing find, and not handling packages that have functions that do things like

foo <- function(x) { require(bar); ... }

Not that this is a good idea, but it is out there.

The approach taken in conflicted of signaling an error only when a symbol is used might be described as a dynamic approach. This dynamic approach could be implemented more efficiently in base R by adjusting the global cache mechanism. The mechanism introduced in R 3.6.0 in contrast insists that conflicts be resolved at the point where the library() or require() call occurs. This might be called a static or declarative approach.

To think about these options it is useful to go back to the range of activities that might need increasing levels of strictness in conflict handling listed at the beginning of this post. For Rmarkdown documents and scripts the static approach seems clearly better, as long as a good set of tools is available to express conflict resolution concisely. For notebooks I think the static approach is probably better as well. For interactive use it may be a matter of personal preference. Once I have decided I want help with conflicts, my preference would be to be forced to resolve them on package load rather than to have my work flow interrupted at random points to go and sort things out.