R: Setting options - Data and Code

When developing a package or a set of functions, you often needs a lot of options. Often, you would want to set some sensible defaults for each option whilst giving users the flexibility to customize and extend. Take ggplot2 for example, if you ever used it, you know the background of the plots is, by default, grey, though You have the flexibility to change it.

While there are many ways to set options, I haven’t found a summary of pros and cons of different approaches and I certainly don’t know what best practices are. So I guess it could be useful to poke around several packages and see how they approach option setting.

ggplot2

ggplot2 use an environment object, theme_env, to hold all theme related options. It is defined in the namespace:ggplot2 but not exported:

getAnywhere(theme_env)

## A single object matching 'theme_env' was found
## It was found in the following places
##   namespace:ggplot2
## with value
## 
## <environment: 0x000000001e08c0a8>

Also note that the parent environment for theme_env is the emptyenv(). This ensures that R does not look else where if it can’t find a setting in theme_env. ggplot2 also provides several functions theme_***() to allow users to change the theme.

So it seems almost trivial if you’re writing a package: just create an option object and it will live in the package namespace not in the gloabl environment. Okay at least I learned something obvious. But if you don’t want to package your functions up for some reason and you don’t want to expose your option object in global envrionment, how do you go about implementing that?

It might be tempting to create a global object holding all your options with the name of the object starting with “.” which makes it a “hidden” object that the user is not aware of until they type ls(all.names = TRUE). However, this does not isolate users from the object: you will need to worry about cases like what if the users removed it? or what if the user supplied an invalid options?

tibble

Let’s take a look at tibble package which is a modern reimagination of data.frame.¹ It’s approach to option setting follows the same pattern: there is a list object, op.tibble, that stores options and that is not exported:

getAnywhere(op.tibble)

## A single object matching 'op.tibble' was found
## It was found in the following places
##   namespace:tibble
## with value
## 
## $tibble.print_max
## [1] 20
## 
## $tibble.print_min
## [1] 10
## 
## $tibble.width
## NULL
## 
## $tibble.max_extra_cols
## [1] 100

However, tibble package does not provide a getter or setter function for users to change the default options, instead it relies on base::options() function. For example, if you print a tibble, by default, the console shows a maximum of 10 rows if your tibble has more than 20 rows. You can change this behaviour by options(tibble.print_max = n, tibble.print_min = m), which means if there are more than n rows, print only the first m rows.

In fact many packages use options() to allow the user to change defaults, below I am listing options for data.table, profvis and shiny (for some reason, the options for shiny and profvis only show up if you use rstudio, I guess the options are rstudio specific):

options() %>% names %>% str_subset("profvis|datatable|shiny")

##  [1] "datatable.alloccol"             "datatable.allow.cartesian"     
##  [3] "datatable.auto.index"           "datatable.dfdispatchwarn"      
##  [5] "datatable.fread.datatable"      "datatable.fread.dec.experiment"
##  [7] "datatable.fread.dec.locale"     "datatable.integer64"           
##  [9] "datatable.nomatch"              "datatable.old.unique.by.key"   
## [11] "datatable.optimize"             "datatable.print.class"         
## [13] "datatable.print.nrows"          "datatable.print.rownames"      
## [15] "datatable.print.topn"           "datatable.showProgress"        
## [17] "datatable.use.index"            "datatable.verbose"             
## [19] "datatable.warnredundantby"

This approach is cognitively easier for users - the user don’t need to learn option getters or setters that are different for different packages, and more importantly, you don’t need to create a global option object: simply move all your options into options() function! The downside is that you can potentially have option clashes with other packages; it’s a good idea to prefix your options with your package name. Further more, you can’t check that the options supplied is valid when the user supply them. In other words, it still fully exposes the options and you need to worry about that. Is there a better solution?

knitr

When writing an Rmarkdown file, the first code chunck, by default, look like this: knitr::opts_chunk$set(echo = TRUE). Here, opts_chunk is a list of five functions through which users can interact with code chunk options; opts_chunk acts like an “option manager”.

ls(knitr::opts_chunk)

## [1] "append"  "get"     "merge"   "restore" "set"

knitr is somewhat different. It uses a function factory pattern. When compiling the package, the opts_chunck is returned by an internal function knitr:::new_defaults(). Because R functions are implemented as closure, the functions enlisted in opts_chunck (i.e. append(), get(), etc.) are enclosed in the execution environment of knitr:::new_defaults() when the package is built. To see if this is true, open another R session, and environment(knitr::opts_chunk$get) will give you a different environment.

This approach shows it’s advantage if you don’t want to package up your functions: you can avoid creating a global object that holds your options: store your options in your function! More importantly, one can implement checks, at the time when the user supplies options, to ensure that they are valid. Below is a simple example: I created a global object my_opts, my “option manager”, and two default options a = 1 and b = 2.

default_options <- list(a = 1, b = 2)

new_defaults <- function(current_options) {
  
  # create a copy of default options 
  # so users can reset to defaults
  old <- current_options
  
  # create an "option manager", my_opts, 
  # and assign it into global environment
  my_opts <<- list(
    
    # getter function 
    get = function(x) {
      # return all options by default
      if (missing(x)) current_options 
      else if (x %in% names(current_options)) 
        # return specified options
        current_options[[x]]
      else
        # throw a more informative error message 
        # if the user type invalid option name
        stop("No such option available")
    },
    
    # setter function
    set = function(x, new_value) {
      if (missing(x) | missing(new_value)) 
        stop("Please specify both x and new_value")
      if (x %in% names(current_options)) {
        # perform some test to check that 
        # the new_value is valid before change settings
        # get a reference to the enclosing environment
        option_env <- parent.env(environment()) 
        # assign new_value into options
        option_env$current_options[[x]] <- new_value
      } else stop("No such option available")
    },
    reset = function() {
      current_options <<- old
    }
  )
}
new_defaults(default_options)
rm(new_defaults, default_options)

Let’s test if getter works:

# get option a
my_opts$get("a")

## [1] 1

my_opts$get()

## $a
## [1] 1
## 
## $b
## [1] 2

Test if error message works:

my_opts$set("d")
# Error in my_opts$set("d") : Please specify both x and new_value

Test if the setter works:

my_opts$set("a", 999)
my_opts$get("a")

## [1] 999

Test if my_opts$reset() works

my_opts$reset()
my_opts$get("a")

## [1] 1

Or, if you don’t want to use an “option manager” (because it appears under “Data” section on the upper right panel of the Rstudio), we can wrap everything into one function that allows users to interact with options. By the way, I marvel at the flexibility of R: you can change the enclosing environment of a function after it’s been created:

my_opts2 <- function(mode = c("get", "set", "reset"), x, new_value) {
  mode <- match.arg(mode)
  
  switch(
    mode,
    get = {
      if (missing(x)) current_options
      else if (any(names(current_options) %in% x))  
        current_options[names(current_options) %in% x]
      else stop("No such option available")
    },
    set = {
      if (missing(x) | missing(new_value)) 
        stop("Please specify both x and new_value")
      if (x %in% names(current_options)) {
        option_env <- parent.env(environment()) 
        option_env$current_options[[x]] <- new_value
      } else stop("No such option available")
    },
    reset = current_options <<- old
  )
}
option_env <- new.env(parent = .GlobalEnv)
option_env$old <- option_env$current_options <- list(a = 1, b = 2)
environment(my_opts2) <- option_env
rm(option_env)

my_opts2("get", "a")
## $a
## [1] 1
my_opts2("set", "b", 0)
my_opts2("get", "b")
## $b
## [1] 0
my_opts2("reset")
my_opts2()
## $a
## [1] 1
## 
## $b
## [1] 2

I see where I am getting at…

Mutable state!

It’s probably an overkill to implement mutable state object simply to set some options… But it could be useful if I don’t want my programming skills to stay still. Let me come back at it later! Read RC, or R6 from Hadley’s fantastic book!

See tibble’s github respository ↩