Writing better wrappers with R’s ... (aka dots/ellipsis)

December 31, 2018 - 5 minutes
code baseR

Use the dots Luke!

Intro

Writing wrapper functions can be a nice way to improve your coding experience. Wrappers let you customize function behavior, maybe by providing non-default arguments automatically or by tidying up the results before they get returned. Either way, wrappers make your code more efficient. This post is about how to use R’s ... (I read it as “dots”, but the formal name is “ellipsis”) to build custom versions of table() and dir().

In R the ellipsis, ..., is used by functions for one of two things.

  • to capture an unknown number of arguments, as in ?table()

  • or to pass arguments through to some underlying function, as in ?print().

The main reason to use ... over explicitly defining arguments is to keep all of the original function’s arguments available without redefining each one in the new wrapper.
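The two uses are easy to see side by side. Below is a minimal sketch; count_args() and loud_paste() are made-up names for illustration and are not part of base R.

```r
# Use 1: capture an unknown number of arguments
count_args <- function(...) {
  args <- list(...)      # collect everything passed through the dots
  length(args)
}

# Use 2: pass arguments through to an underlying function
loud_paste <- function(...) {
  toupper(paste(...))    # paste()'s own arguments (sep, collapse) stay available
}

count_args(1, "a", TRUE)
## [1] 3
loud_paste("use", "the", "dots", sep = "-")
## [1] "USE-THE-DOTS"
```

Note that loud_paste() never mentions sep, yet the caller can still set it.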

tabla(): a NA friendly version of table()

When I’m doing analysis I think of NA values as canaries in a coal mine: they are valuable warning signs that something might be wrong. That is why the default behavior of table() is so troublesome for me: it just silently drops them!

data_with_NAs <- data.frame(
  species = sample(c("dog", "cat", NA), 20, TRUE, c(.4,.4,.2))
  )

table(data_with_NAs$species)
## 
## cat dog 
##   5  10
table(data_with_NAs$species, useNA = "ifany")
## 
##  cat  dog <NA> 
##    5   10    5

The authors of table() thoughtfully included the option to control the counting of NA values, even if they did set the default to hide them. If this were a function I only used once in a while, I’d just suck it up and type the extra argument each time. But since I use table() repeatedly during data input/clean-up, it gets old real fast adding useNA = "ifany" each time. So I wrote tabla(), a wrapper with alternate defaults.

tabla <- function(...) {
  table(..., useNA = 'ifany')
}

tabla(data_with_NAs$species)
## 
##  cat  dog <NA> 
##    5   10    5

That was super simple and now I have a version that treats NA values like first-class citizens! It is a little janky though, because if you try to re-assign the useNA = argument you will meet an error about multiple matches.

tabla(data_with_NAs$species, useNA = "no")
## Error in table(..., useNA = "ifany"): formal argument "useNA" matched by multiple actual arguments

Don’t worry, we can fix it. By using match.call() to get all of the arguments passed in, we can handle conflicts as they come up and leave that error message behind.

tabla <- function(...) {
  call <- as.list(match.call())[-1] # first position is the function name
  
  custom_args <- list(useNA = "ifany") # could extend this list for more customization
  
  # add only the custom args the caller did not supply themselves
  missing_args <- !names(custom_args) %in% names(call)
  call <- c(call, custom_args[missing_args])
  
  do.call(table, call) # execute table() with the custom settings
}

tabla(data_with_NAs$species, useNA = "no")
## 
## cat dog 
##   5  10
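To make the mechanics concrete, here is a small sketch of what match.call() hands back inside a function that takes dots. peek() and some_vector are made-up names for illustration only; some_vector does not even need to exist, because match.call() records the argument expressions without evaluating them.

```r
peek <- function(...) {
  as.list(match.call())[-1]   # drop the function name, keep the arguments
}

captured <- peek(some_vector, useNA = "no")

names(captured)       # unnamed arguments get an empty name
## [1] ""      "useNA"
class(captured[[1]])  # the first argument is stored as a symbol, not a value
## [1] "name"
captured$useNA
## [1] "no"
```

Checking names() against the custom defaults is what lets the wrapper decide whether to add useNA itself.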

Now we have a high-quality wrapper around table() that you could confidently share with anyone interested in a better NA counting experience!

Edit: I might have jumped the gun in referring to this match.call() strategy as ‘high-quality’. When I shared this post on Twitter, Hadley dropped a comment encouraging me NOT to use match.call() because of unforeseen bugs that might crop up. I haven’t had the time to explore it further, but he recommended the Base Evaluation section of Advanced R. Once I have a better handle on why this strategy can get you into trouble I’ll write a follow-up post, but for now, in case anyone stumbles upon this post, you have been warned!
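One failure mode that is easy to reproduce, and which may or may not be the one Hadley had in mind: match.call() stores the arguments as unevaluated expressions, and do.call() then evaluates them from inside the wrapper. That works when the data lives in the global environment, but as far as I can tell it breaks when tabla() is called from another function on a local variable, because that variable cannot be found from inside the wrapper. The sketch below sidesteps this by capturing the dots with list(...), which evaluates each argument where the caller wrote it. tabla_dots() and count_local() are hypothetical names, not part of the original post.

```r
# tabla_dots() is a hypothetical alternative, not the post's original method
tabla_dots <- function(...) {
  args <- list(...)                    # evaluates the dots in the caller's context
  defaults <- list(useNA = "ifany")
  # keep only the defaults the caller did not override
  defaults <- defaults[setdiff(names(defaults), names(args))]
  do.call(table, c(args, defaults))
}

# calling the wrapper from inside another function, on a purely local variable
count_local <- function() {
  local_species <- c("dog", "cat", NA, "dog")
  tabla_dots(local_species)
}

count_local()
tabla_dots(c("dog", NA), useNA = "no")   # overriding the default still works
```

The trade-off is that table() no longer sees the original argument expressions, only their values, which matters if you rely on it deriving dimension names from variable names.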

der(): an Excel friendly dir()

If you work with Microsoft Excel files programmatically, you have probably encountered the temporary files that Excel spawns into your file system while an Excel document is open. These temp files are prefixed with “~” to differentiate them, but dir() returns them anyway, just like they were normal files.

dir("~/Binfo/PA/PAD/PAD07/PAD0702/albumin/data/", "Plate")
## [1] "~$PAD0702_Albumin_Plate1_Table.xlsx"
## [2] "PAD0702_Albumin_Plate1_Table.xlsx"  
## [3] "PAD0702_Albumin_Plate2_Table.xlsx"  
## [4] "PAD0702_Albumin_Plate3_Table.xlsx"  
## [5] "PAD0702_Albumin_Plate4_Table.xlsx"  
## [6] "PAD0702_Albumin_Plate5_Table.xlsx"  
## [7] "PAD0702_Albumin_Plate6_Table.xlsx"

The problem is that read-in functions like openxlsx::read.xlsx() and readxl::read_excel() can’t parse the temporary files. And since I often have a file open for visual inspection when I try to read it into R, these errors constantly cause problems.

So I wrote der(), a wrapper that drops the dreaded temporary files from the results. Because the ... allow everything to pass through to dir(), I still have access to the original arguments when I need them.

der <- function(...) {
  files <- dir(...)
  files[!grepl("~", files)]
}

der("~/Binfo/PA/PAD/PAD07/PAD0702/albumin/data/", "Plate", full.names = TRUE)
## [1] "/Users/nathanday/Binfo/PA/PAD/PAD07/PAD0702/albumin/data//PAD0702_Albumin_Plate1_Table.xlsx"
## [2] "/Users/nathanday/Binfo/PA/PAD/PAD07/PAD0702/albumin/data//PAD0702_Albumin_Plate2_Table.xlsx"
## [3] "/Users/nathanday/Binfo/PA/PAD/PAD07/PAD0702/albumin/data//PAD0702_Albumin_Plate3_Table.xlsx"
## [4] "/Users/nathanday/Binfo/PA/PAD/PAD07/PAD0702/albumin/data//PAD0702_Albumin_Plate4_Table.xlsx"
## [5] "/Users/nathanday/Binfo/PA/PAD/PAD07/PAD0702/albumin/data//PAD0702_Albumin_Plate5_Table.xlsx"
## [6] "/Users/nathanday/Binfo/PA/PAD/PAD07/PAD0702/albumin/data//PAD0702_Albumin_Plate6_Table.xlsx"

Problem solved: temporary Excel files are no longer returned! No more errors from the read-in functions, and now I can happily open an Excel file and read it into R at the same time. Talk about living the life.
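One caveat with matching "~" anywhere in the result: with full.names = TRUE, a directory whose name happens to contain “~” would cause every file to be dropped. A slightly stricter sketch is below. It assumes the temporary files always start with “~$”, as in the listing above, and only tests the file-name part of each path. der_strict() is a hypothetical name.

```r
# der_strict() is a hypothetical variant; it assumes lock files start with "~$"
der_strict <- function(...) {
  files <- dir(...)
  # match "~$" only at the start of the last path component
  is_temp <- grepl("(^|/)~\\$[^/]*$", files)
  files[!is_temp]
}
```

Usage is the same as der(), since every argument still passes straight through to dir().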

Conclusion

Using ... as an argument in wrapper functions is a great idea because it keeps all of the original arguments available for the end user. There is a reason you see ... all over the place in exported R functions: it’s super useful.
