A function has three parts:
formals(), the list of arguments that control how you call the function.
body(), the code inside the function.
environment() (sometimes called the “enclosing environment”), the data structure that determines how the function finds the values associated with names.
square <- function(x) {
  x^2
}
formals(square)
## $x
body(square)
## {
## x^2
## }
environment(square)
## <environment: R_GlobalEnv>
Below: Black dot is function environment (aka enclosing environment) where the function finds objects, squares are arguments. Body is not graphed.
Exception: Many base functions are written directly in C and don’t have these three components.
typeof(square)
## [1] "closure"
typeof(sum)
## [1] "builtin"
typeof(`[[`)
## [1] "special"
body(sum)
## NULL
formals(sum)
## NULL
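A quick way to check for this exception is is.primitive(). The sketch below only uses base R functions; the choice of sum and mean as examples is mine.

```r
# sum is implemented in C (a primitive); mean is an ordinary R closure
is.primitive(sum)
## [1] TRUE
is.primitive(mean)
## [1] FALSE

# Primitives have no enclosing environment either
is.null(environment(sum))
## [1] TRUE
```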
Functions are objects, just like any other variable.
Sometimes, functions are called closures because they “enclose” their environments (see Chapter 7). That’s why you sometimes see the error
lm[1]
## Error in lm[1]: object of type 'closure' is not subsettable
I.e., you cannot subset a function.
So you can pass functions as arguments to another function (like in optim()).
f <- function(par, dat) {
  sum((par - dat)^2)
}
dat <- rnorm(100)
oout <- optim(par = 0, fn = f, dat = dat, method = "L-BFGS-B")
oout$par
## [1] -0.1456
mean(dat)
## [1] -0.1456
Functions can return other functions (like in ecdf()).
x <- c(1, 99, 2, 11)
efun <- ecdf(x)
efun(10)
## [1] 0.5
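As a minimal sketch of the same idea, here is a hypothetical function factory (make_power() is my own name, not something from base R). The returned function remembers p through its enclosing environment.

```r
# Hypothetical factory: returns a function that raises its input to the power p
make_power <- function(p) {
  force(p)  # evaluate p now, so the returned function does not hold an unevaluated promise
  function(x) x^p
}

square_fn <- make_power(2)
cube_fn <- make_power(3)

square_fn(4)
## [1] 16
cube_fn(2)
## [1] 8
```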
Functions can be elements of a list.
funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)
funs$double(10)
## [1] 20
Because functions are objects, Hadley graphically represents name bindings in a similar way to variables:
Functions can also be anonymous in R.
(function(x) x^2)(2)
## [1] 4
(function(x) x^2)(3)
## [1] 9
## Integrate anonymous function from -infinity to infinity
integrate(function(x) exp(-x^2), -Inf, Inf)
## 1.772 with absolute error < 4.3e-06
## apply anonymous function to each column of mtcars
sapply(mtcars, function(x) length(unique(x)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## 25 3 27 22 22 29 30 2 2 3 6
Scoping is the act of finding the value associated with a name. For example, you might use x as a variable name in different parts of a program. When does it point to one object rather than another?
E.g., when is x pointing to 20 versus 10 in the code below?
x <- 10
g01 <- function() {
  x <- 20
  x
}
x points to 10 outside of the function, and points to 20 inside of the function.
R uses lexical scoping, meaning that a name refers to an object based on where the function is defined, not where it is called.
In a lexically scoped language, a block defines a new scope (in R, the relevant block is a function body). Variables defined in that scope are not visible outside of it. Blocks can be nested: a variable defined in an outer block is visible in the inner block, but a variable defined in the inner block is not visible in the outer block. E.g.
outer_var <- 10
f <- function() {
  print(outer_var)
  inner_var <- 1
}
f() ## outer_var is available in inner block though it is defined in outer block.
## [1] 10
inner_var ## inner_var not available in outer block because defined in inner block.
## Error in eval(expr, envir, enclos): object 'inner_var' not found
Other types of scoping exist, such as dynamic scoping, meaning that a name refers to an object based on where the function is called from. Most languages don’t use dynamic scope. One language that does use dynamic scope is bash.
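To make the difference concrete, here is a small sketch with hypothetical names (location, show_location(), caller()). Under lexical scoping the function sees the variable from where it was defined; under dynamic scoping it would have seen the caller's local variable instead.

```r
location <- "defined outside"

# show_location() is defined here, so it looks up `location` here
show_location <- function() location

caller <- function() {
  location <- "local to caller"  # this is what dynamic scoping would find
  show_location()
}

caller()
## [1] "defined outside"
```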
Let’s formalize this mechanism.
Names inside a function mask names defined outside a function.
If a name is not defined in a function, R looks up one level.
x <- 2
y <- 20
g03 <- function() {
  y <- 1
  return(c(x, y))
}
g03()
## [1] 2 1
# This doesn't change the previous value of y
y
## [1] 20
If a function is defined inside a function, then it keeps looking up levels until it finds a variable.
Below, the inner function finds z inside the inner function, finds y in the outer function, and finds x outside both functions.
x <- 1
f1 <- function() {
  y <- 2
  f2 <- function() {
    z <- 3
    return(c(x, y, z))
  }
  return(f2())
}
f1()
## [1] 1 2 3
Each time a function is called, it creates a new environment to execute in (called the “execution environment”).
This means it does not remember what happened last time.
a <- 2
g11 <- function() {
  a <- a + 1
  return(a)
}
g11()
## [1] 3
g11()
## [1] 3
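Another way to see the fresh start, sketched with a hypothetical count_calls() function: it checks whether a local variable survived from an earlier call. It never does, because each call gets a brand new execution environment.

```r
# Hypothetical function that tries (and fails) to remember earlier calls
count_calls <- function() {
  # inherits = FALSE restricts the search to this call's own environment
  if (!exists("n_calls", inherits = FALSE)) {
    n_calls <- 0
  }
  n_calls <- n_calls + 1
  n_calls
}

count_calls()
## [1] 1
count_calls()
## [1] 1
```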
Functions and variables can share names (though this is not a good idea).
sum <- c(10, 11)
sum(sum)
## [1] 21
This is allowed since the function and the variable are in different environments, and because R ignores non-function objects when it looks up a name that is being called. The sum object is in the global environment while the sum() function is in the package:base environment. More on this in Chapter 7.
rlang::env_has(env = rlang::global_env(), nms = "sum")
## sum
## TRUE
rlang::env_has(env = rlang::env_parents()[["package:base"]], nms = "sum")
## sum
## TRUE
R determines where to look at function creation time (e.g. one level up), but it determines what is there at evaluation time.
a <- 1
g11()
## [1] 2
a <- 12
g11()
## [1] 13
What does the following code return? Why? Describe how each of the three c’s is interpreted.
c <- 10
c(c = c)
What does the following function return? Make a prediction before running the code yourself.
f <- function(x) {
  f <- function(x) {
    f <- function() {
      x ^ 2
    }
    f() + 1
  }
  f(x) * 2
}
f(10)
Lazy evaluation means that an argument is only evaluated if it is used.
R arguments are lazily evaluated. So the following does not show an error, even though x is not supplied.
f <- function(x) {
  return(10)
}
f()
## [1] 10
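A slightly stronger demonstration, as a sketch with hypothetical function names: the argument below is an expression that would raise an error, but the error only happens if the function actually touches the argument.

```r
# The argument is never used, so the stop() call inside it never runs
ignore_arg <- function(x) {
  10
}
ignore_arg(stop("this error is never raised"))
## [1] 10

# force() evaluates the argument explicitly, so now the error is raised
use_arg <- function(x) {
  force(x)
  10
}
res <- try(use_arg(stop("now the argument is evaluated")), silent = TRUE)
inherits(res, "try-error")
## [1] TRUE
```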
In R, lazy evaluation is done via a promise data object, which is described in Chapter 20.
But an important consequence of lazy evaluation is that default arguments can be defined in terms of other arguments.
h04 <- function(x = NULL, y = x * 2, z = a + b) {
  a <- 10
  b <- 100
  if (is.null(x)) {
    x <- 4
  }
c(x, y, z)
}
h04()
## [1] 4 8 110
I’ve used this property before, e.g., for choosing initial values of an optimization approach based on the input data.
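A minimal sketch of that pattern, assuming a hypothetical fit_center() function (not code from any package): the default starting value is computed from the data argument, but the user can still override it.

```r
# Hypothetical example: the default for `init` depends on the `dat` argument
fit_center <- function(dat, init = median(dat)) {
  loss <- function(par) sum((par - dat)^2)
  optim(par = init, fn = loss, method = "L-BFGS-B")$par
}

set.seed(1)
sim_dat <- rnorm(100, mean = 3)
fit_center(sim_dat)            # starts the search at median(sim_dat)
fit_center(sim_dat, init = 0)  # user-supplied starting value
```

Both calls should land near mean(sim_dat), since the mean minimizes the sum of squared deviations.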
Default arguments are evaluated inside the function the first time the argument is used.
User supplied arguments are evaluated outside the function.
This example from Advanced R blew my mind:
h05 <- function(x = ls()) { ## ls() is default argument
  a <- 1
  x
}
# ls() evaluated inside h05:
h05()
## [1] "a" "x"
# ls() evaluated in global environment:
h05(ls()) ## ls() is user supplied
## [1] "a" "dat" "efun" "f" "f1" "funs"
## [7] "g01" "g03" "g11" "h04" "h05" "oout"
## [13] "outer_var" "square" "sum" "x" "y"
Exercise (Advanced R): What does this function return? Why? Try to guess before evaluating it.
f2 <- function(x = z) {
  z <- 100
  x
}
f2()
Exercise (Advanced R): What does this function return? Why? Try to guess before evaluating it.
y <- 10
f1 <- function(x = {y <- 1; 2}, y = 0) {
  c(x, y)
}
f1()
y
Exercise: What does this function return on these calls? Why? Try to guess before evaluating them.
f1 <- function(x, y = x^2) {
  x <- x + 1
  return(y)
}
f1(1)
x <- 1
f1(x, x^2)
... (dot-dot-dot)
In R, the dot-dot-dot argument is a special argument that allows a function to take any number of additional arguments beyond those named in its definition.
This is used all over the place in R. See seq(), optim(), plot(), print(), etc.
In computer science, this type of argument is called “varargs” (for variable arguments).
Typically, you do one of two things with ... in a function:
Assign arguments to a list.
args <- list(...)
Pass arguments to another function.
fn(...)
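Both uses can be sketched without any plotting. The helper names below (describe_dots() and shout()) are hypothetical.

```r
# Hypothetical helper: capture ... as a list and report what arrived
describe_dots <- function(...) {
  args <- list(...)
  list(n = length(args), names = names(args))
}
describe_dots(a = 1, b = "two")

# Hypothetical wrapper: forward ... unchanged to another function
shout <- function(...) {
  toupper(paste(...))
}
shout("hello", "world", sep = "-")
## [1] "HELLO-WORLD"
```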
E.g. I pass the extra arguments to a list so that I can check if they are NULL in the list, then I pass the extra arguments to plot().
plotd <- function(x, y, ...) {
  args <- list(...)
  args$x <- x
  args$y <- y
  if (is.null(args$pch)) {
    args$pch <- 16
  }
  if (is.null(args$mar)) {
    args$mar <- c(3, 3, 2, 1)
  }
  if (is.null(args$mgp)) {
    args$mgp <- c(1.8, 0.4, 0)
  }
  if (is.null(args$las)) {
    args$las <- 1
  }
  if (is.null(args$tcl)) {
    args$tcl <- -0.25
  }
  do.call(what = plot, args = args)
}
I can pass more arguments to this function, or overwrite defaults:
plotd(x = mtcars$wt,
y = mtcars$mpg,
main = "Motor Trends Cars",
xlab = "Weight",
ylab = "MPG",
pch = 19)
do.call() is one way to call a function. You provide the function (or its name) and a list of its arguments.
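A small sketch of the equivalence, using paste() as an arbitrary example function: the list elements become the arguments, and named elements become named arguments.

```r
# A list of arguments, some unnamed and one named
arg_list <- list("a", "b", sep = "-")

# These two calls are equivalent
do.call(what = paste, args = arg_list)
## [1] "a-b"
paste("a", "b", sep = "-")
## [1] "a-b"
```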
Exercise: Why do you think we get an error here?
plot2 <- function(x, y, ...) {
  if (!hasArg(pch)) {
    pch <- 16
    cat("pch assigned")
  }
  plot(x = x, y = y, pch = pch)
}
plot2(mtcars$wt, mtcars$mpg)
plot2(mtcars$wt, mtcars$mpg, pch = 19)
If you want to just change one default, it is perhaps better to just include it as an additional argument.
plot3 <- function(x, y, pch = 16, ...) {
  plot(x = x, y = y, pch = pch, ...)
}
plot3(mtcars$wt, mtcars$mpg)
plot3(mtcars$wt, mtcars$mpg, pch = 19)
R will return the last evaluated expression by default:
f <- function(x, y) {
  y
  x
}
f(1, 2)
## [1] 1
I prefer to explicitly include a return() call:
f <- function(x, y) {
  y
  return(x)
}
f(1, 2)
## [1] 1
A visible return prints the result:
f <- function(x) {
  return(x)
}
f(1)
## [1] 1
You can prevent automatic printing by applying invisible().
f <- function(x) {
  return(invisible(x))
}
f(1)
We can print the value with print().
f(1) |>
  print()
## [1] 1
or by enclosing the call in parentheses:
(f(1))
## [1] 1
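If you want to check visibility directly, base R has withVisible(), which returns the value together with a flag. The two tiny functions below are hypothetical stand-ins.

```r
quiet <- function(x) invisible(x)
loud <- function(x) x

# withVisible() returns list(value = ..., visible = TRUE/FALSE)
withVisible(quiet(1))$visible
## [1] FALSE
withVisible(loud(1))$visible
## [1] TRUE
```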
Assignment is a function that returns invisibly:
x <- 1
(x <- 1)
## [1] 1
You might be surprised that assignment is a function, but remember in R almost everything is a function. Prefix notation might make it more clear:
`<-`(x, 10)
x
## [1] 10
invisible() returns are often used for functions whose main purpose is a side effect (like print() or plot()), so that you can chain calls. Because assignment returns its value invisibly, assignments can be chained too:
a <- b <- c <- d <- 2
a
## [1] 2
b
## [1] 2
c
## [1] 2
d
## [1] 2
If you change the global state (e.g. via options()), then it is polite to restore it on exit.
Use on.exit() to do so, setting add = TRUE so that you do not overwrite previous exit handlers.
cleanup <- function(dir, code) {
  old_dir <- setwd(dir)
  on.exit(setwd(old_dir), add = TRUE)
  # I can now change the working directory with impunity
  old_opt <- options(stringsAsFactors = FALSE)
  on.exit(options(old_opt), add = TRUE)
  # I can now change the options with impunity
}
I have used this in real life when manipulating the parallelization backend using the {foreach} package:
oldDoPar <- doFuture::registerDoFuture()
on.exit(with(oldDoPar, foreach::setDoPar(fun=fun, data=data, info=info)), add = TRUE)
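Here is a self-contained sketch of the on.exit() pattern that you can run to confirm the option really is restored. with_digits() is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: evaluate `code` with a temporary `digits` option
with_digits <- function(digits, code) {
  old_opt <- options(digits = digits)    # options() returns the previous values
  on.exit(options(old_opt), add = TRUE)  # restore them however the function exits
  force(code)                            # the lazy argument is evaluated while the option is set
}

before <- getOption("digits")
with_digits(3, format(pi))
## [1] "3.14"
identical(getOption("digits"), before)
## [1] TRUE
```

This relies on lazy evaluation: format(pi) is not evaluated until force(code) runs, by which point the option has already been changed.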
Hadley’s Function Forms:
Prefix: foofy(a, b, c). These constitute the majority of function calls in R.
Infix: x + y. Infix forms are used for many mathematical operators, and for user-defined functions that begin and end with %.
Replacement: names(df) <- c("a", "b", "c"). They actually look like prefix functions.
Special: [[, if, and for. While they don’t have a consistent structure, they play important roles in R’s syntax.
You can rewrite all forms in prefix form:
x + y ## infix
## [1] 30
`+`(x, y) ## prefix
## [1] 30
df <- data.frame(1:2, c("a", "b"), c(TRUE, FALSE))
names(df) <- c("x", "y", "z") ## replacement
`names<-`(df, c("x", "y", "z")) ## prefix
## x y z
## 1 1 a TRUE
## 2 2 b FALSE
for(i in 1:10) print(i) ## special
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
`for`(i, 1:10, print(i)) ## prefix
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
You can create your own infix functions by starting and ending them with %:
`%+%` <- function(a, b) paste0(a, b)
"hello" %+% "world"
## [1] "helloworld"
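Another small sketch, with a hypothetical operator name (%or%): a "default value" operator that returns its right-hand side only when the left-hand side is NULL.

```r
# Hypothetical infix operator: fall back to `b` when `a` is NULL
`%or%` <- function(a, b) if (is.null(a)) b else a

NULL %or% "default"
## [1] "default"
"value" %or% "default"
## [1] "value"

# The same call written in prefix form
`%or%`(NULL, "default")
## [1] "default"
```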
You can create your own replacement functions. Their names must end in <-, they must have arguments named x and value, and they must return the modified object.
The following will replace the second element:
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
x <- 1:10
second(x) <- 12
x
## [1] 1 12 3 4 5 6 7 8 9 10
You can include additional arguments by placing them between x and value, and including them on the left-hand side.
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}
modify(x, 1) <- 10
x
## [1] 10 12 3 4 5 6 7 8 9 10
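To see why replacement functions must return the modified object, it helps to write the call in prefix form yourself. This sketch redefines a second<- function so that it is self-contained; the prefix line is roughly what R evaluates for you.

```r
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v1 <- 1:5
second(v1) <- 12L          # replacement form

v2 <- 1:5
v2 <- `second<-`(v2, 12L)  # roughly what R evaluates behind the scenes

identical(v1, v2)
## [1] TRUE
```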
Rewrite the following code snippets into prefix form:
1 + 2 + 3
## [1] 6
1 + (2 + 3)
## [1] 6
if (length(x) <= 5) x[[length(x)]] else x[[5]]
## [1] 5
For the last snippet, put if, else, [[, and <= all into prefix form.
Create a replacement function called rmod() that modifies a random location in a vector. E.g.
set.seed(1)
x <- 1:10
rmod(x) <- NA
x
## [1] 1 2 3 4 5 6 7 8 NA 10
rmod(x) <- NA
x
## [1] 1 2 3 NA 5 6 7 8 NA 10
Write your own version of + that pastes its inputs together if they are character vectors but behaves as usual otherwise. In other words, make this code work:
1 + 2
## [1] 3
"a" + "b"
## [1] "ab"
+.
Create an infix xor() operator. Call it %x|%. E.g.
c(TRUE, TRUE, FALSE, FALSE) %x|% c(TRUE, FALSE, TRUE, FALSE)
## [1] FALSE TRUE TRUE FALSE
Create infix versions of the set functions intersect(), union(), and setdiff(). You might call them %n%, %u%, and %/% to match conventions from mathematics.
do.call(): Run a function with a list of arguments.
invisible(): Return without automatic printing.
on.exit(): Run an expression when the function exits, whether naturally or by an error.