Hadley and colleagues made a package that, among other things, provides tools for handling environments: {rlang}. It is considerably more convenient than the base R functionality.
library(rlang)
An environment is a fundamental data object in R that powers lexical scoping; i.e., environments determine which object a name is bound to.
Motivations:
An environment is like a list, except:
Create a new environment with rlang::env(), which behaves a lot like list().
e1 <- rlang::env(a = FALSE,
                 b = "a",
                 c = 2.3,
                 d = 1:3)
Environments associate names to values without any particular order. Hadley draws them like this:
Above, the arrows indicate “bindings”. So b is bound to "a", a is bound to FALSE, and so on.
The blue dot indicates the parent environment.
Environments have “reference semantics”, which means that they modify-in-place.
e2 <- e1
e2$d <- 4:6
e1$d
## [1] 4 5 6
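To see what reference semantics changes in practice, it may help to contrast an environment with a list, which is copied when modified. This is a minimal sketch; the names lst_a, lst_b, env_a, and env_b are made up for illustration and are not used elsewhere in these notes.

```r
library(rlang)

# Lists have copy semantics: modifying the copy leaves the original alone.
lst_a <- list(d = 1:3)
lst_b <- lst_a
lst_b$d <- 4:6
lst_a$d
## [1] 1 2 3

# Environments have reference semantics: both names refer to the same object.
env_a <- env(d = 1:3)
env_b <- env_a
env_b$d <- 4:6
env_a$d
## [1] 4 5 6
```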
An environment can contain itself.
e1$d <- e1
Printing an environment just shows its address in memory.
e1
## <environment: 0x5594308c2588>
To print the objects of an environment, use rlang::env_print().
rlang::env_print(e1)
## <environment: 0x5594308c2588>
## Parent: <environment: global>
## Bindings:
## • a: <lgl>
## • b: <chr>
## • c: <dbl>
## • d: <env>
Use rlang::env_names() to get a character vector with names.
rlang::env_names(e1)
## [1] "a" "b" "c" "d"
The current environment is the environment in which the code is currently being executed (where R looks for names). Use rlang::current_env() to get the current environment.
ce <- rlang::current_env()
typeof(ce)
## [1] "environment"
The global environment is the environment where you interactively use R. You can access it with rlang::global_env().
gl <- rlang::global_env()
typeof(gl)
## [1] "environment"
env_print(gl)
## <environment: global>
## Parent: <environment: package:rlang>
## Bindings:
## • ce: <env>
## • e1: <env>
## • e2: <env>
## • gl: <env>
We can see that the current environment is the global environment with identical()
identical(gl, ce)
## [1] TRUE
You should not use == for this. This is because == expects to be used on vectors, and an environment is not a vector.
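The difference between identical() and == on environments can be sketched as follows. The names env_x, env_y, and env_z are hypothetical, and tryCatch() is used only so that the error from == does not stop the script.

```r
library(rlang)

env_x <- env(a = 1)
env_y <- env_x        # a second name for the same environment
env_z <- env(a = 1)   # a different environment with the same contents

identical(env_x, env_y)
## [1] TRUE
identical(env_x, env_z)
## [1] FALSE

# `==` is not defined for environments, so it signals an error.
eq_result <- tryCatch(env_x == env_y, error = function(err) "error")
eq_result
## [1] "error"
```

Note that identical() compares identity here, not contents: env_x and env_z hold the same bindings but are still different environments.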
Every environment has a parent. If a name is not found in the current environment then it is searched for in the parent. This is how lexical scoping operates in R.
rlang::env() actually creates a child environment. You can either supply the parent, or it assumes the parent is the current environment.
e2a <- env(d = 4, e = 5)
e2b <- env(e2a, a = 1, b = 2, c = 3)
In the above diagram, the pale blue dot represents a pointer to the parent. The left box is the child and the right box is the parent.
rlang::env_parent() gives you the parent.
env_parent(e2b)
## <environment: 0x559433038710>
e2a
## <environment: 0x559433038710>
env_parent(e2a)
## <environment: R_GlobalEnv>
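The rule "if a name is not found, look in the parent" can be made visible with the inherit argument of rlang::env_has() and rlang::env_get(). This is a small sketch; outer_env and inner_env are names invented for this example.

```r
library(rlang)

outer_env <- env(d = 4)
inner_env <- env(outer_env, a = 1)

# `d` is not bound in inner_env itself ...
has_local <- env_has(inner_env, "d")                     # FALSE
# ... but it is found once the parents are searched too.
has_inherited <- env_has(inner_env, "d", inherit = TRUE) # TRUE

env_get(inner_env, "d", inherit = TRUE)
## [1] 4
```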
Every environment has as an ancestor the empty environment, which has no parent. You can access it with rlang::empty_env().
em <- empty_env()
rlang::env_print(em)
## <environment: empty>
## Parent: NULL
rlang::env_parents(em)
## list()
We can make children of the empty environment with rlang::env() by including the empty environment as the first argument.
e2c <- env(empty_env(), d = 4, e = 5)
e2d <- env(e2c, a = 1, b = 2, c = 3)
Your global environment has the empty environment as its progenitor.
rlang::env_parents(rlang::global_env())
## [[1]] $ <env: package:rlang>
## [[2]] $ <env: package:stats>
## [[3]] $ <env: package:graphics>
## [[4]] $ <env: package:grDevices>
## [[5]] $ <env: package:utils>
## [[6]] $ <env: package:datasets>
## [[7]] $ <env: package:methods>
## [[8]] $ <env: Autoloads>
## [[9]] $ <env: package:base>
## [[10]] $ <env: empty>
The ancestors of the global environment are all of the attached packages, and the chain ultimately terminates in the empty environment. For an environment below the global environment, env_parents() stops at the global environment by default.
et <- rlang::env(x = 1:3)
rlang::env_parents(et)
## [[1]] $ <env: global>
rlang::env_parents(et, last = rlang::empty_env())
## [[1]] $ <env: global>
## [[2]] $ <env: package:rlang>
## [[3]] $ <env: package:stats>
## [[4]] $ <env: package:graphics>
## [[5]] $ <env: package:grDevices>
## [[6]] $ <env: package:utils>
## [[7]] $ <env: package:datasets>
## [[8]] $ <env: package:methods>
## [[9]] $ <env: Autoloads>
## [[10]] $ <env: package:base>
## [[11]] $ <env: empty>
Regular assignment <- creates a variable in the current environment.
Super assignment <<- modifies an existing variable found in a parent environment. If no such variable exists in any ancestor, it creates one in the global environment.
x <- 0
f <- function() {
  x <<- 1
}
f()
x
## [1] 1
Most of the time, it is not a good idea to use super assignment. Global variables are very dangerous. We’ll talk about one good application of them in Chapter 10.
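As a rough illustration that <<- modifies the nearest existing binding in an ancestor rather than always writing to the global environment, here is a small counter sketch. The names make_counter, count, and counter are hypothetical and chosen only for this example.

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modifies `count` in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
## [1] 1
counter()
## [1] 2

# In a fresh session this is FALSE: no global `count` was created.
count_is_global <- exists("count", envir = globalenv(), inherits = FALSE)
```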
Get and set values from an environment the same way as from a list.
e3 <- env(x = 1, y = 2)
e3$x
## [1] 1
e3[["x"]]
## [1] 1
Because environments are unordered, integer subsetting does not work
e3[[1]]
## Error in e3[[1]]: wrong arguments for subsetting an environment
Because environments are not vectors, you cannot use single brackets [], so you cannot get more than one element at a time.
e3["x"]
## Error in e3["x"]: object of type 'environment' is not subsettable
e3[c("x", "y")]
## Error in e3[c("x", "y")]: object of type 'environment' is not subsettable
You get NULL if a variable is not in an environment, just like a list:
e3$z
## NULL
Test if an environment has a binding with rlang::env_has().
rlang::env_has(e3, "x")
## x
## TRUE
rlang::env_has(e3, "z")
## z
## FALSE
Unlike a list, you do not remove elements by assigning NULL to them, because the name is then simply bound to NULL.
e3$a <- 10
e3$a <- NULL
e3$a
## NULL
rlang::env_has(e3, "a")
## a
## TRUE
Use rlang::env_unbind() to remove an object.
rlang::env_unbind(e3, "a")
rlang::env_has(e3, "a")
## a
## FALSE
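The counterpart of rlang::env_unbind() is rlang::env_bind(), which is listed in the summary at the end of these notes. A minimal sketch of adding and removing several bindings at once, using a throwaway environment named e_demo (a name made up here):

```r
library(rlang)

e_demo <- env(x = 1)
env_bind(e_demo, y = 2, z = "three")  # add several bindings at once
sort(env_names(e_demo))
## [1] "x" "y" "z"

env_unbind(e_demo, c("y", "z"))       # remove several bindings at once
env_names(e_demo)
## [1] "x"
```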
Create an environment as illustrated by this picture.
Create a pair of environments as illustrated by this picture.
Sometimes, you want to look through all of the ancestor environments to find an object, or for exploration. Here is an example where we count how many objects are in each ancestral environment.
count_env <- function(base_env = rlang::caller_env(),
                      end_env = rlang::empty_env()) {
  nenv <- length(rlang::env_parents(env = base_env, last = end_env))
  obj_num <- rep(NA_real_, length.out = nenv)
  env <- base_env
  for (i in seq_len(nenv)) {
    obj_num[[i]] <- length(env)
    names(obj_num)[[i]] <- rlang::env_name(env)
    env <- rlang::env_parent(env)
  }
  return(obj_num)
}
count_env()
## global package:rlang package:stats package:graphics
## 14 434 456 88
## package:grDevices package:utils package:datasets package:methods
## 119 221 104 371
## Autoloads package:base
## 1 1370
The caller environment is the environment of the function that called the current function. See below.
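The same walk up the ancestors can be used to find where a name is bound. The helper below, find_binding_env(), is a hypothetical function written for these notes (it is not part of {rlang}); it returns the first environment, starting from env, that has a binding for name.

```r
library(rlang)

# Hypothetical helper: walk from `env` up through its ancestors and
# return the first environment that has a binding for `name`.
find_binding_env <- function(name, env = caller_env()) {
  while (!identical(env, empty_env())) {
    if (env_has(env, name)) {
      return(env)
    }
    env <- env_parent(env)
  }
  stop("Can't find a binding for `", name, "`.", call. = FALSE)
}

grandparent <- env(zz = 1)
child <- env(env(grandparent))  # two levels below `grandparent`
identical(find_binding_env("zz", child), grandparent)
## [1] TRUE
```

Called without an env argument from the top level, the search starts in the global environment and follows the search path, so a name such as "mean" would normally be located in the base package environment.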
Most environments are created by R, not by you.
The most important ones are:
Global Environment: Environment that you, the user, interact with during an interactive session.
Package Environment: External interface for a package. When you, the user, use a function, R looks for it in the package environment.
Namespace Environment: Internal interface for a package. When the package searches for a function within the same package, it looks for it in the namespace environment.
Imports Environment: Functions used by the package. When the package searches for a function from another package, it looks for it in the imports environment.
Function Environment (aka enclosing environment): Environment where function was created and where it has access to objects. For a function from a package, this is the namespace environment. For a function you create during an interactive session, this is the global environment.
Binding Environment of a function: Environment where the name of a function is bound to the function. May or may not be the same as the function environment.
Execution Environment: Each time a function is called, a new temporary environment is created to host execution. This is so that function calls always have a “fresh start”.
Caller environment: The environment in which the function was called.
Each package attached by library() creates a package environment that becomes an ancestor of the global environment. The most recently attached package is the immediate parent, followed by the packages attached before it.
This order is called the search path because variable names are searched in that order. You can see the search path with rlang::search_envs().
rlang::search_envs()
## [[1]] $ <env: global>
## [[2]] $ <env: package:rlang>
## [[3]] $ <env: package:stats>
## [[4]] $ <env: package:graphics>
## [[5]] $ <env: package:grDevices>
## [[6]] $ <env: package:utils>
## [[7]] $ <env: package:datasets>
## [[8]] $ <env: package:methods>
## [[9]] $ <env: Autoloads>
## [[10]] $ <env: package:base>
So if I try to evaluate a variable/function name, then it will first search for it in the global environment, then in the {rlang} package environment, then in the {stats} package environment, etc…
Attaching a new package with library() makes that package the immediate parent of the global environment.
library(d)
The function environment is the environment where the function has access to all objects in that environment and its parent environments. This is the current environment when the function is created, not when the function is called.
You can see the function environment via rlang::fn_env()
The function environment may or may not be a new environment. E.g. most of the functions you write use the global environment as the function environment.
x <- 5
fn <- function() {
  sum(1:x)
}
rlang::fn_env(fn)
## <environment: R_GlobalEnv>
Above, since the function environment is the global environment, fn() has access to x (which is in the global environment) and to sum() (which it finds in the package:base environment, an ancestor of the global environment).
Most functions in a package have the namespace environment (see below) as the function environment.
rlang::fn_env(base::sum)
## <environment: namespace:base>
rlang::fn_env(stats::lm)
## <environment: namespace:stats>
rlang::fn_env(rlang::fn_env)
## <environment: namespace:rlang>
The function environment may or may not be different from the environment where the name is bound to the function. That space is called the binding environment of the function.
In many cases, the function environment is the same as the binding environment. Below, the name f in the global environment is bound to the function (arrow moving from f to the yellow object), so the binding environment is the global environment. Also below, the function is bound to the global environment (arrow moving from the black dot to the global environment), so the global environment has the objects that the function has access to, so the function environment is also the global environment.
y <- 1
f <- function(x) {
  return(x)
}
Below, the function bound to the name g in the e environment was created in the global environment. So the function environment is the global environment. But the name g is in the e environment, so the binding environment is e.
y <- 1
e <- rlang::env()
e$g <- function(x) {
  return(x)
}
Exercise: Does g()
still have access to all of the objects in the global environment?
The package environment is the external interface for a package. It contains the exported functions of a package.
The namespace environment of a package is the internal interface for a package. Functions in the package will search for its objects in the namespace environment.
This is what allows us to modify (rather foolishly) var() but still allow sd() to work properly.
sd
## function (x, na.rm = FALSE)
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
## na.rm = na.rm))
## <bytecode: 0x55943244bc68>
## <environment: namespace:stats>
x <- rnorm(10)
sd(x)
## [1] 0.6674
var <- function(x) 0
var(x)
## [1] 0
sd(x)
## [1] 0.6674
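The same effect can be reproduced without touching the global environment by doing the masking inside local(). This is only a sketch; the vector vals and the result name res are made up for this example.

```r
# Mask var() inside local() so the global environment is left untouched.
res <- local({
  var <- function(x) 0    # our (foolish) replacement
  vals <- c(1, 3, 5)      # sample variance 4, so sd is 2
  c(masked_var = var(vals), sd = sd(vals))
})

res[["masked_var"]]
## [1] 0
res[["sd"]]
## [1] 2
```

Our var() is found first when we call it ourselves, but sd() looks up var() starting from its own function environment, namespace:stats, so it never sees the replacement.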
The namespace environment acts as the function environment for all functions in a package.
An exported function has a binding both in the namespace environment and the package environment.
An internal function only has a binding in the namespace environment.
The parent of a namespace environment is an imports environment that contains bindings of all functions used by the package.
The parent of the imports environment is the base namespace, where all base functions are located (this is why you don’t need to import sum() or use base::sum() in your package code).
The parent of the base namespace is the global environment.
library(rlang)
env_parents(fn_env(stats::var))
## [[1]] $ <env: imports:stats>
## [[2]] $ <env: namespace:base>
## [[3]] $ <env: global>
env_parents(fn_env(rlang::fn_env))
## [[1]] $ <env: imports:rlang>
## [[2]] $ <env: namespace:base>
## [[3]] $ <env: global>
Below, let the yellow object be the sd() function defined in the package {stats}. Whenever another function in {stats} uses sd() or var(), it finds them in the namespace:stats environment. The package:stats environment also points to those functions, so when the user calls sd() they get the version created by the {stats} authors. However, if we define our own var() as above, the new binding lives in the global environment, where it masks the one in package:stats; the binding in namespace:stats is untouched. This means that {stats} functions keep working even after we redefine var(). In particular, sd(), which uses var(), will still work.
Each time a function is called, a new environment is created to host execution. This is called the execution environment.
This is why a is not saved between calls:
fn <- function() {
  if (!env_has(current_env(), "a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  return(a)
}
fn()
## [1] 1
fn()
## [1] 1
The execution environment is always the child of the function environment.
Consider this function
h <- function(x) {
  # 1.
  a <- 2 # 2.
  x + a
}
y <- h(1) # 3.
The yellow object is the h() function. The name h is in the global environment (bottom right). The execution environment is the top left. x binds to 1. Then a is defined and binds to 2. When the function completes, it returns 3, and so y binds to 3 in the global environment. The execution environment is then garbage collected.
It is possible to explicitly return the execution environment so that it is not garbage collected. But this is rarely done.
h2 <- function(x) {
  a <- x * 2
  rlang::current_env()
}
e <- h2(x = 10)
rlang::env_print(e)
## <environment: 0x559433fd0850>
## Parent: <environment: global>
## Bindings:
## • a: <dbl>
## • x: <dbl>
More frequently, the execution environment is maintained because it is a function environment of a returned function.
plus <- function(x) {
  function(y) x + y
}

plus_one <- plus(1)
plus_one
## function(y) x + y
## <environment: 0x559430a38cc8>
Above figure: the global environment is the bottom box. The execution environment is the top box. The plus() function is the right yellow object. Its function environment and binding environment are both the global environment. When plus() is called with x = 1, it creates the execution environment where x is bound to 1. When plus_one() is defined, its function environment is the execution environment (since that is where it was created), but its binding environment is the global environment (where the name is bound). Note that the parent environment of plus()’s execution environment is the global environment.
rlang::fn_env(plus_one)
## <environment: 0x559430a38cc8>
rlang::env_parent(rlang::fn_env(plus_one))
## <environment: R_GlobalEnv>
When we call plus_one(), its execution environment will have the execution environment of plus() as its parent.
x <- 20
plus_one(2)
## [1] 3
When we call plus_one() with y bound to 2 (execution environment top left), it first searches for x in
rlang::fn_env(plus_one)
## <environment: 0x559430a38cc8>
(execution environment top right) before going to the global environment (bottom).
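One consequence is that every call to a function like plus() leaves behind its own execution environment, so each returned function sees its own x. A small sketch, using the made-up names make_adder, add_one, and add_ten so as not to overwrite plus() and plus_one():

```r
library(rlang)

make_adder <- function(x) {
  function(y) x + y
}

add_one <- make_adder(1)
add_ten <- make_adder(10)

# Each closure keeps its own execution environment alive.
identical(fn_env(add_one), fn_env(add_ten))
## [1] FALSE
fn_env(add_one)$x
## [1] 1
fn_env(add_ten)$x
## [1] 10
add_ten(2)
## [1] 12
```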
Exercise (Advanced R): Draw a diagram that shows the function environments of this function, along with all bindings for the given execution.
f1 <- function(x1) {
  f2 <- function(x2) {
    f3 <- function(x3) {
      x1 + x2 + x3
    }
    f3(3)
  }
  f2(2)
}
f1(1)
## [1] 6
Recall, the function environment is the environment in which the function was created.
The caller environment is the environment in which the function was called (aka “used”).
You can get this environment inside a function via rlang::caller_env().
fn1 <- function(x) {
  fn2 <- function(y) {
    rlang::caller_env()
  }
  e0 <- rlang::current_env() ## The execution environment of fn1()
  e1 <- fn2() ## The caller environment of fn2()
  e2 <- rlang::caller_env() ## The caller environment of fn1()
  return(list(e0, e1, e2))
}
fn1()
## [[1]]
## <environment: 0x559432c478c8>
##
## [[2]]
## <environment: 0x559432c478c8>
##
## [[3]]
## <environment: R_GlobalEnv>
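The difference between the function environment and the caller environment shows up when the same name is bound in both places. In this sketch (label, report, and wrapper are hypothetical names), report() is created at the top level but called from inside wrapper():

```r
library(rlang)

label <- "top level"

report <- function() {
  c(
    # Ordinary lookup goes through the function environment.
    lexical = label,
    # Explicit lookup starting in the caller environment.
    caller = env_get(caller_env(), "label", inherit = TRUE)
  )
}

wrapper <- function() {
  label <- "inside wrapper"
  report()
}

result <- wrapper()
result[["lexical"]]
## [1] "top level"
result[["caller"]]
## [1] "inside wrapper"
```

Ordinary name lookup ignores the caller entirely; you only see the caller's bindings if you ask for them through rlang::caller_env().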
stats::numericDeriv() numerically evaluates the gradient of an expression at some value. It assumes that the evaluation occurs within some environment you provide.
Let’s calculate the gradient of \(x^y\) with respect to \(x\), evaluated at \(x = 3\) with \(y = 2\).
myenv <- rlang::env(x = 3, y = 2)
numericDeriv(expr = rlang::expr(x ^ y), theta = "x", rho = myenv)
## [1] 9
## attr(,"gradient")
## [,1]
## [1,] 6
From calculus, we know that the derivative of \(x^2\) is \(2x\), and so the gradient evaluated at \(x = 3\) should be \(2\times 3 = 6\).
rlang::expr() is discussed in Chapter 19. Basically, it captures an expression without evaluating it. This is called “quoting”. We can then evaluate that expression with eval().
rlang::expr(x^2)
## x^2
rlang::expr(x^2) |> eval(envir = myenv)
## [1] 9
rlang::expr(sum(c(1, 2, 3)))
## sum(c(1, 2, 3))
rlang::expr(sum(c(1, 2, 3))) |> eval(envir = rlang::global_env())
## [1] 6
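Because the expression is only quoted, the same expression can give different results depending on the environment it is evaluated in. A minimal sketch, with square_expr as a made-up name:

```r
library(rlang)

square_expr <- expr(x^2)   # quoted, not evaluated

eval(square_expr, envir = env(x = 3))
## [1] 9
eval(square_expr, envir = env(x = 5))
## [1] 25
```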
An R package cannot alter the global environment.
Objects in packages are locked, so cannot be changed.
Say you want to keep track of whether or not a function was run (e.g. to write a message on first use). I have done this to (i) say that a function is defunct or (ii) list out special licenses that cover a method.
One way you could keep track of this is to use super assignment in a function that will change the value of a logical in the package environment.
ran_fun <- FALSE ## Global variable
fun <- function() {
  if (!ran_fun) {
    message("Here is a message")
    ran_fun <<- TRUE ## this alters package environment
  }
}
fun()
## Here is a message
fun()
What I think is better is having an environment specific to messages, that way you have to worry less about global variables.
menv <- rlang::env(ran_fun = FALSE)
fun <- function() {
  if (!menv$ran_fun) {
    message("Here is a message")
    menv$ran_fun <- TRUE
  }
}
fun()
## Here is a message
fun()
This functionality is very popular, so {rlang} has a function dedicated to it.
fun <- function() {
  rlang::warn("Here is a message", .frequency = "once", .frequency_id = "ran_fun")
}
fun()
## Warning: Here is a message
## This warning is displayed once per session.
fun()
Exercise: Use environments to keep a tally for how many times a function called foo() is run. Create another function called foo_count() that returns that number. E.g.
foo_count()
## [1] 0
foo()
foo()
foo()
foo_count()
## [1] 3
Hashing is a quick way to select an object from a bunch of possible objects. You use a hash key to select a hash value.
Hashing in R can be done through the {hash} package, which uses environments.
Hashing is way quicker than using a list or a vector.
In the below example, using hashing is about 1000 times faster than using native R vectors.
library(hash)
words <- read.table("https://data-science-master.github.io/lectures/data/words.txt",
                    header = TRUE,
                    na.strings = "")
h <- hash(keys = words$word, values = seq_along(words$word))
l <- seq_along(words$word)
names(l) <- words$word
bench::mark(
  h$MOTIVITIES,
  l[["MOTIVITIES"]]
) |>
  dplyr::select(expression:result) |>
  knitr::kable()
| expression | min | median | itr/sec | mem_alloc | gc/sec | n_itr | n_gc | total_time | result |
|---|---|---|---|---|---|---|---|---|---|
| h$MOTIVITIES | 1.4µs | 1.48µs | 460833.6 | 0B | 0 | 10000 | 0 | 21.7ms | 150000 |
| l[["MOTIVITIES"]] | 1.69ms | 1.79ms | 543.9 | 0B | 0 | 272 | 0 | 500.1ms | 150000 |
Though, for this lookup to be worthwhile, you would need to be doing lookups millions of times a second.
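For a sense of what such a lookup table looks like without the {hash} package, here is a minimal sketch that uses a plain base R environment as the key-value store. The keys are made up for illustration, and this is not necessarily how {hash} is implemented internally.

```r
keys <- c("apple", "banana", "cherry")

# new.env(hash = TRUE) asks for a hashed environment (this is the default).
lookup <- new.env(hash = TRUE)
for (i in seq_along(keys)) {
  assign(keys[[i]], i, envir = lookup)
}

lookup[["cherry"]]
## [1] 3
exists("durian", envir = lookup, inherits = FALSE)
## [1] FALSE
```

Setting inherits = FALSE matters here: without it, exists() would also search the parents of lookup, which is not what you want from a hash table.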
rlang::env(): Create a new child environment.
rlang::env_print(): Print an environment’s bindings.
rlang::env_names(): Print names in an environment.
rlang::env_parent(): Print the parent of an environment.
rlang::env_has(): Test if an environment has a binding.
rlang::current_env(): Access current environment.
rlang::env_bind(): Add objects to an environment.
rlang::env_unbind(): Remove objects from an environment.
rlang::global_env(): Access global environment.
rlang::empty_env(): Access the empty environment.
rlang::fn_env(): Access the function environment.
rlang::caller_env(): Access the caller environment.
rlang::search_envs(): Show the search path.
identical(): Check if two objects are exactly equal.