Learning Objectives

- Names/Values
- Copy-on-modify
- Modify-by-reference
Chapter 2 from Advanced R
- These lecture notes are mostly taken straight out of Hadley’s book. Many thanks for making my life easier.
- His images, which I use here, are licensed under

Names and Values

Computer memory holds information (like numbers or strings) for immediate use. When you put information there, it is stored at some “address” on your computer, and you can retrieve it from that address.
The following puts the vector c(1, 2, 3) in memory and binds the name x to it.
```
x <- c(1, 2, 3)
```
The function lobstr::obj_addr() lets us see the address of this object.
```
lobstr::obj_addr(x)
```
```
## [1] "0x555da60f5588"
```
When you assign x to a new name y, R creates a new name that points to the same object as x; the object itself is not copied.
```
y <- x
```
```
lobstr::obj_addr(y)
```
```
## [1] "0x555da60f5588"
```
If you then modify y, R makes a copy of the object at 0x555da60f5588 and binds y to that copy. This is called copy-on-modify.
```
y[[3]] <- 4
```
```
lobstr::obj_addr(y)
```
```
## [1] "0x555dab276d18"
```
Copy-on-modify exists so that x does not change when you change y.
```
x
```
```
## [1] 1 2 3
```
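The same check can be written as a small function. This is a minimal sketch, assuming lobstr is installed; `same_object()` is a hypothetical helper (not part of lobstr) that simply compares the addresses reported by lobstr::obj_addr().

```r
# Hypothetical helper: TRUE if two names are bound to the same object
same_object <- function(a, b) {
  lobstr::obj_addr(a) == lobstr::obj_addr(b)
}

x <- c(1, 2, 3)
y <- x
shared_before <- same_object(x, y)  # TRUE: y is just a second name

y[[3]] <- 4                         # copy-on-modify: y gets its own object
shared_after <- same_object(x, y)   # FALSE

x  # still 1 2 3
```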

You can use tracemem() to track whenever an object is copied.
```
x <- c(1, 2, 3)
tracemem(x)
```
```
## [1] "<0x555da9dec568>"
```
```
y <- x
y[[3]] <- 4 ## copy made
```
```
## tracemem[0x555da9dec568 -> 0x555daaf4c3f8]:
```
```
y[[5]] <- 1 ## no copy made, y modified
```

Note: tracemem() is attached to the object (here 0x555da9dec568), not to the name x. So the following will not report a copy-on-modify: after tracemem(x), we rebind the name x to a new, untraced object, and that is the object y copies.
```
x <- c(1, 2, 3)
tracemem(x)
```
```
## [1] "<0x555daac88428>"
```
```
x <- c(4, 5)
y <- x
y[[2]] <- 6
```
Note: tracemem() can give confusing results inside RStudio, because the Environment pane keeps its own references to objects, which can cause extra copies.
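untracemem() switches tracing off again. A minimal sketch, assuming your R build supports memory profiling (capabilities("profmem") is TRUE on many standard builds); the guard skips the demo otherwise, and capture.output() is used only so the tracemem message can be inspected.

```r
x <- c(1, 2, 3)
y <- x  # two names, one object

if (capabilities("profmem")) {
  tracemem(x)                          # start tracing the shared object
  msgs <- capture.output(y[[3]] <- 4)  # copy-on-modify prints a tracemem line
  untracemem(x)                        # stop tracing the original object
  untracemem(y)                        # the copy may inherit tracing, so untrace it too
  print(msgs)
}
```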

Names inside functions also point to the same object.

```
x <- c(1, 2, 3)
tracemem(x)
```
```
## [1] "<0x555da9c4fe08>"
```
```
f <- function(a) {
  return(a)
}
z <- f(x) ## no copy made
```

Inside f(), the name a is bound to the same object as x. After the call, x and z both point to that same object.
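The flip side: if a function modifies its argument, copy-on-modify happens inside the function, so the caller's object is left alone. A minimal sketch with a hypothetical function `add_100_to_first()`:

```r
# Hypothetical function that modifies its argument locally
add_100_to_first <- function(a) {
  a[[1]] <- a[[1]] + 100  # a shares x's object, so R copies before modifying
  a
}

x <- c(1, 2, 3)
z <- add_100_to_first(x)
x  # 1 2 3: the caller's object is unchanged
z  # 101 2 3
```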

Exercise (From Advanced R): Explain the relationship between a, b, c and d in the following code:
```
a <- 1:10
b <- a
c <- b
d <- 1:10
```
Verify your conclusions using lobstr::obj_addr().
Exercise: When does the address of x change? Use cat() and lobstr::obj_addr() to verify your answer. Does tracemem() help you here? Why or why not?
```
x <- c()
for (i in 1:10) {
  x[[i]] <- i
}
```
Exercise: When does the address of x change? Use cat() and lobstr::obj_addr() to verify your answer.
```
x <- rep(x = NA_real_, length.out = 10)
for (i in 1:10) {
  x[[i]] <- i
}
```
Exercise: When does the address of x change? Use cat() and lobstr::obj_addr() to verify your answer.
```
x <- vector(mode = "numeric", length = 10)
for (i in 1:10) {
  x[[i]] <- i
}
```
Understanding when an object is copied is important for performance. Making copies can be expensive if you are doing it a lot (like in a for-loop), making your code run much slower.
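A minimal sketch of the two loop styles from the exercises, written as hypothetical functions `grow()` and `prealloc()`. `grow()` starts from numeric(0) rather than c() so that both return doubles. Since R 3.4.0, assigning past the end of a vector over-allocates a little, which softens the cost of growing, but pre-allocating is still the preferred pattern. Timings depend on your machine, so none are shown.

```r
# Hypothetical: build 1..n by growing the vector one element at a time
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) {
    x[[i]] <- i  # extends x, which can require allocating a larger vector
  }
  x
}

# Hypothetical: build 1..n into a vector allocated once up front
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    x[[i]] <- i  # x has a single binding, so R can modify it in place
  }
  x
}

identical(grow(1e4), prealloc(1e4))  # same result either way
system.time(grow(1e5))
system.time(prealloc(1e5))
```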

Lists and Data Frames

Recall that a list is a vector whose elements can be of any type.
To allow this, a list does not store its elements directly: it stores a vector of references, each of which points to an object.
```
l1 <- list(1, 2, 3)
```
When a list is copied on modify, only the references are copied, not the objects they point to. This is called a shallow copy, and it is much more memory efficient than copying every element.
```
l2 <- l1
```
```
l2[[3]] <- 4
```

lobstr::ref() allows you to see the location of each component of a list.
```
lobstr::ref(l1, l2)
```
```
## █ [1:0x555da7e73568] <list> 
## ├─[2:0x555da922d450] <dbl> 
## ├─[3:0x555da922d338] <dbl> 
## └─[4:0x555da922d300] <dbl> 
##  
## █ [5:0x555dab2cd6c8] <list> 
## ├─[2:0x555da922d450] 
## ├─[3:0x555da922d338] 
## └─[6:0x555da9fb29e8] <dbl>
```

Notice that the two lists are at different addresses and that their first two elements are shared (same addresses), while their third elements are at different addresses.
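To compare element addresses programmatically rather than by eye, lobstr::obj_addrs() returns the address of each element of a list. A minimal sketch, assuming lobstr is installed:

```r
l1 <- list(1, 2, 3)
l2 <- l1
l2[[3]] <- 4  # shallow copy: only the vector of references is duplicated

lobstr::obj_addr(l1) == lobstr::obj_addr(l2)  # FALSE: two different lists
shared <- lobstr::obj_addrs(l1) == lobstr::obj_addrs(l2)
shared                                        # TRUE TRUE FALSE
```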

NOTE: Older versions of R (before 3.1.0) always created deep copies, and so were less memory efficient.

Data frames are lists of vectors (the columns).
```
d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
```


If you modify a column, only that column is copied and modified.
```
d2 <- d1
d2[, 2] <- d2[, 2] * 2
```
Column 2 of d2 now points to a different object, while column 1 is still shared with d1.
```
lobstr::ref(d1, d2)
```
```
## █ [1:0x555da9a8d858] <df[,2]> 
## ├─x = [2:0x555da94c3c88] <dbl> 
## └─y = [3:0x555da9be1e48] <dbl> 
##  
## █ [4:0x555da9b8a038] <df[,2]> 
## ├─x = [2:0x555da94c3c88] 
## └─y = [5:0x555da9568188] <dbl>
```

If you modify a row, the entire data frame is copied, since every column changes (much less efficient).
```
d3 <- d1
d3[1, ] <- d3[1, ] * 3
```
Both columns of d3 now point to new objects.
```
lobstr::ref(d1, d3)
```
```
## █ [1:0x555da9a8d858] <df[,2]> 
## ├─x = [2:0x555da94c3c88] <dbl> 
## └─y = [3:0x555da9be1e48] <dbl> 
##  
## █ [4:0x555daa583c78] <df[,2]> 
## ├─x = [5:0x555da946b228] <dbl> 
## └─y = [6:0x555da946b1d8] <dbl>
```

Character Vectors

A character vector is a vector of references to a global string pool.
```
x <- c("a", "a", "abc", "d")
```
Hadley usually draws this more simply, as if the strings lived inside the vector itself.

Use lobstr::ref() to show these references.
```
lobstr::ref(x, character = TRUE)
```
```
## █ [1:0x555daaf628d8] <chr> 
## ├─[2:0x555da507a0e8] <string: "a"> 
## ├─[2:0x555da507a0e8] 
## ├─[3:0x555da952d810] <string: "abc"> 
## └─[4:0x555da522f6f0] <string: "d">
```

Exercise (from Advanced R): Why do you think x is copied here? (It is only copied twice if you use RStudio.) Modify the code so that x is not copied.

```
x <- c(1L, 2L, 3L)
tracemem(x)
```
```
## [1] "<0x555da888ed48>"
```
```
x[[3]] <- 4
```
```
## tracemem[0x555da888ed48 -> 0x555da8a0a6b8]:
## tracemem[0x555da8a0a6b8 -> 0x555da84ca6b8]:
```
```
x <- c(1L, 2L, 3L)
tracemem(x)
```
```
## [1] "<0x555da91516a8>"
```
```
x[[3]] <- 4L
```
```
## tracemem[0x555da91516a8 -> 0x555da936e4a8]:
```

Exercise (From Advanced R): Sketch out the relationship between the following objects:
```
a <- 1:10
b <- list(a, a)
c <- list(b, a, 1:10)
```

Object Size

You can tell how much memory an object takes up with lobstr::obj_size().
```
x <- 1:10
lobstr::obj_size(x)
```
```
## 680 B
```

Functions also take up memory.
```
lobstr::obj_size(mean)
```
```
## 1,184 B
```
```
lobstr::obj_size(lm)
```
```
## 63,496 B
```

Because different names (and list elements) can share the same object, objects might take up less memory than you expect.
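lobstr::obj_size() also accepts several objects at once and counts anything they share only once. A minimal sketch, assuming lobstr is installed: y below is just a second name for x's object, so their combined size equals the size of x alone.

```r
x <- runif(1e5)  # 100,000 doubles, roughly 800 kB
y <- x           # a second name, no new object

lobstr::obj_size(x)
lobstr::obj_size(x, y)  # the shared object is counted once, so same as above
```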

Exercise: Why does the following list not take up 3 times as much memory as x?

```
x <- 1:10
y <- list(x, x, x)
lobstr::obj_size(x)
```
```
## 680 B
```
```
lobstr::obj_size(y)
```
```
## 760 B
```

Character strings may also be a lot smaller than you expect.

```
a <- "hello world, how are you"
b <- rep(a, 100)
lobstr::obj_size(a)
```
```
## 136 B
```
```
lobstr::obj_size(b) ## not 100 times larger
```
```
## 928 B
```

Since R 3.5.0, R can store some vectors compactly using “ALTREP” (“alternative representation”). For a sequence like 1:n, R stores only the first and last numbers instead of every element, so the following are all the same size.
```
lobstr::obj_size(1:10)
```
```
## 680 B
```
```
lobstr::obj_size(1:100)
```
```
## 680 B
```
```
lobstr::obj_size(1:1000000)
```
```
## 680 B
```
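ALTREP's savings last only as long as R can keep the compact form. In this minimal sketch (assuming lobstr is installed), changing one element means the values are no longer a simple sequence, so R has to store every integer and the size jumps to roughly 4 bytes per element. Exact byte counts may differ between R versions, so they are not shown.

```r
x <- 1:1e6
compact_size <- lobstr::obj_size(x)   # small: compact ALTREP sequence

x[[1]] <- 0L                          # no longer a simple sequence
expanded_size <- lobstr::obj_size(x)  # about 4 MB of integers

compact_size
expanded_size
```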

Modify-in-place

The opposite of copy-on-modify is modify-in-place, where no new object is created when you modify it.
R can modify an object in place when it has only a single name bound to it.
```
v <- c(1, 2, 3)
```
```
v[[3]] <- 4
```
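A consequence, adapted from an example in Advanced R: looping over the columns of a data frame with [[<- goes through the R-level `[[<-.data.frame` method, which can copy the data frame on each iteration, while the same loop over a plain list bound to one name can modify in place. This sketch only checks that both approaches give the same answer; wrap each loop with tracemem() to compare copies on your own setup, since the counts vary across R versions.

```r
set.seed(1)
dat <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(dat, median, numeric(1))

# Data frame version: each [[<- dispatches to `[[<-.data.frame`
df_result <- dat
for (i in seq_along(medians)) {
  df_result[[i]] <- df_result[[i]] - medians[[i]]
}

# List version: as.list() gives a plain list bound to one name,
# so [[<- can modify it in place
list_result <- as.list(dat)
for (i in seq_along(medians)) {
  list_result[[i]] <- list_result[[i]] - medians[[i]]
}

all.equal(as.list(df_result), list_result)  # TRUE
```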

Exercise: Why is a copy made here?

x <- 1:3
tracemem(x)

## [1] "<0x555da883cfc8>"

x[[3]] <- 4

## tracemem[0x555da883cfc8 -> 0x555da91c3ac8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous> 
## tracemem[0x555da91c3ac8 -> 0x555daac89f58]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous>

Modify-in-place also occurs in environments.
Environments are data structures that you can think of as an unordered list, a “bag of objects”.
Here, I create an environment, and bind the names e1 and e2 to it.
```
e1 <- rlang::env(a = 1, b = 2, c = 3)
e2 <- e1
```
If I change the environment through e1, the change is also visible through e2, because both names are bound to the same environment.
```
e1$c <- 4
e2$c
```
```
## [1] 4
```
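Because environments are modified in place, a function can change an environment it is given and the caller sees the change; a list passed the same way is copied on modify instead. A minimal sketch using base R's new.env() (rather than rlang::env()), with hypothetical helper names:

```r
counter_env <- new.env()
counter_env$n <- 0

counter_list <- list(n = 0)

# Hypothetical helpers that try to increment a counter
increment_env <- function(e) {
  e$n <- e$n + 1  # modifies the shared environment in place
  invisible(NULL)
}
increment_list <- function(l) {
  l$n <- l$n + 1  # copy-on-modify: only the local copy changes
  invisible(NULL)
}

increment_env(counter_env)
increment_list(counter_list)

counter_env$n   # 1
counter_list$n  # 0
```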
We will learn more about environments in Chapter 7, where this will be very important.

Garbage Collection

R often ends up with objects that no longer have any names bound to them.
```
x <- 1:3
```
```
x <- 2:4
```
```
rm(x)
```
R has a garbage collector that periodically deletes these objects to free up memory. It is hard to predict when garbage collection will run.
This is only ever important to think about if you use C code in R without Rcpp.
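R runs the garbage collector automatically, so you rarely need to call gc() yourself. This minimal sketch (assuming lobstr is installed; lobstr::mem_used() reports how much memory R is using) just makes the effect visible by removing the only name bound to a large vector and then forcing a collection. Exact numbers vary, so none are shown.

```r
x <- runif(1e6)          # about 8 MB of doubles bound to x
before <- lobstr::mem_used()

rm(x)                    # the vector now has no names bound to it
invisible(gc())          # force a collection (normally unnecessary)
after <- lobstr::mem_used()

before - after           # should be roughly the size of the removed vector
```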

New Functions

- tracemem(): Tracks an object so that a message is printed whenever it is copied.
- untracemem(): Untracks an object.
- lobstr::ref(): Displays a tree of object addresses.
- lobstr::obj_addr(): Gives the address (in memory) of the object that a name points to.
- lobstr::obj_size(): Gives the size (in memory) of an object.