Computer memory stores information (like numbers or strings) for immediate use. When you put information there, it is located at some “address” on your computer, and you can retrieve it from that address.
The following puts the vector c(1, 2, 3) in memory, and binds the name x to it:
x <- c(1, 2, 3)
The function lobstr::obj_addr() lets us see the address of this object.
lobstr::obj_addr(x)
## [1] "0x555da60f5588"
When you assign x to a new variable name y, it makes a new name that points to the same object as x.
y <- x
lobstr::obj_addr(y)
## [1] "0x555da60f5588"
If you modify y, then R will make a copy of object 0x555da60f5588 and point y to that new object. This is called copy-on-modify.
y[[3]] <- 4
lobstr::obj_addr(y)
## [1] "0x555dab276d18"
Copy-on-modify exists so that x does not change when you change y.
x
## [1] 1 2 3
You can use tracemem() to track whenever an object is copied.
x <- c(1, 2, 3)
tracemem(x)
## [1] "<0x555da9dec568>"
y <- x
y[[3]] <- 4 ## copy made
## tracemem[0x555da9dec568 -> 0x555daaf4c3f8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous>
y[[5]] <- 1 ## no copy made, y modified
Note: tracemem() is connected to the object (here 0x555da9dec568), not the name x. So the following will not show a copy-on-modify because we changed the binding of the name x.
x <- c(1, 2, 3)
tracemem(x)
## [1] "<0x555daac88428>"
x <- c(4, 5)
y <- x
y[[2]] <- 6
Note: tracemem() may give you unexpected results (such as extra copies) if you use it inside of RStudio. That’s because the Environment pane makes references to objects.
Inside a function, the argument name a points to the same object that was passed in.
x <- c(1, 2, 3)
tracemem(x)
## [1] "<0x555da9c4fe08>"
f <- function(a) {
  return(a)
}
z <- f(x) ## no copy made
So x and z now point to the same object.
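A minimal sketch of the other half of this behavior, using only base R: if the function modifies its argument, copy-on-modify applies inside the function too, so the caller's object is left alone. The function name g is made up for illustration.

```r
x <- c(1, 2, 3)

## g is a hypothetical function that modifies its argument
g <- function(a) {
  a[[1]] <- 100 ## a is copied here, because x still points to the original
  a
}

z <- g(x)
x ## unchanged: 1 2 3
z ## modified copy: 100 2 3
```

Here x and z end up pointing to different objects, unlike the f() example where nothing was modified.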
Exercise (From Advanced R): Explain the relationship between a, b, c and d in the following code:
a <- 1:10
b <- a
c <- b
d <- 1:10
Verify your conclusions using lobstr::obj_addr().
Exercise: When does the address of x change? Use cat() and lobstr::obj_addr() to verify your answer. Does tracemem() help you here? Why or why not?
x <- c()
for (i in 1:10) {
  x[[i]] <- i
}
Exercise: When does the address of x change? Use cat() and lobstr::obj_addr() to verify your answer.
x <- rep(x = NA_real_, length.out = 10)
for (i in 1:10) {
  x[[i]] <- i
}
Exercise: When does the address of x change? Use cat() and lobstr::obj_addr() to verify your answer.
x <- vector(mode = "numeric", length = 10)
for (i in 1:10) {
  x[[i]] <- i
}
Understanding when an object is copied is important for performance. Making copies can be expensive if you are doing it a lot (like in a for-loop), making your code run much slower.
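As a rough illustration of that cost, the sketch below fills a vector two ways: growing it one element at a time, and allocating it once up front. The function names grow() and prealloc() and the size n are made up for this example, and the timings depend on your machine and R version, so treat them as indicative only.

```r
## Hypothetical helper: start empty and grow the vector inside the loop
grow <- function(n) {
  x <- c()
  for (i in seq_len(n)) {
    x[[i]] <- i ## x must be enlarged as the loop runs, which can force reallocations
  }
  x
}

## Hypothetical helper: allocate the full vector once, then fill it
prealloc <- function(n) {
  x <- vector(mode = "numeric", length = n)
  for (i in seq_len(n)) {
    x[[i]] <- i
  }
  x
}

n <- 1e4
t_grow <- system.time(x1 <- grow(n))
t_prealloc <- system.time(x2 <- prealloc(n))
t_grow[["elapsed"]]
t_prealloc[["elapsed"]]
```

Both versions produce the same values; only the amount of allocation along the way differs.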
Recall that a list is a vector that can have elements of any type.
To do this, the list’s name points to a vector of references, and these point to the objects.
l1 <- list(1, 2, 3)
Copy-on-modify for a list only copies the references, so it is much more memory efficient. This is called a shallow copy.
l2 <- l1
l2[[3]] <- 4
lobstr::ref() allows you to see the location of each component of a list.
lobstr::ref(l1, l2)
## █ [1:0x555da7e73568] <list>
## ├─[2:0x555da922d450] <dbl>
## ├─[3:0x555da922d338] <dbl>
## └─[4:0x555da922d300] <dbl>
##
## █ [5:0x555dab2cd6c8] <list>
## ├─[2:0x555da922d450]
## ├─[3:0x555da922d338]
## └─[6:0x555da9fb29e8] <dbl>
NOTE: Older versions of R (before 3.1.0) always created deep copies, and so were less memory efficient.
Data frames are lists of vectors (the columns).
d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
If you modify a column, then only that column is copied and modified.
d2 <- d1
d2[, 2] <- d2[, 2] * 2
lobstr::ref(d1, d2)
## █ [1:0x555da9a8d858] <df[,2]>
## ├─x = [2:0x555da94c3c88] <dbl>
## └─y = [3:0x555da9be1e48] <dbl>
##
## █ [4:0x555da9b8a038] <df[,2]>
## ├─x = [2:0x555da94c3c88]
## └─y = [5:0x555da9568188] <dbl>
If you modify a row, then every column is copied, so the entire data frame is copied (much less efficient).
d3 <- d1
d3[1, ] <- d3[1, ] * 3
lobstr::ref(d1, d3)
## █ [1:0x555da9a8d858] <df[,2]>
## ├─x = [2:0x555da94c3c88] <dbl>
## └─y = [3:0x555da9be1e48] <dbl>
##
## █ [4:0x555daa583c78] <df[,2]>
## ├─x = [5:0x555da946b228] <dbl>
## └─y = [6:0x555da946b1d8] <dbl>
A character vector is a vector of references to a global string pool.
x <- c("a", "a", "abc", "d")
But Hadley usually writes this as
Use lobstr::ref() to show these references.
lobstr::ref(x, character = TRUE)
## █ [1:0x555daaf628d8] <chr>
## ├─[2:0x555da507a0e8] <string: "a">
## ├─[2:0x555da507a0e8]
## ├─[3:0x555da952d810] <string: "abc">
## └─[4:0x555da522f6f0] <string: "d">
Exercise (from Advanced R): Why do you think x is copied here? (It is only copied twice if you use RStudio.) Modify the code so that x is not copied.
x <- c(1L, 2L, 3L)
tracemem(x)
## [1] "<0x555da888ed48>"
x[[3]] <- 4
## tracemem[0x555da888ed48 -> 0x555da8a0a6b8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous>
## tracemem[0x555da8a0a6b8 -> 0x555da84ca6b8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous>
x <- c(1L, 2L, 3L)
tracemem(x)
## [1] "<0x555da91516a8>"
x[[3]] <- 4L
## tracemem[0x555da91516a8 -> 0x555da936e4a8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous>
Exercise (From Advanced R): Sketch out the relationship between the following objects:
a <- 1:10
b <- list(a, a)
c <- list(b, a, 1:10)
You can tell how much memory an object takes up with lobstr::obj_size().
x <- 1:10
lobstr::obj_size(x)
## 680 B
Functions also take up memory.
lobstr::obj_size(mean)
## 1,184 B
lobstr::obj_size(lm)
## 63,496 B
Because several names or list elements can point to the same object, objects might take up less memory than you expect.
Exercise: Why does the following list not take up 3 times as much memory as x?
x <- 1:10
y <- list(x, x, x)
lobstr::obj_size(x)
## 680 B
lobstr::obj_size(y)
## 760 B
Character strings may also be a lot smaller than you expect.
a <- "hello world, how are you"
b <- rep(a, 100)
lobstr::obj_size(a)
## 136 B
lobstr::obj_size(b) ## not 100 times larger
## 928 B
Newer versions of R (3.5.0 and later) have an optimization called “ALTREP” (for “alternative representation”) that stores sequences of numbers like 1:10 compactly. So the following are all the same size.
lobstr::obj_size(1:10)
## 680 B
lobstr::obj_size(1:100)
## 680 B
lobstr::obj_size(1:1000000)
## 680 B
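A small sketch of what this means in practice, assuming lobstr is installed and a version of R with ALTREP: the compact form applies to the sequence as created by the : operator. Arithmetic on it returns an ordinary vector that stores every element. The names compact and expanded are made up for illustration.

```r
compact <- 1:1000000     ## stored compactly as a sequence
expanded <- compact + 0L ## arithmetic returns an ordinary integer vector

identical(compact, expanded) ## same values
lobstr::obj_size(compact)
lobstr::obj_size(expanded)   ## much larger, because every element is stored
```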
The opposite of copy-on-modify is modify-in-place, where a new object is not created when you modify it.
Modify-in-place occurs when an object has only a single name bound to it.
v <- c(1, 2, 3)
v[[3]] <- 4
Exercise: Why is a copy made here?
x <- 1:3
tracemem(x)
## [1] "<0x555da883cfc8>"
x[[3]] <- 4
## tracemem[0x555da883cfc8 -> 0x555da91c3ac8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous>
## tracemem[0x555da91c3ac8 -> 0x555daac89f58]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> withCallingHandlers suppressMessages render_one FUN lapply sapply <Anonymous> <Anonymous>
Modify-in-place also occurs in environments.
Environments are data structures that you can think of as unordered lists: a “bag of objects”.
Here, I create an environment, and bind the names e1 and e2 to it.
e1 <- rlang::env(a = 1, b = 2, c = 3)
e2 <- e1
If I change the e1 environment, then e2 is also changed.
e1$c <- 4
e2$c
## [1] 4
We will learn more about environments in Chapter 7, where this will be very important.
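One consequence, sketched here with base R only: because environments are modified in place, a function can change an environment that was passed to it, and the caller sees the change. The names counter and increment are made up for illustration, and new.env() is used here as the base R way to create an environment.

```r
counter <- new.env()
counter$n <- 0

## increment is a hypothetical helper that adds 1 to the element n
increment <- function(obj) {
  obj$n <- obj$n + 1
  invisible(obj)
}

increment(counter)
increment(counter)
counter$n ## 2, the environment was modified in place

## Contrast with a list, which follows copy-on-modify
l <- list(n = 0)
increment(l)
l$n ## still 0, only the copy inside the function changed
```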
R often creates objects which no longer have names bound to them.
x <- 1:3
x <- 2:4
rm(x)
R has a garbage collector that periodically deletes these objects to free up memory. It is hard to predict when garbage collection will run.
This is only ever important to think about if you use C code in R without Rcpp.
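If you want to look at this anyway, base R's gc() forces a collection and reports memory use. The sketch below is only illustrative: the exact numbers depend on your session, and R runs the collector on its own when it needs more space, so calling gc() yourself is rarely necessary.

```r
x <- 1:3
x <- 2:4 ## the object 1:3 no longer has a name bound to it
rm(x)    ## now neither does 2:4

mem <- gc()   ## force a garbage collection; returns a matrix of memory statistics
mem[, "used"] ## number of Ncells and Vcells currently in use
```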
tracemem(): Tracks an object so that a message is printed whenever it is copied.
untracemem(): Untracks an object.
lobstr::ref(): Displays a tree of object addresses.
lobstr::obj_addr(): Gives the address (in memory) of an object that a name points to.
lobstr::obj_size(): Gives the size (in memory) of an object.