Before you get started, the two rules of software optimization are (Jackson 1975):

1. Don't do it.
2. (For experts only!) Don't do it yet.

If you try to optimize too soon, you will usually end up wasting more time than you save. Your code will usually become clunky and harder to maintain.
You should only start optimizing your code after it is working, it is tested, and you have determined that it is actually too slow for your needs.
It is hard to tell what code is fast versus slow, so we have tools to measure this.
Profiling is measuring the run time of each line of code.
Use {profvis} for profiling:

library(profvis)
Microbenchmarking is comparing the performance of different pieces of code.
Use {bench} for microbenchmarking:

library(bench)
When you go back to optimize, only work on the slowest parts.
Place a function call inside profvis::profvis() to profile the function.
It is better for the profiler if you source the code first (it gives you better graphics, since {profvis} can then show the source lines alongside the timings). So I place the following code in "07_example.R":
f <- function() {
  pause(0.1)
  g()
  h()
}
g <- function() {
  pause(0.1)
  h()
}
h <- function() {
  pause(0.1)
}
Then use source() on it before placing the function in profvis():

source("./07_example.R")
profvis(f())
The pane that pops up has two parts.
The top pane is a bar graph of the execution time for each line of code.
This doesn't tell you why some lines are slower; e.g. h() is called twice, which is why its line shows about twice as much time as the others.
The bottom pane is called a flame graph: time runs along the horizontal axis, and each vertical stack shows the call stack at that moment. Reading this one from left to right, we see:

- pause() running in f().
- pause() running in g() running in f().
- pause() running in h() running in g() running in f().
- pause() running in h() running in f().

So if we saw this, we would try speeding up h(): it takes up half of the total time, so we can probably get the biggest speed improvement by working on it.
The data tab has the same information as the flame graph, but arranged vertically, and it lets you collapse parts of it.
If you see <GC> in the profile, this stands for "garbage collection" and is a sign that you are making lots and lots of copies that then need to be garbage collected. E.g.
f <- function() {
  x <- integer()
  for (i in 1:1e4) {
    x <- c(x, i)
  }
}
profvis(f())
The line where the issue occurs can be seen in the memory column.
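If growing a vector like this is the culprit, the usual fix is to pre-allocate it. Here is a minimal sketch of that fix (f2 is my own name for the rewritten function, not part of the original example):

f2 <- function() {
  x <- integer(1e4)   # allocate the full vector once, up front
  for (i in 1:1e4) {
    x[[i]] <- i       # fill in place; x is not copied on each iteration
  }
  x
}

Profiling f2() the same way should show few (if any) <GC> entries, though it may now run too quickly for the sampler to catch at all.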
To profile an R package, just load it into memory via devtools::load_all(), then run profvis() using code from that package on some example data.
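For example, something like the following, where "./mypkg", big_fun(), and example_dat are all hypothetical placeholders for your own package path, exported function, and data:

devtools::load_all("./mypkg")           # load the package's functions into memory
profvis::profvis(big_fun(example_dat))  # profile a package function on example data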
You cannot profile C/C++ code using {profvis}. You would have to use a C++ profiler like gperftools. More information can be found in Section 3.4 of Writing R Extensions.
Using anonymous functions will make the results of {profvis} confusing.
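So give a function a name before profiling it: a named function is labeled in the flame graph, while an anonymous one shows up as an unhelpful <Anonymous> frame. A small sketch (slow_step is a made-up example function):

library(profvis)
slow_step <- function() pause(0.1)  # named, so it is labeled in the flame graph
profvis({
  for (i in 1:5) slow_step()
})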
The results are a statistical sample of your call stack (what is being run, plus its ancestors), so your results will differ from run to run. But this only matters for really fast functions, which aren't the ones you care about.
Here is a crappy implementation of fitting a linear model:
lmsuck <- function(x, y) {
  x <- cbind(1, x)
  betahat <- c(solve(t(x) %*% x) %*% t(x) %*% y)
  names(betahat) <- c("beta0", "beta1")
  fits <- c(x %*% betahat)
  resids <- y - fits
  sigma <- sqrt(sum(resids^2) / (nrow(x) - ncol(x)))
  return(list(betahat = betahat,
              sigma = sigma,
              fits = fits,
              resids = resids,
              x = x,
              y = y))
}
It seems to give similar results:
lm_sout <- lmsuck(x = mtcars$wt, y = mtcars$mpg)
lm_g <- lm(mpg ~ wt, data = mtcars)

lm_sout$betahat
## beta0 beta1
## 37.285 -5.344

coef(lm_g)
## (Intercept) wt
## 37.285 -5.344

lm_sout$sigma
## [1] 3.046

sigma(lm_g)
## [1] 3.046
Profile lmsuck() on a large (\(n \geq 100000\)) simulated dataset and tell me what takes the longest.
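If you want a starting place, here is one possible simulation setup (the seed, sample size, and coefficients are arbitrary choices of mine; you may need a larger n for {profvis} to collect enough samples):

set.seed(1)
n <- 1000000                 # n >= 100000; bigger gives the sampler more to see
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)    # a simple linear relationship plus noise
profvis::profvis(lmsuck(x = x, y = y))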
Benchmarking compares small pieces of code.
This is only useful if you are running that code thousands of times a second.
Don't try to generalize benchmarks on small, fast inputs to larger ones (e.g. knowing what's faster when \(n=1\) tells you nothing about what's faster when \(n = 10000000\)); the bench::press() sketch further below shows how to check across sizes directly.
Use bench::mark() to do microbenchmarking:
rsum <- function(x) {
  sval <- 0
  for (i in seq_along(x)) {
    sval <- x[[i]] + sval
  }
  return(sval)
}

x <- 1:100
lb <- bench::mark(
  sum(x),
  rsum(x)
)
lb
## # A tibble: 2 × 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 sum(x) 135.97ns 156.11ns 4903545. 0B 0
## 2 rsum(x) 3.27µs 3.44µs 266521. 68.4KB 0
So running sum() a million times would take about 0.2 seconds, while running rsum() a million times would take about 3.5 seconds.
But rarely do you need to run sum() a million times, so typically either one is OK in real life.
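If you do want to see how a comparison changes with input size, bench::press() re-runs a benchmark over a grid of parameter values. A quick sketch (the two sizes are arbitrary):

bench::press(
  n = c(10, 100000),
  {
    x <- runif(n)   # doubles, so sum() will not hit integer overflow
    bench::mark(
      sum(x),
      rsum(x)
    )
  }
)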
Always pay attention to the units: 1 ms \(>\) 1 µs \(>\) 1 ns (1 ms = 1,000 µs = 1,000,000 ns).
There is a nice plot method for bench_mark objects (multimodality comes from other processes running in the background).
plot(lb)
For summing up the columns of a matrix, I can think of at least four ways:

- colSums().
- Matrix multiplication with a vector of 1's.
- apply() with sum().
- A for loop over the columns.

Create functions that implement all four of these approaches and tell me which one is the fastest for \(100\times 100\), \(1000\times 100\), and \(100\times 1000\) matrices.
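Here is a sketch of one possible harness. The three helper implementations are my own guesses at the non-colSums() approaches (the names are made up), so write your own versions for the exercise:

col_sums_loop <- function(m) {   # a for loop over the columns
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    out[[j]] <- sum(m[, j])
  }
  out
}
col_sums_matmul <- function(m) c(rep(1, nrow(m)) %*% m)  # vector of 1's
col_sums_apply  <- function(m) apply(m, 2, sum)          # apply() with sum()

m <- matrix(rnorm(100 * 100), nrow = 100)
bench::mark(
  colSums(m),
  col_sums_loop(m),
  col_sums_matmul(m),
  col_sums_apply(m)
)

Repeat with nrow = 1000, ncol = 100 and nrow = 100, ncol = 1000, or wrap the whole comparison in bench::press().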
- profvis::profvis(): Profile a function.
- bench::mark(): Microbenchmark multiple expressions.