{Rcpp} provides classes of C++ vectors which make programming easier for folks used to R. The classes are called LogicalVector, IntegerVector, NumericVector, and CharacterVector.
Note that in regular C++ (without Rcpp), you typically use arrays or std::vector objects, like std::vector<double>.
If you have a function that accepts and returns vectors, then you use the Rcpp vector types for its arguments and return value.
C++
NumericVector fn(IntegerVector x, bool y) {
  // code to create z
  return z;
}
You create a vector of length n (every element initialized to 0) with
C++
NumericVector x(n);
You can set a different initial value for every element by including a second argument:
C++
NumericVector x(n, 2.0);
If you want an IntegerVector from 0 to n - 1, probably the best way is through Rcpp::seq(0, n - 1) (std::iota() from the C++ standard library is an alternative).
C++
IntegerVector x = Rcpp::seq(0, n - 1);
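For comparison, in plain C++ (without Rcpp) the same 0 to n - 1 sequence can be built with std::iota() from the <numeric> header. The sketch below is only an illustration: the helper name seq_from_zero() is made up for this example, and it returns a std::vector<int> rather than an IntegerVector.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: returns {0, 1, ..., n - 1} as a std::vector<int>.
// Assumes n >= 0.
std::vector<int> seq_from_zero(int n) {
  std::vector<int> out(n);               // n elements, all initialized to 0
  std::iota(out.begin(), out.end(), 0);  // overwrite with 0, 1, 2, ...
  return out;
}
```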
If you have specific elements you want to create the vector with, use curly braces (requires C++11):
C++
NumericVector x = {9.0, 1.2, 4.9};
For older versions of C++, use NumericVector::create().
C++
NumericVector x = NumericVector::create(9.0, 1.2, 4.9);
Let’s demonstrate these methods:
C++
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void vec_create() {
  int n = 5;
  NumericVector x1(n);                     // n zeros
  NumericVector x2(n, 2.0);                // n copies of 2.0
  CharacterVector x3(n, "A");              // n copies of "A"
  IntegerVector x4 = Rcpp::seq(0, n - 1);  // 0, 1, ..., n - 1
  NumericVector x5 = {9.0, 1.2, 4.9};      // C++11 braces
  NumericVector x6 = NumericVector::create(9.0, 1.2, 4.9);
  Rcpp::Rcout << "x1: " << x1 << std::endl
              << "x2: " << x2 << std::endl
              << "x3: " << x3 << std::endl
              << "x4: " << x4 << std::endl
              << "x5: " << x5 << std::endl
              << "x6: " << x6 << std::endl;
}
R
vec_create()
## x1: 0 0 0 0 0
## x2: 2 2 2 2 2
## x3: "A" "A" "A" "A" "A"
## x4: 0 1 2 3 4
## x5: 9 1.2 4.9
## x6: 9 1.2 4.9
You get and set individual elements of a vector using brackets [], just like in R.
Note that in C++, indexing starts at 0.
One more time for emphasis, in C++ indexing starts at 0.
C++
// [[Rcpp::export]]
void subset_example(NumericVector x) {
  Rcpp::Rcout << "x[0]:" << x[0] << std::endl
              << "x[1]:" << x[1] << std::endl
              << "x[2]:" << x[2] << std::endl;
  x[0] = 21;
  x[1] = 22;
  x[2] = 23;

  Rcpp::Rcout << "x[0]:" << x[0] << std::endl
              << "x[1]:" << x[1] << std::endl
              << "x[2]:" << x[2] << std::endl;
}
R
subset_example(c(14, 38, 29))
## x[0]:14
## x[1]:38
## x[2]:29
## x[0]:21
## x[1]:22
## x[2]:23
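Because R indexes from 1 and C++ from 0, a common slip is to carry an R index straight into C++. The sketch below, in plain C++ with std::vector<double> standing in for NumericVector, shows the shift by one. The function name get_r_style() is hypothetical; it uses at(), which (unlike []) checks the index and throws std::out_of_range when it is invalid.

```cpp
#include <cstddef>
#include <stdexcept>  // std::out_of_range, thrown by at()
#include <vector>

// Hypothetical helper: look up an element using an R-style (1-based) index.
// R's x[i] corresponds to C++'s x[i - 1].
double get_r_style(const std::vector<double>& x, std::size_t i) {
  return x.at(i - 1);  // at() checks bounds and throws on a bad index
}
```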
A method is a function that is attached to an object. In C++, methods are accessed with a . (just like Python).
Rcpp vectors have a lot of useful methods.
length() or size(): Get the total number of elements in the vector. These are equivalent.
C++
x.length();
fill(): Fill all elements of a vector with a scalar value.
C++
x.fill(2.0);
sort(): Sort the vector in place and return it.
C++
x.sort(false); // ascending order
x.sort(true); // descending order
begin() and end(): Return iterators pointing to the first element and to one past the last element of the vector (we’ll talk about this later).
Let’s demonstrate these.
C++
// [[Rcpp::export]]
void method_example() {
  NumericVector x = {1.0, 3.0, 5.0};
  Rcpp::Rcout << "x : " << x << std::endl;

  Rcpp::Rcout << "x.length() : " << x.length() << std::endl;

  Rcpp::Rcout << "x.sort(false): " << x.sort(false) << std::endl;
  Rcpp::Rcout << "x.sort(true) : " << x.sort(true) << std::endl;

  x.fill(2.0);
  Rcpp::Rcout << "x.fill(2.0) : " << x << std::endl;
}
R
method_example()
## x : 1 3 5
## x.length() : 3
## x.sort(false): 1 3 5
## x.sort(true) : 5 3 1
## x.fill(2.0) : 2 2 2
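As a small preview of iterators, here is a sketch in plain C++ using std::vector<double>, whose begin() and end() follow the same convention. The function name sum_with_iterators() is made up for illustration. Note that end() is never dereferenced; it only marks where to stop.

```cpp
#include <vector>

// Hypothetical helper: add up the elements by walking from begin() to end().
double sum_with_iterators(const std::vector<double>& x) {
  double total = 0.0;
  // end() points one past the last element, so the loop stops before it.
  for (auto it = x.begin(); it != x.end(); ++it) {
    total += *it;  // *it is the element the iterator currently points to
  }
  return total;
}
```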
NumericVectors have methods to remove and add values, but Dirk Eddelbuettel (the Rcpp maintainer) says this is not a good idea, because each such operation copies the whole vector. So you should use std::vector<double> objects to do this efficiently.
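A minimal sketch of that advice, in plain C++: collect values in a std::vector<double> with push_back(), which is designed to grow cheaply. The function name keep_positive() is hypothetical. In an Rcpp function you could convert the result back to an R vector with Rcpp::wrap(), which is not shown here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: keep only the strictly positive elements of x.
std::vector<double> keep_positive(const std::vector<double>& x) {
  std::vector<double> out;
  out.reserve(x.size());  // optional: at most x.size() elements are kept
  for (std::size_t i = 0; i < x.size(); i++) {
    if (x[i] > 0.0) {
      out.push_back(x[i]);  // append to the end of out
    }
  }
  return out;
}
```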
All for-loops in C++ are of the form
C++
for (s1; s2; s3) {
// code executed each iteration
}
s1 is code that is executed once before the for-loop. Usually this declares an integer index, such as int i = 0.
s2 is a predicate that, when true, allows the for-loop to execute another iteration. Usually, this is i < n if you want to go through the for-loop n times.
s3 is code that is evaluated at the end of each iteration. This is usually i++ to add 1 to i.
Almost all for-loops you write will look like this:
C++
for (int i = 0; i < n; i++) {
}
Declare i as an integer, initializing it to be 0.
Exit the for-loop when i >= n.
At the end of each iteration, add 1 to i.
Exercise: In the above for-loop, if n = 10, how many iterations of the for-loop will run, 9 or 10?
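One way to check your answer to the exercise is to count the iterations directly. The sketch below is plain C++, and count_iterations() is a made-up name.

```cpp
// Hypothetical helper: count how many times the loop body runs for a given n.
int count_iterations(int n) {
  int count = 0;
  for (int i = 0; i < n; i++) {
    count++;  // runs once per iteration
  }
  return count;
}
```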
Let’s recreate sum() using Rcpp.
C++
// [[Rcpp::export]]
double sum2(NumericVector x) {
  int n = x.length();
  double sval = 0.0;
  for (int i = 0; i < n; i++) {
    sval += x[i];
  }
  return sval;
}
R
x <- runif(100)
sum(x)
## [1] 50.35
sum2(x)
## [1] 50.35
Here are the microbenchmarks:
expression | min | median | itr/sec | mem_alloc | gc/sec | n_itr | n_gc | total_time |
---|---|---|---|---|---|---|---|---|
sum(x) | 339.93ns | 387.9ns | 2281833 | 0B | 0 | 10000 | 0 | 4.38ms |
sum2(x) | 1.72µs | 3.76µs | 308930 | 18.2KB | 0 | 10000 | 0 | 32.37ms |
In R for-loops, you specify the vector you iterate over. In C++, you specify the exit condition and the code to run at the end of each iteration.
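In effect, this makes for(i in 1:n) in R the counterpart of for(int i = 0; i < n; i++) in C++: for n of at least 1, both run n iterations, but i goes from 1 to n in R and from 0 to n - 1 in C++.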
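When translating an R loop whose body uses i, remember the shift by one. As a sketch, the R code `total <- 0; for (i in 1:n) total <- total + i` could be written in plain C++ as follows. The name sum_one_to_n() is hypothetical, and the sketch assumes n is at least 1.

```cpp
// Hypothetical helper: the C++ counterpart (for n >= 1) of
//   total <- 0; for (i in 1:n) total <- total + i
long sum_one_to_n(int n) {
  long total = 0;
  for (int i = 0; i < n; i++) {
    total += i + 1;  // C++ i runs 0..n-1, so i + 1 plays the role of R's i
  }
  return total;
}
```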
If you assign an Rcpp vector x to another object y using =, then the value of x is not copied to y. Rather, y becomes an alias for x.
This means that if you edit x, then that will edit y. And if you edit y, then that will edit x.
You can use Rcpp::clone() to make a copy.
C++
// [[Rcpp::export]]
void copy_example() {
  NumericVector x = {1, 2, 3};
  Rcpp::Rcout << x << std::endl;

  // cloning
  NumericVector z = Rcpp::clone(x);
  z[0] = 13;
  Rcpp::Rcout << x << std::endl;

  // aliasing
  NumericVector y = x;
  y[1] = 10;
  Rcpp::Rcout << x << std::endl;
}
R
copy_example()
## 1 2 3
## 1 2 3
## 1 10 3
This is because x is bound to a pointer to the vector, not to the vector itself. So copying x to y just copies the pointer.
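The same behavior can be imitated in plain C++ with a pointer to shared data. The sketch below is only an analogy, not how Rcpp is implemented: a std::shared_ptr<std::vector<double>> plays the role of the Rcpp vector, and the function name alias_vs_copy() is made up.

```cpp
#include <memory>
#include <vector>

// Analogy only: returns x's contents after editing a copy and an alias.
std::vector<double> alias_vs_copy() {
  auto x = std::make_shared<std::vector<double>>(
      std::vector<double>{1.0, 2.0, 3.0});

  // "Cloning": allocate a new vector holding the same values.
  auto z = std::make_shared<std::vector<double>>(*x);
  (*z)[0] = 13.0;  // x is unaffected

  // "Aliasing": copy the pointer, so y and x share one vector.
  auto y = x;
  (*y)[1] = 10.0;  // x sees this change

  return *x;
}
```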
Infinity is encoded using R_PosInf and R_NegInf.
Missing values are encoded using NA_REAL, NA_INTEGER, NA_LOGICAL, and NA_STRING.
You can include these in Rcpp vectors without worrying too much.
C++
Rcpp::NumericVector x(10, NA_REAL);
E.g. if you use Rcpp Sugar then Rcpp will understand how to propagate missing values appropriately.
But if you try to use these missing values as scalars, you have to be very scared.
NA_INTEGER is the smallest value an int can hold in C++ (INT_MIN), so NA_INTEGER + 1 would no longer be considered missing data, e.g.
R
Rcpp::evalCpp("NA_INTEGER")
## [1] NA
Rcpp::evalCpp("NA_INTEGER + 1")
## [1] -2147483647
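One defensive pattern is to test for the missing-value sentinel before doing scalar arithmetic. The sketch below is plain C++ so that it stands alone: kNaInteger is a stand-in for R's NA_INTEGER (which R defines as INT_MIN), and add_one_na_aware() is a hypothetical name. Inside an Rcpp function you would compare against NA_INTEGER itself.

```cpp
#include <climits>

// Stand-in for R's NA_INTEGER, which R defines as INT_MIN.
const int kNaInteger = INT_MIN;

// Hypothetical helper: add 1, but keep a missing value missing.
int add_one_na_aware(int v) {
  if (v == kNaInteger) {
    return kNaInteger;  // propagate the missing value instead of computing
  }
  return v + 1;  // note: overflows if v == INT_MAX, which this sketch ignores
}
```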
Coercing NA_LOGICAL to a bool will evaluate to true, since bool does not allow for anything except true and false, and NA_LOGICAL is stored as a nonzero integer.
R
Rcpp::evalCpp("(bool)NA_LOGICAL")
## [1] TRUE
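Likewise for logicals, it is safer to check for the missing value before converting to bool. In this plain C++ sketch, kNaLogical stands in for NA_LOGICAL (which R stores as the int INT_MIN), and is_true() is a made-up name that loosely mirrors R's isTRUE().

```cpp
#include <climits>

// Stand-in for R's NA_LOGICAL, which R stores as the int INT_MIN.
const int kNaLogical = INT_MIN;

// Hypothetical helper: true only for a non-missing TRUE.
bool is_true(int lgl) {
  if (lgl == kNaLogical) {
    return false;  // a plain cast to bool would have returned true here
  }
  return lgl != 0;
}
```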
(from Advanced R) For each of the following functions, read the code and figure out what the corresponding base R function is. You might not understand every part of the code yet, but you should be able to figure out the basics of what the function does.
C++
double f1(NumericVector x) {
  int n = x.size();
  double y = 0;
  for(int i = 0; i < n; ++i) {
    y += x[i] / n;
  }
  return y;
}
C++
NumericVector f2(NumericVector x) {
  int n = x.size();
  NumericVector out(n);

  out[0] = x[0];
  for(int i = 1; i < n; ++i) {
    out[i] = out[i - 1] + x[i];
  }
  return out;
}
C++
bool f3(LogicalVector x) {
  int n = x.size();
  for(int i = 0; i < n; ++i) {
    if (x[i]) return true;
  }
  return false;
}
C++
int f4(Function pred, List x) {
  int n = x.size();
  for(int i = 0; i < n; ++i) {
    LogicalVector res = pred(x[i]);
    if (res[0]) return i + 1;
  }
  return 0;
}
C++
NumericVector f5(NumericVector x, NumericVector y) {
  int n = std::max(x.size(), y.size());
  NumericVector x1 = rep_len(x, n);
  NumericVector y1 = rep_len(y, n);

  NumericVector out(n);

  for (int i = 0; i < n; ++i) {
    out[i] = std::min(x1[i], y1[i]);
  }
  return out;
}
Create a function in C++ called fib() that will take as input n and return the first n Fibonacci numbers, where the sequence begins with 0, 1, 1, 2, 3, 5, 8, 13, …
E.g.
R
fib(1)
## [1] 0
fib(2)
## [1] 0 1
fib(10)
## [1] 0 1 1 2 3 5 8 13 21 34
(from Advanced R) Convert the following functions into C++. For now, assume the inputs have no missing values. Try not to use Rcpp Sugar (e.g. Rcpp::min()).
all().
cumprod(), cummin(), cummax().
diff(). Start by assuming lag 1, and then generalize for lag n.
range().
all_cpp() output examples:
R
all_cpp(c(TRUE, TRUE, FALSE))
## [1] FALSE
all_cpp(TRUE)
## [1] TRUE
all_cpp(FALSE)
## [1] FALSE
all_cpp(c(TRUE, TRUE))
## [1] TRUE
cumprod_cpp() output examples:
R
x <- 1:4
cumprod_cpp(x)
## [1] 1 2 6 24
x <- double()
cumprod_cpp(x)
## numeric(0)
x <- 3
cumprod_cpp(x)
## [1] 3
cummin_cpp() output examples:
R
cummin_cpp(double())
## numeric(0)
cummin_cpp(3)
## [1] 3
cummin_cpp(c(10, 11, 3, 4, 5, 1, 2))
## [1] 10 10 3 3 3 1 1
cummax_cpp() output examples:
R
cummax_cpp(double())
## numeric(0)
cummax_cpp(3)
## [1] 3
cummax_cpp(c(10, 11, 3, 4, 5, 22, 2))
## [1] 10 11 11 11 11 22 22
diff_cpp() output examples:
R
diff_cpp(x = c(1, 5, 10, 20), lag = 1)
## [1] 4 5 10
diff_cpp(x = c(1, 5, 10, 20), lag = 2)
## [1] 9 15
diff_cpp(x = c(1, 5, 10, 20), lag = 3)
## [1] 19
diff_cpp(x = c(1, 5, 10, 20), lag = 4)
## numeric(0)
range_cpp() output examples:
R
range_cpp(c(9, 11, 22, 12, 1))
## [1] 1 22
range_cpp(8)
## [1] 8 8
range_cpp(double())
## [1] -Inf Inf