If you need to use numeric matrices or arrays in C++, you should use the Armadillo C++ library.
Eigen is also a really great option, but appears to be less popular in the R community. You can read the docs here.
Armadillo contains fast linear algebra algorithms, as well as a nice interface for working with matrices.
You can set up an R package to use Armadillo by running the following R:
R
::use_rcpp_armadillo() usethis
This will add RcppArmadillo
to the LinkingTo
field in the DESCRIPTION
file. It will also create Makevars
files to set compilation settings.
At the top of each C++ file, you now include the following (instead of just including Rcpp):
C++
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
If you are using Armadillo for matrices, you should be using Armadillo vectors instead of NumericVector
s, so that it is easier to interact between matrices and vectors.
Armadillo vectors are just like NumericVector
’s with some minor differences. That is, there are no character or logical Armadillo vectors.
Armadillo vectors are called Col
objects (for “column vector”).
Initialize with
C++
::vec x(n_elem) arma
You can specify a default value for the vector with
C++
// fill with zeros
::vec x(n_elem, arma::fill::zeros);
arma
// fill with ones
::vec y(3, arma::fill::ones);
arma
// fill with specified value
::vec z(3, arma::fill::value(23.0)); arma
You can create custom vectors with braces
C++
::vec w = {4.3, 2.2, -9.4}; arma
Let’s demonstrate:
// [[Rcpp::export]]
void ex1() {
(3, fill::zeros);
vec x::Rcout << x << std::endl;
Rcpp
(3, fill::ones);
vec y::Rcout << y << std::endl;
Rcpp
(3, fill::value(23.0));
vec z::Rcout << z << std::endl;
Rcpp
= {4.3, 2.2, -9.4};
vec w ::Rcout << w << std::endl;
Rcpp}
R
ex1()
## 0
## 0
## 0
##
## 1.0000
## 1.0000
## 1.0000
##
## 23.0000
## 23.0000
## 23.0000
##
## 4.3000
## 2.2000
## -9.4000
Of course, you typically get Armadillo vectors from a user.
C++
// [[Rcpp::export]]
(vec x) {
vec ex2return x;
}
R
ex2(c(1, 4, 2))
## [,1]
## [1,] 1
## [2,] 4
## [3,] 2
If you return an Armadillo vector, then R will interpret this as a matrix
object with one column. But you can pass R vectors as inputs to Col
objects.
You can get and set elements using parentheses notation ()
. You can use brackets as well, but this is more dangerous since Armadillo then won’t do a bounds check. So you should always get and set elements by parentheses.
C++
// [[Rcpp::export]]
void ex3() {
= {6.2, 5.3, 1.1};
vec x ::Rcout << "x(0): " << x(0) << std::endl;
Rcppdouble y = x(1);
::Rcout << "y: " << y << std::endl;
Rcpp(2) = 8.9;
x::Rcout << "x(2): " << x(2) << std::endl;
Rcpp}
R
ex3()
## x(0): 6.2
## y: 5.3
## x(2): 8.9
You get the number elements in a vector with
C++
.n_elem x
You do arithmetic operations on vectors as you would in R, but note that element-wise multiplication is different.
C++
+ y
x - y
x % y // element-wise multiplication
x / y x
You can do lots of miscellaneous element-wise functions (table from Armadillo docs):
Armadillo | Explanation |
---|---|
exp(A) |
base-e exponential: \(e^x\) |
exp2(A) |
base-2 exponential: \(2^x\) |
exp10(A) |
base-10 exponential: \(10^x\) |
expm1(A) |
compute \(\exp(A)-1\) accurately for values of A close to zero |
log(A) |
natural log: \(\log_e(x)\) |
log2(A) |
base-2 log: \(\log_2(x)\) |
log10(A) |
base-10 log: \(\log_{10}(x)\) |
log1p(A) |
compute \(\log(1+A)\) accurately for values of A close to zero |
pow(A, p) |
raise to the power of p: \(x^p\) |
square(A) |
square: \(x^2\) |
sqrt(A) |
square root: \(\sqrt{x}\) |
floor(A) |
largest integral value that is not greater than the input value |
ceil(A) |
smallest integral value that is not less than the input value |
round(A) |
round to nearest integer, with halfway cases rounded away from zero |
trunc(A) |
round to nearest integer, towards zero |
lgamma(A) |
natural log of the absolute value of gamma function |
sign(A) |
sign function |
The usual statistics functions are available, but note the different name for standard deviation: mean()
, median()
, stddev()
, var()
, range()
Exercise: Implement cumprod()
, cummin()
, cummax()
, and diff()
using Armadillo vectors instead of NumericVector
s. You can copy and paste your solutions from the exercises in the Rcpp vectors lecture and then modify them to use armadillo vectors.
Hint: You can make a length zero Col
object by
C++
(1);
vec x.clear(); x
Armadillo calls matrix objects Mat
. They contain doubles.
You initialize Armadillo matrices with
C++
::max A(n_rows, n_cols); arma
You can specify a default value for the vector with
C++
// fill with zeros
::mat A1(n_rows, n_cols, arma::fill::zeros);
arma
// fill with ones
::mat A2(n_rows, n_cols, arma::fill::ones);
arma
// fill with specified value
::mat A3(n_rows, n_cols, arma::fill::value(23.0)); arma
You can create custom matrices with braces. Internal braces are the rows.
C++
// Three rows, two columns
::mat A4 = {{4.3, 2.2}, {-9.4, 7.0}, {10.0, 11.0}}; arma
Let’s demonstrate these
C++
// [[Rcpp::export]]
void ex4() {
int n_rows = 3;
int n_cols = 2;
::mat A1(n_rows, n_cols, arma::fill::zeros);
arma::Rcout << A1 << std::endl;
Rcpp
::mat A2(n_rows, n_cols, arma::fill::ones);
arma::Rcout << A2 << std::endl;
Rcpp
::mat A3(n_rows, n_cols, arma::fill::value(23.0));
arma::Rcout << A3 << std::endl;
Rcpp
::mat A4 = {{4.3, 2.2}, {-9.4, 7.0}, {10.0, 11.0}};
arma::Rcout << A4 << std::endl;
Rcpp}
R
ex4()
## 0 0
## 0 0
## 0 0
##
## 1.0000 1.0000
## 1.0000 1.0000
## 1.0000 1.0000
##
## 23.0000 23.0000
## 23.0000 23.0000
## 23.0000 23.0000
##
## 4.3000 2.2000
## -9.4000 7.0000
## 10.0000 11.0000
Element-wise arithmetic operations are the same as for vectors:
C++
+ B;
A - B;
A % B; // element-wise multiplication
A / B; A
Element-wise functions (e.g. log()
and exp()
) are the exact same as in vectors.
You get and set elements by parentheses notation. Always remember that indexing starts at 0.
C++
(0, 2); // get (0,2)th element
Adouble x = A(4, 2); // get (4,2)th element and assign to x
(1, 0) = 12.0; // set (1,0)th element A
Let’s demonstrate these
C++
// [[Rcpp::export]]
void ex5() {
= {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
mat A
::Rcout << "A: " << A << std::endl;
Rcpp
::Rcout << "A(0, 2): " << A(0, 2) << std::endl;
Rcpp
(1, 0) = 12.0;
A::Rcout << "A(1, 0): " << A(1, 0) << std::endl;
Rcpp
= A + 2.0;
A ::Rcout << "A: " << A << std::endl;
Rcpp}
R
ex5()
## A: 1.0000 2.0000 3.0000
## 4.0000 5.0000 6.0000
##
## A(0, 2): 3
## A(1, 0): 12
## A: 3.0000 4.0000 5.0000
## 14.0000 7.0000 8.0000
You can extract columns and rows with A.col()
and A.row(s)
. When you do this, it uses value semantics, not reference semantics, so you can edit any newly defined Armadillo vectors without modifying A
. You can also set columns with A.col(col_num) = x
.
C++
// [[Rcpp::export]]
void ex6() {
= {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
mat A ::Rcout << "A:" << std::endl << A << std::endl;
Rcpp
= A.col(1); // second column
vec x ::Rcout << "x:" << std::endl << x << std::endl;
Rcpp
(0) = 10.0;
x::Rcout << "x:" << std::endl << x << std::endl; // changed
Rcpp::Rcout << "A:" << std::endl << A << std::endl; // unchanged
Rcpp
= {7.0, 8.0, 9.0};
vec a .col(0) = a;
A::Rcout << "A:" << std::endl << A << std::endl; // unchanged
Rcpp}
R
ex6()
## A:
## 1.0000 2.0000
## 3.0000 4.0000
## 5.0000 6.0000
##
## x:
## 2.0000
## 4.0000
## 6.0000
##
## x:
## 10.0000
## 4.0000
## 6.0000
##
## A:
## 1.0000 2.0000
## 3.0000 4.0000
## 5.0000 6.0000
##
## A:
## 7.0000 2.0000
## 8.0000 4.0000
## 9.0000 6.0000
Number of rows/columns/elements from a Mat
object:
A.n_elem
: Number of elements in matrix A
.A.n_rows
: Number of rows in matrix A
.A.n_cols
: Number of columns in matrix A
.Generate matrix of ones.
C++
= ones<mat>(n_rows, n_cols); mat A
Generate \(n\times n\) identity matrix.
C++
= eye(n_rows, n_cols); mat id
Diagonal matrix, where v
is a Col
that you already have defined.
C++
= diagmat(v); mat d
Let’s demonstrate making these special matrices:
C++
// [[Rcpp::export]]
void ex7() {
= ones<mat>(2, 3);
mat A ::Rcout << "A: " << std::endl << A << std::endl;
Rcpp
= eye(4, 4);
mat id ::Rcout << "id: " << std::endl << id << std::endl;
Rcpp
= {3.0, 2.2, 5.5};
vec v = diagmat(v);
mat d ::Rcout << "d: " << std::endl << d << std::endl;
Rcpp}
R
ex7()
## A:
## 1.0000 1.0000 1.0000
## 1.0000 1.0000 1.0000
##
## id:
## 1.0000 0 0 0
## 0 1.0000 0 0
## 0 0 1.0000 0
## 0 0 0 1.0000
##
## d:
## 3.0000 0 0
## 0 2.2000 0
## 0 0 5.5000
Rcpp has LogicalMatrix
, IntegerMatrix
, NumericMatrix
, and CharacterMatrix
objects that I am not covering.
This is because you mostly use C++ for numerical computing, so it’s less usual to use a CharacterMatrix
in C++ (I would say you are crazy for trying to use C++ when you are mostly dealing with strings). And for numerical computing, linear algebra is king, and Armadillo has nice linear algebra routines.
C++
* B; A
C++
.t(); A
Solving linear systems: Solving for \(X\) in \[
AX = B
\] results in \[
X = A^{-1}B
\] solve()
will calculate \(A^{-1}B\) numerically stably.
C++
(A, B); solve
C++
(A); inv
C++
(A); pinv
C++
.diag(); A
C++
(A); trace
Cholesky Decomposition: Find an upper triangular matrix \(R\) with non-negative diagonal elements such that \(X = R^TR\). Applicable when \(X\) is symmetric.
C++
= chol(X); mat R
Eigendecomposition: Find orthogonal matrix \(U\) and diagonal matrix \(D\) such that \(X = UDU^T\). Applicable when \(X\) is symmetric.
You need to create (i) a vector containing the main diagonal of \(D\) and (ii) a matrix that will hold the eigenvectors. You do that ahead of time. Then you run eig_sym()
.
C++
; // main diagonal of D matrix
vec eigval; // U matrix
mat eigvec(eigval, eigvec, B); eig_sym
Singular Value Decomposition: Find orthogonal matrices \(U\) and \(V\), and diagonal matrix \(D\) such that \(X = UDV^T\).
You need to create (i) a vector containing the main diagonal of \(D\) and (ii) matrices that will hold the left and right singular vectors. You do that ahead of time. Then you run svd()
.
C++
;
mat U;
vec s;
mat V(U,s,V,X); svd
QR Decomposition: Find an orthogonal matrix \(Q\) and an upper triangular matrix \(R\) with non-negative diagonal entries, such that \(X = QR\).
C++
;
mat Q;
mat R(Q, R, X); qr
Sometimes, you want to convert a \(1\times 1\) matrix to a double. This usually happens when you do a dot product. You can do that with arma::as_scalar()
.
C++
= {1.0, 2.0, 3.0};
vec x double ss = as_scalar(x.t() * x);
Example: This isn’t the most efficient way to find the OLS estimates from a regression, but we can just calculate
\[ \hat{\beta} = (X^TX)^{-1}X^Ty \]
Let’s do this in C++:
C++
// [[Rcpp::export]]
(mat X, vec y) {
vec stupid_ols= inv(X.t() * X) * X.t() * y;
vec betahat return betahat;
}
Let’s verify our results.
R
## Armadillo Way
<- model.matrix(mpg ~ wt, data = mtcars)
X <- mtcars$mpg
y stupid_ols(X = X, y = y)
## [,1]
## [1,] 37.285
## [2,] -5.344
## R Way
coef(lm(mpg ~ wt, data = mtcars))
## (Intercept) wt
## 37.285 -5.344
I called it stupid_ols()
because calculating an inverse is generally a computationally unstable task (but in this scenario, it’s probably OK). In numerical computing, you try to avoid calculating an inverse unless you are returning the inverse to a user. In which case, you should calculate the inverse only during the final step.
It’s better to use solve()
here, but folks have thought very hard about this seemingly simple problem (Farebrother 1988):
C++
// [[Rcpp::export]]
(mat X, vec y) {
vec arma_ols= solve(X.t() * X, X.t() * y);
vec betahat return betahat;
}
R
arma_ols(X = X, y = y)
## [,1]
## [1,] 37.285
## [2,] -5.344
Let’s compare performance when we have a large number of covariates (note: lm()
returns a lot more stuff, so this only says that when we only want coefficient estimates we can do better).
R
<- model.matrix(mpg ~ ., data = mtcars)
X <- mtcars$mpg
y
::mark(
benchcoef(lm(mpg ~ wt, data = mtcars)),
stupid_ols(X = X, y = y),
arma_ols(X = X, y = y),
check = FALSE
1:8] |>
)[::kable() knitr
expression | min | median | itr/sec | mem_alloc | gc/sec | n_itr | n_gc |
---|---|---|---|---|---|---|---|
coef(lm(mpg ~ wt, data = mtcars)) | 402.12µs | 419.75µs | 2262 | 5.18KB | 21.92 | 1032 | 10 |
stupid_ols(X = X, y = y) | 5.44µs | 6.63µs | 149921 | 18.2KB | 0.00 | 10000 | 0 |
arma_ols(X = X, y = y) | 7.89µs | 9.21µs | 107100 | 6.62KB | 10.71 | 9999 | 1 |
Exercise: The fitted values are defined as \[
\hat{y} = X\hat{\beta} = X(X^TX)^{-1}X^Ty.
\] The residuals are defined as \[
e = y - \hat{y} = (I_n - X(X^TX)^{-1}X^T)y.
\] The MSE is defined as \[
\hat{\sigma}^2 = e^Te / (n - p),
\] where \(n\) is the samples size (number of rows in \(X\)) and \(p\) is the number of parameters (number of columns in \(X\)). The estimated covariance matrix of \(\hat{\beta}\) is \[
\hat{\sigma}^2 (X^TX)^{-1}.
\] Implement all of these equations in Armadillo in one function and return the result as a list. Compare your calculations to the result of stats::lm()
.
In the below table, A
and B
are a matrices and Q
is a three-dimensional array (or “cube”). This table is based on the Matlab/Octave to Armadillo conversion table here.
R | Armadillo | Notes |
---|---|---|
A[1, 1] |
A(0, 0) |
indexing in Armadillo starts at 0 |
A[k, k] |
A(k-1, k-1) |
|
nrow(A) |
A.n_rows |
read only |
ncol(A) |
A.n_cols |
|
dim(A)[[3]] |
Q.n_slices |
Q is a cube (3D array) |
length(A) |
A.n_elem |
|
A[, k] |
A.col(k) |
this is a conceptual example only; exact conversion from R to Armadillo syntax will require taking into account that indexing starts at 0 |
A[k, ] |
A.row(k) |
|
A[, p:q] |
A.cols(p,q) |
|
A[p:q, ] |
A.rows(p,q) |
|
A[p:q, r:s] |
A(span(p,q),span(r,s)) |
A(span(first_row,last_row), span(first_col,last_col)) |
Q[, , k] |
Q.slice(k) |
Q is a cube (3D array) |
Q[, , t:u] |
Q.slices(t,u) |
|
Q[p:q, r:s, t:u] |
Q(span(p,q),span(r,s),span(t,u)) |
|
t(A) |
A.t() ortrans(A) |
matrix transpose |
A[] <- 0 |
A.zeros() |
Fill a matrix with 0’s |
A[] <- 1 |
A.ones() |
Fill a matrix with 1’s |
A <- matrix(0, nrow = k, ncol = k) |
A = zeros<mat>(k,k) |
Initialize a 0 matrix |
A <- matrix(1, nrow = k, ncol = k) |
A = ones<mat>(k,k) |
Initialize a 1 matrix |
A * B |
A % B |
element-wise multiplication |
A / B |
A / B |
element-wise division |
solve(A, B) |
solve(A, B) |
Solve linear equation \(Ax = B\) for \(A\) |
A <- A + 1; |
A++ |
|
A <- A - 1; |
A-- |
|
A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE) |
A = {{1,2}, {3,4}} |
element initialization |
X <- c(A) |
X = vectorise(A) |
|
X = cbind(A, B) |
X = join_horiz(A, B) |
|
X = rbind(A, B) |
X = join_vert(A, B) |
|
A |
cout << A << endl or A.print("A=") |