C++ is pretty esoteric — you have to keep track of a lot of boilerplate to implement anything.
Rcpp Sugar is a set of functions that comes with Rcpp to allow you to code in C++ in ways that looks like R code.
This is not how C++ programmers do things, but it makes it easy for you (an expert R programmer) to code in C++ when you want just something to help your R code.
With Rcpp Sugar, you can do vectorized arithmetic operations as you would in R, as long as you are using Rcpp vectors (examples below from Rcpp-sugar).
C++
// two numeric vectors of the same size
;
NumericVector x;
NumericVector y
// expressions involving two vectors
= x + y;
NumericVector res = x - y;
NumericVector res = x * y;
NumericVector res = x / y;
NumericVector res
// one vector, one single value
= x + 2.0;
NumericVector res = 2.0 - x;
NumericVector res = y * 2.0;
NumericVector res = 2.0 / y; NumericVector res
You can create LogicalVector
s as you would in R using logical operators.
C++
// two integer vectors of the same size
;
NumericVector x;
NumericVector y
// expressions involving two vectors
= x < y;
LogicalVector res = x > y;
LogicalVector res = x <= y;
LogicalVector res = x >= y;
LogicalVector res = x == y;
LogicalVector res = x != y;
LogicalVector res
// one vector, one single value
= x < 2;
LogicalVector res = 2 > x;
LogicalVector res = y <= 2;
LogicalVector res = 2 != y; LogicalVector res
All of these take a numeric vector in, and return a numeric vector.
abs()
: Absolute valueceil()
: Round up.cummax()
: Cumulative maximum.cummin()
: Cumulative minimum.cumprod()
: Cumulative product.cumsum()
: Cumulative summation.exp()
: Exponentiation \(e^x\).expm1()
: Exponentiation minus 1 (numerically stable for x
close to 0). \(e^x - 1\).factorial()
: \(x!\)floor()
: Round down.gamma()
: Gamma function.lbeta()
: Log of the beta function.lchoose()
: Log of combination.lfactorial()
: Log of factorial.lgamma()
: Log of gamma functionlog()
: Natural logarithm.log10()
: Base 10 logarithm.log1p()
: Log of one plus (numerically stable for x
close to 0). \(\log(1 + x)\).pmin()
: Point-wise minimum of two-vectors.pmax()
: Point-wise maximum of two-vectors.pow()
: Powers.round()
: Round to nearest integer.sqrt()
: Square rootAll of these take a numeric vector in, and return a scalar.
max()
: Maximum.mean()
: Sample arithmetic mean.median()
: Median.min()
: Minimum.range()
: Returns a NumericVector
of length 2 that contains the minimum and the maximum.sd()
: Standard deviation.sum()
: Summation.var()
: Variance.all()
: returns TRUE
if all object elements are TRUE
.
any()
: Returns TRUE
if any of the elements are TRUE
.
ifelse()
: Vectorized if-else
statements.
is_finite()
: Returns TRUE
if not infinite.
is_infinite()
: Returns TRUE
if infinite.
is_na()
: Returns TRUE
if missing.
is_nan()
: Returns TRUE
if nan
.
All of these return LogicalVector
s, which are built on int
not bool
(because in R there is NA
, but there is no NA
equivalent in bool
).
One consequence is that all()
and any()
do not return a bool
object, so you cannot use them directly in if-else
statements.
Instead, wrap all()
and any()
inside is_true()
or is_false()
to convert it to a bool
.
C++
if (is_true(all(v))) {
}
Another consequence is that you should not use individual elements of LogicalVector
’s in if-then
statements because NA
’s will evaluate to the bool type true
.
Instead, check if an element is TRUE
(which is an object defined by Rcpp different from true
).
E.g. if v
is a LogicalVector
then do:
C++
if (v[i]==TRUE) {
} else if (v[i]==FALSE) {
} else if (v[i]==NA_LOGICAL) {
} else {
}
Example:
C++
::NumericVector x = {1.0, 2.0, 3.0};
Rcpp::LogicalVector z = x > 2;
Rcppif (z(2) == TRUE) {
::Rcout << "Hello" << std::endl;
Rcpp}
diff(x)
: Lagged differences.match(v, table)
: Same as R’s match()
function. Note that the indexing starts at 1 (R style), not 0, in what is returned.rep(x, n)
: Repeat a vector n
times.rev(x)
: Return a vector whose elements are in reverse order.sample(Vector x, int size, replace = false, probs = R_NilValue)
: Sample from a vector. Same as the R function sample()
.seq(start, end)
: Returns a vector of consecutive integers from start to end.seq_along(x)
: Returns a vector of consecutive integers from 1
to the length of a vector (does not start at 0).seq_len(n)
: Returns a vector of consecutive integers from 1
to n
(does not start at 0).setdiff(v1, v2)
: Returns a vector of differences.setequal(v1, v2)
: Returns true
if unique elements in v1
equal unique elements in v2
.unique(v)
: Returns a vector of unique values of v.which_max(v)
: Returns the numerical index of the largest element. This is C-style indices, so starts at 0.which_min(v)
: Returns the numerical index of the smallest element. This is C-style indices, so starts at 0.d
/q
/p
/r
(density/quantile/probability/random generation) functions exist for a variety of distributions. These allow for inputting and outputting NumericVector
s (whereas the Rmath
library only allows for scalar inputs and outputs).
See Distribution functions for details on parameterizations.
dbeta()
, qbeta()
, pbeta()
, rbeta()
: Beta distribution.
dbinom()
, qbinom()
, pbinom()
, rbinom()
: Binomial distribution.
dchisq()
, qchisq()
, pchisq()
, rchisq()
: Chi-squared distribution.
dexp()
, qexp()
, pexp()
, rexp()
: Exponential distribution using scale (and not rate) parameterization.
df()
, qf()
, pf()
, rf()
: \(F\)-distribution.
dgamma()
, qgamma()
, pgamma()
, rgamma()
: Gamma distribution using the shape and scale parameterization.
dgeom()
, qgeom()
, pgeom()
, rgeom()
: Geometric distribution.
dhyper()
, qhyper()
, phyper()
, rhyper()
: Hypergeometric distribution.
dnbinom()
, qnbinom()
, pnbinom()
, rnbinom()
: Negative Binomial distribution.
dnorm4()
, qnorm5()
, pnorm5()
, rnorm()
: Normal distribution.
dpois()
, qpois()
, ppois()
, rpois()
: Poisson distribution.
dt()
, qt()
, pt()
, rt()
: \(t\)-distribution.
dunif()
, qunif()
, punif()
, runif()
: Uniform distribution.
Rcpp provides a List
class that you can use to return (named) lists to R. I know three ways to create them:
Create an unnamed list:
C++
::List L1 = Rcpp::List::create(x, y, z); Rcpp
Create a named list:
C++
::List L2 = Rcpp::List::create(Rcpp::Named("name1") = x, _["name2"] = y, _["name3"] = z); Rcpp
Create a list then add elements.
C++
::List L3;
Rcpp["x"] = x;
L3["y"] = y;
L3["z"] = z; L3
Let’s demonstrate these:
C++
// [[Rcpp::export]]
(NumericVector x, bool y, CharacterVector z) {
List gl1= List::create(x, y, z);
List L1 return L1;
}
// [[Rcpp::export]]
(NumericVector x, bool y, CharacterVector z) {
List gl2= List::create(Named("x") = x, _["y"] = y, _["z"] = z);
List L2 return L2;
}
// [[Rcpp::export]]
(NumericVector x, bool y, CharacterVector z) {
List gl3;
List L3["x"] = x;
L3["y"] = y;
L3["z"] = z;
L3return L3;
}
R
<- c(1, 2.1, 43)
x <- TRUE
y <- c("hello", "world")
z
gl1(x, y, z)
## [[1]]
## [1] 1.0 2.1 43.0
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] "hello" "world"
gl2(x, y, z)
## $x
## [1] 1.0 2.1 43.0
##
## $y
## [1] TRUE
##
## $z
## [1] "hello" "world"
gl3(x, y, z)
## $x
## [1] 1.0 2.1 43.0
##
## $y
## [1] TRUE
##
## $z
## [1] "hello" "world"
You usually only create List
s at the very end of a function that you then want to return to R.
Use Rcpp Sugar to create a function called msamp()
where the user inputs a sample size n
and the output is the mean of a random sample of size n
from a standard uniform distribution (over \([0,1]\)). E.g.
R
set.seed(1)
msamp(1)
## [1] 0.2655
msamp(10)
## [1] 0.5456
msamp(100)
## [1] 0.5229
msamp(1000)
## [1] 0.4957
The multinomial distribution has PMF \[
\frac{n!}{x_1!x_2!\cdots x_k!}p_1^{x_1}p_2^{x_2}\ldots p_k^{x_k}
\] Here, \(x_i\) is the number of counts in category \(i\) and \(n\) is the total number of counts, so \(n = x_1 + x_2 + \cdots + x_n\). Also, \(p_i\) is the probability of category \(i\). Use Rcpp Sugar to create a function called dmultinom_cpp()
that will take as input a NumericVector
x
, a NumericVector
p
, and a bool
lg
and return the log-PMF if lg = true
and return the PMF if lg = false
. For numerical stability, you should calculate the log values at each step and only exponentiate at the end (this is very typical in numerical computing). You can assume that all values of \(p_i\) are non-zero.
Hint: We have the relationship that \(\Gamma(n + 1) = n!\). This is needed for doing log-factorial on scalars.
R
dmultinom(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), log = TRUE)
## [1] -2.294
dmultinom_cpp(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), lg = TRUE)
## [1] -2.294
dmultinom(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), log = FALSE)
## [1] 0.1008
dmultinom_cpp(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), lg = FALSE)
## [1] 0.1008
Write a function called scale01()
that takes as input a NumericVector
, subtracts the minimum value, and divides by the maximum value of the resulting vector. So it scales all observations to be between 0 and 1. Use Rcpp Sugar.
R
<- 6:10
x scale01(x)
## [1] 0.00 0.25 0.50 0.75 1.00