Every line of code needs to end with a semicolon (;).
Assignment is with =, not <-.
Every variable needs to be declared ahead of time.
Comments start with //.
Scalars are different from vectors.
Function arguments are called by position, not by name.
Counting begins at 0 (like in Python), not 1. This is the most common source of headaches for R programmers migrating to C++.
The order of function definitions matters.
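Two of the points above (counting from 0 and the order of definitions) are easier to see in code. The sketch below is plain C++ rather than Rcpp, and it assumes std::vector from the standard library as the container, which these notes have not introduced yet. The function names are made up for illustration.

```cpp
#include <vector>

// Forward declaration: tells the compiler that last_element() exists
// before its full definition appears further down the file.
double last_element(const std::vector<double>& v);

// Indexing starts at 0, so v[0] is the first value (assumes v is non-empty).
double first_element(const std::vector<double>& v) {
  return v[0];
}

// Calls last_element(), which has only been declared (not defined) so far.
// Without the forward declaration above, this would not compile.
double range_width(const std::vector<double>& v) {
  return last_element(v) - first_element(v);
}

// The last element sits at index size() - 1, not size().
double last_element(const std::vector<double>& v) {
  return v[v.size() - 1];
}
```

Moving the definition of last_element() above range_width() would work just as well as the forward declaration.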
You can print out to the console using Rcpp with Rcpp::Rcout and the << operator.
C++
// [[Rcpp::export]]
void hello_world_1() {
  Rcpp::Rcout << "Hello World";
}
R
hello_world_1()
## Hello World
The void means that our function is not returning anything.
You can chain multiple outputs together.
C++
// [[Rcpp::export]]
void hello_world_2() {
  Rcpp::Rcout << "Hello" << " World";
}
R
hello_world_2()
## Hello World
If you want a new line, use endl from the std namespace.
C++
// [[Rcpp::export]]
void hello_world_3() {
  Rcpp::Rcout << "Hello"
              << std::endl
              << "World";
}
R
hello_world_3()
## Hello
## World
Notice that we use :: to state the namespace, just like R uses :: to state the package.
Notice how putting << on different lines did not affect things. This is because the end of the command is wherever the semicolon ; is.
In non-Rcpp C++, you would use std::cout instead of Rcpp::Rcout.
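For comparison, here is a rough sketch of the same "Hello / World" output in plain C++ without Rcpp. The function names are hypothetical. Writing to a generic std::ostream (an assumption made here so the text can be captured and checked) means the same function works with std::cout or with a string buffer.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes to any output stream, for example std::cout.
// The << calls are chained across lines; the statement ends at the semicolon.
void hello_world_plain(std::ostream& out) {
  out << "Hello"
      << std::endl
      << "World";
}

// Captures the output in a string instead of printing it to the console.
std::string hello_world_string() {
  std::ostringstream buffer;
  hello_world_plain(buffer);
  return buffer.str();
}
```

Calling hello_world_plain(std::cout) prints to the console, much like Rcpp::Rcout does inside an R session.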
Since C++ is a compiled language and you cannot use it interactively, you often debug by printing lots of output to the console and checking whether it is consistent with what you expect. You will use Rcpp::Rcout a lot.
Let’s go through a basic example of the differences between R and C++.
C++
// [[Rcpp::export]]
double add2(double x, double y) {
  double z; // declare z to be a double
  z = x + y;
  return z;
}
In the above function, we
Declared that add2() will return a double.
Declared that x and y must be doubles.
Declared that z is a double.
Defined z as x plus y. This uses =.
Returned z.
Notice how there is a semicolon after each line (except where curly braces are concerned).
C++ will error if you forget declarations:
C++
// This will not compile
// [[Rcpp::export]]
double add2(double x, double y) {
  z = x + y; // error: z was never declared
  return z;
}
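A declaration and an assignment can also be combined into one statement, which is a common way to avoid forgetting the declaration. The variant below is a minimal sketch in plain C++; the name add2_short() is made up for illustration.

```cpp
// Same idea as add2(), but z is declared and assigned in a single statement.
double add2_short(double x, double y) {
  double z = x + y; // declaration and assignment together
  return z;
}
```

Putting the // [[Rcpp::export]] comment above it, as in the other examples, is what would make it available from R.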
In C++, scalars (elements of length 1) are different from vectors (which contain scalars).
The different basic scalars are
bool: This is either true or false (all lower case). Same as a length 1 R logical.
int: Same as a length 1 R integer.
double: Same as a length 1 R double.
string: Same as a length 1 R character.
Note that in C++, 1 is an int and 1.0 is a double.
Arithmetic operations are similar: +, -, *, /
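One place where "similar" hides a difference from R is division. Because 1 is an int and 1.0 is a double, dividing two ints drops the fractional part, while dividing doubles does not. The two helper functions below are hypothetical names used only to show this in plain C++.

```cpp
// Both operands are ints, so the result is an int and the
// fractional part is discarded (assumes b is not 0).
int int_divide(int a, int b) {
  return a / b;
}

// Both operands are doubles, so the result keeps its fractional part.
double double_divide(double a, double b) {
  return a / b;
}
```

So int_divide(1, 2) gives 0, while double_divide(1.0, 2.0) gives 0.5. If you want R-like division, make sure at least one operand is a double.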
C++ gives us modify-in-place operators.
x++: Add 1 to x.
x--: Subtract 1 from x.
x += a: Add a to x.
x -= a: Subtract a from x.
x *= a: Multiply x by a.
x /= a: Divide x by a.
C++
// [[Rcpp::export]]
void arith_example() {
  double x = 1.0;
  Rcpp::Rcout << "x : " << x << std::endl;

  x++;
  Rcpp::Rcout << "x++ : " << x << std::endl;

  x--;
  Rcpp::Rcout << "x-- : " << x << std::endl;

  x += 9;
  Rcpp::Rcout << "x += 9: " << x << std::endl;

  x -= 6;
  Rcpp::Rcout << "x -= 6: " << x << std::endl;

  x *= 3;
  Rcpp::Rcout << "x *= 3: " << x << std::endl;

  x /= 2;
  Rcpp::Rcout << "x /= 2: " << x << std::endl;
}
R
arith_example()
## x : 1
## x++ : 2
## x-- : 1
## x += 9: 10
## x -= 6: 4
## x *= 3: 12
## x /= 2: 6
In C++, you do powers with pow() from the std namespace.
C++
// [[Rcpp::export]]
double square(double x) {
  return std::pow(x, 2.0);
}
R
square(3)
## [1] 9
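R programmers often reach for ^ out of habit. In C++, ^ compiles fine on integers but means bitwise exclusive-or, not exponentiation, so it silently gives the wrong answer. The sketch below (plain C++, hypothetical function names) puts the two side by side.

```cpp
#include <cmath>

// ^ is bitwise XOR in C++: 2 is binary 10, 3 is binary 11, and 10 XOR 11 = 01.
int xor_bits(int a, int b) {
  return a ^ b;
}

// std::pow() is what actually raises x to the power y.
double power(double x, double y) {
  return std::pow(x, y);
}
```

Here xor_bits(2, 3) is 1, while power(2.0, 3.0) is 8.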
Logical operations are the same, but there are no vectorized & and | (in C++, the single & and | are bitwise operators).
&& is and
|| is or
! is not
The comparisons between doubles and ints are the same: ==, !=, >, <, >=, <=
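As a small illustration of combining comparisons with && in plain C++, consider the two hypothetical helpers below. The second one relies on && evaluating its right-hand side only when the left-hand side is true, and it uses the % (remainder) operator, which these notes have not covered.

```cpp
// True when x lies in the closed interval [lo, hi].
bool in_range(double x, double lo, double hi) {
  return x >= lo && x <= hi;
}

// True when d divides n with no remainder.
// If d is 0, the left side is false and n % d is never evaluated.
bool divides_evenly(int n, int d) {
  return d != 0 && n % d == 0;
}
```

Both functions return a bool, which Rcpp would hand back to R as a length 1 logical.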
You can use these in if-else statements, which are exactly the same as in R:
C++
if (condition) {
  // code
} else if (condition) {
  // code
} else if (condition) {
  // code
} else {
  // code
}
Let’s demonstrate this by creating a C++ version of sign().
C++
// [[Rcpp::export]]
int signC(int x) {
  if (x > 0) {
    return 1;
  } else if (x == 0) {
    return 0;
  } else {
    return -1;
  }
}
R
signC(-3)
## [1] -1
signC(0)
## [1] 0
signC(11)
## [1] 1
You can throw an R error by using Rcpp::stop(). This is how you do assertions using Rcpp.
C++
// [[Rcpp::export]]
void errpos(double x) {
  if (x > 0.0) {
    Rcpp::stop("only negative values are allowed");
  }
}
R
errpos(-3)
errpos(2)
## Error in errpos(2): only negative values are allowed
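Outside of Rcpp, where Rcpp::stop() is not available, the usual way to signal this kind of failed check is to throw a standard exception. The sketch below is a plain C++ analogue of errpos(); the name errpos_plain() and the choice of std::invalid_argument are assumptions made for illustration, not something Rcpp requires.

```cpp
#include <stdexcept>

// Plain C++ analogue of errpos(): throws instead of calling Rcpp::stop().
void errpos_plain(double x) {
  if (x > 0.0) {
    throw std::invalid_argument("only negative values are allowed");
  }
}
```

A caller can then wrap the call in try { ... } catch (const std::invalid_argument&) { ... } to react to the error.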
Here are a bunch of the math scalar functions I use pretty regularly.
Most of these are from the <cmath> library from the std namespace, or from the <Rmath> library from the R namespace.
If you want to check out those R header files where the functions are declared, type in R:
R
R.home("include")
You can read about the <cmath> library at https://en.cppreference.com
Below, I will write out the return value and arguments like so
C++
type lib::name(type, type)
<cmath> from std
double std::min(double a, double b): Minimum of two objects (usually scalars) of same type. This one is declared in <algorithm> rather than <cmath>.
double std::max(double a, double b): Maximum of two objects (usually scalars) of same type. This one is also declared in <algorithm>.
double std::ceil(double x): Round a double up.
double std::floor(double x): Round a double down.
double std::trunc(double x): Round a double toward zero, so negative numbers round up and positive numbers round down.
double std::round(double x): Rounds double to nearest integer. Returns a double.
double std::abs(double x): Absolute value of a double or int.
double std::sqrt(double x): Computes \(\sqrt{x}\).
double std::exp(double x): Computes \(e^x\).
double std::exp2(double x): Computes \(2^x\).
double std::pow(double x, double y): Computes \(x^y\).
double std::log(double x), double std::log10(double x), double std::log2(double x): Logs a double with base \(e\), \(10\), or \(2\) (respectively).
double std::lgamma(double arg): Computes natural log of the gamma function. Note that std::lgamma(x) is \(\log[(x-1)!]\) for any positive integer \(x\), so this can be used to calculate log-factorials.
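The log-factorial trick mentioned for std::lgamma() can be sketched as follows. Since std::lgamma(x) is \(\log[(x-1)!]\), shifting the argument by one gives \(\log(n!)\), and a log binomial coefficient follows from three log-factorials. The function names are hypothetical and the code assumes \(0 \le k \le n\).

```cpp
#include <cmath>

// log(n!) computed as lgamma(n + 1), for a non-negative integer n.
double log_factorial(int n) {
  return std::lgamma(n + 1.0);
}

// log of "n choose k", using n! / (k! * (n - k)!) on the log scale.
// Assumes 0 <= k <= n.
double log_choose(int n, int k) {
  return log_factorial(n) - log_factorial(k) - log_factorial(n - k);
}
```

Working on the log scale like this avoids overflow for large n; exponentiate with std::exp() only at the end if the raw value is needed.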
<Rmath> from R
double R::beta(double a, double b): Compute the beta function.
double R::lbeta(double a, double b): Compute the log of the beta function.
double R::choose(double n, double k): Compute the combination, \(\binom{n}{k}\).
double R::lchoose(double n, double k): Compute the log of the combination, \(\log\left[\binom{n}{k}\right]\).
There are many distributions available from <Rmath>, each parameterized in standard ways (read about them in Wikipedia).
Functions that begin with d return the density (for continuous distributions) or the probability mass (for discrete distributions).
Functions that begin with p return the probability of being less than or equal to some value. This is called the cumulative distribution function.
Functions that begin with q return the quantile. That is, you tell me a probability and I will tell you the value such that the probability of being less than or equal to that value is the provided probability.
Functions that begin with r provide random draws from a given distribution.
The common arguments between functions are:
x: an observation of that distribution.
p: a probability.
lt: lt = 0 means that p is the upper tail probability while lt = 1 means that p is the lower tail probability.
lg: lg = 0 means that p is the probability while lg = 1 means that p is the log-probability. For the d functions, lg = 1 returns the log of the density.
Below are the most common distributions.
Normal Distribution with mean and standard deviation (not variance).
mu = \(\mu\), sigma = \(\sigma\).
C++
double R::dnorm(double x, double mu, double sigma, int lg)
double R::pnorm(double x, double mu, double sigma, int lt, int lg)
double R::qnorm(double p, double mu, double sigma, int lt, int lg)
double R::rnorm(double mu, double sigma)
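To make the lt and lg flags concrete, here is a rough stand-in for R::pnorm() written in plain C++ with std::erfc() from <cmath>. This is only a sketch under the assumption that sigma is positive; it is not how Rmath implements pnorm, and the name pnorm_sketch() is made up.

```cpp
#include <cmath>

// Sketch of a normal CDF with the same argument order as R::pnorm().
// lt = 1 gives P(X <= x), lt = 0 gives P(X > x).
// lg = 1 returns the log of that probability.
double pnorm_sketch(double x, double mu, double sigma, int lt, int lg) {
  double z = (x - mu) / sigma; // standardize (assumes sigma > 0)
  double p;
  if (lt == 1) {
    p = 0.5 * std::erfc(-z / std::sqrt(2.0)); // lower tail
  } else {
    p = 0.5 * std::erfc(z / std::sqrt(2.0));  // upper tail
  }
  if (lg == 1) {
    return std::log(p);
  } else {
    return p;
  }
}
```

For example, pnorm_sketch(0, 0, 1, 1, 0) is 0.5, and the lower and upper tails at the same x always add up to 1.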
Gamma Distribution with shape and scale parameters.
shp = \(k\), scl = \(\theta\).
C++
double R::dgamma(double x, double shp, double scl, int lg)
double R::pgamma(double x, double shp, double scl, int lt, int lg)
double R::qgamma(double p, double shp, double scl, int lt, int lg)
double R::rgamma(double shp, double scl)
Beta Distribution with left and right shape parameters.
a = \(\alpha\), b = \(\beta\).
C++
double R::dbeta(double x, double a, double b, int lg)
double R::pbeta(double x, double a, double b, int lt, int lg)
double R::qbeta(double p, double a, double b, int lt, int lg)
double R::rbeta(double a, double b)
\(\chi^2\)-distribution with degrees of freedom.
df = \(k\).
C++
double R::dchisq(double x, double df, int lg)
double R::pchisq(double x, double df, int lt, int lg)
double R::qchisq(double p, double df, int lt, int lg)
double R::rchisq(double df)
\(F\)-distribution with numerator and denominator degrees of freedom.
df1 = \(d_1\), df2 = \(d_2\).
C++
double R::df(double x, double df1, double df2, int lg)
double R::pf(double x, double df1, double df2, int lt, int lg)
double R::qf(double p, double df1, double df2, int lt, int lg)
double R::rf(double df1, double df2)
\(t\)-distribution with degrees of freedom.
n = \(\nu\).
C++
double R::dt(double x, double n, int lg)
double R::pt(double x, double n, int lt, int lg)
double R::qt(double p, double n, int lt, int lg)
double R::rt(double n)
Binomial Distribution with size and success probability parameters.
n = \(n\), p = \(p\) (written pb in qbinom(), where p is the probability being inverted).
C++
double R::dbinom(double x, double n, double p, int lg)
double R::pbinom(double x, double n, double p, int lt, int lg)
double R::qbinom(double p, double n, double pb, int lt, int lg)
double R::rbinom(double n, double p)
Exponential Distribution with scale parameter.
sl = \(1 / \lambda\) (not \(\lambda\)).
C++
double R::dexp(double x, double sl, int lg)
double R::pexp(double x, double sl, int lt, int lg)
double R::qexp(double p, double sl, int lt, int lg)
double R::rexp(double sl)
C++
// [[Rcpp::export]]
double exp_demo() {
  return R::dexp(1, 2, 0);
}
R
exp_demo() ## uses scale
## [1] 0.3033
dexp(x = 1, rate = 1/2, log = FALSE) ## uses rate
## [1] 0.3033
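The scale versus rate distinction can be checked by hand. With scale \(s = 1/\lambda\), the exponential density is \(f(x) = \frac{1}{s} e^{-x/s}\) for \(x \ge 0\). The function below is a plain C++ sketch of that formula (hypothetical name, not the Rmath implementation, and it assumes the scale is positive).

```cpp
#include <cmath>

// Exponential density with a scale parameter sl = 1 / lambda.
// Assumes sl > 0. The density is 0 for negative x.
double dexp_scale(double x, double sl) {
  if (x < 0.0) {
    return 0.0;
  }
  return std::exp(-x / sl) / sl;
}
```

At x = 1 with scale 2 this is \(0.5 \, e^{-0.5} \approx 0.3033\), which agrees with the output of exp_demo() above.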
Geometric Distribution with success probability \(p\). This uses the model formulation that counts the number of failures before a success.
p = \(p\).
C++
double R::dgeom(double x, double p, int lg)
double R::pgeom(double x, double p, int lt, int lg)
double R::qgeom(double p, double pb, int lt, int lg)
double R::rgeom(double p)
C++
// [[Rcpp::export]]
double g_demo() {
  return R::dgeom(0, 0.7, 0);
}
R
g_demo() ## 70% prob of a success on the first trial, so 0 failures.
## [1] 0.7
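Under the "number of failures before a success" formulation, the probability mass is \(P(X = x) = p (1 - p)^x\) for \(x = 0, 1, 2, \ldots\). A minimal plain C++ sketch of that formula is below; the name dgeom_sketch() is made up, and this is not the Rmath implementation.

```cpp
#include <cmath>

// Geometric probability mass: x failures followed by one success.
// Assumes 0 < p <= 1. The mass is 0 for negative x.
double dgeom_sketch(int x, double p) {
  if (x < 0) {
    return 0.0;
  }
  return p * std::pow(1.0 - p, x);
}
```

With x = 0 the power term is 1, so the result is just p, matching the 0.7 returned by g_demo().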
Hypergeometric Distribution, parameterized by an urn of white and black balls.
x = \(k\) = number of white balls drawn.
r = \(K\) = number of white balls in urn.
b = \(N - K\) = number of black balls in urn.
n = \(n\) = number of balls drawn.
C++
double R::dhyper(double x, double r, double b, double n, int lg)
double R::phyper(double x, double r, double b, double n, int lt, int lg)
double R::qhyper(double p, double r, double b, double n, int lt, int lg)
double R::rhyper(double r, double b, double n)
Poisson Distribution with mean parameter.
lb = \(\lambda\).
C++
double R::dpois(double x, double lb, int lg)
double R::ppois(double x, double lb, int lt, int lg)
double R::qpois(double p, double lb, int lt, int lg)
double R::rpois(double lb)
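The Poisson probability mass, \(P(X = x) = \lambda^x e^{-\lambda} / x!\), is a nice place to reuse std::lgamma() for the factorial. The sketch below computes it on the log scale in plain C++ and exponentiates at the end. The name dpois_sketch() is hypothetical, the code assumes \(\lambda > 0\), and it is not the Rmath implementation.

```cpp
#include <cmath>

// Poisson probability mass with mean lb, computed on the log scale.
// log P(X = x) = x * log(lb) - lb - log(x!), with log(x!) = lgamma(x + 1).
// Assumes lb > 0. The mass is 0 for negative x.
double dpois_sketch(int x, double lb) {
  if (x < 0) {
    return 0.0;
  }
  double log_mass = x * std::log(lb) - lb - std::lgamma(x + 1.0);
  return std::exp(log_mass);
}
```

Returning log_mass directly, instead of exponentiating, would give the equivalent of calling a d function with lg = 1.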
Write a function called square2() that squares a number, but does not use std::pow(). Which one is faster?
Write a function called quadform() in C++ that prints the output (using Rcpp::Rcout) of the quadratic formula. That is \[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\] It should print two numbers to the console if \(b^2 - 4ac > 0\), one number if \(b^2 - 4ac = 0\), and should print "No Solution" if \(b^2 - 4ac < 0\).
R
quadform(1, 2, 1)
## Solution: -1
quadform(2, 2, 1)
## No Solution
quadform(1, 4, 1)
## Solution 1: -3.73205
## Solution 2: -0.267949
The beta function is defined in terms of gamma functions as follows \[ \mathrm{B}(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)} \]
R has its own version:
R
beta(1.5, 2.2)
## [1] 0.2341
Use just std::lgamma() and std::exp() to write your own beta function, called beta2(). You should include an argument lg which is a flag for returning either the log-beta or the beta. lg should default to false. E.g.
R
beta2(1.5, 2.2)
## [1] 0.2341
beta2(1.5, 2.2, TRUE)
## [1] -1.452
Instead of shape and scale, some parameterizations of the gamma distribution use shape and rate (1 over the scale). stats::dgamma() implements both of these in R, but R::dgamma() in C++ only implements the scale parameterization. Implement a density function for the gamma that uses shape and rate parameterization. E.g.
R
dgamma_r(x = 1, shp = 2, rt = 3, lg = 0)
## [1] 0.4481
dgamma(x = 1, shape = 2, rate = 3, log = FALSE)
## [1] 0.4481