Learning Objectives

Motivation

Arithmetic Operations

Logical Operations

R-like Functions

Vector to Vector Math Functions

All of these take a numeric vector in and return a numeric vector. A few of them (lbeta(), lchoose(), pmin(), pmax(), pow()) also take a second argument.

  • abs(): Absolute value.
  • ceil(): Round up.
  • cummax(): Cumulative maximum.
  • cummin(): Cumulative minimum.
  • cumprod(): Cumulative product.
  • cumsum(): Cumulative summation.
  • exp(): Exponentiation \(e^x\).
  • expm1(): \(e^x - 1\), computed accurately even when x is close to 0.
  • factorial(): \(x!\)
  • floor(): Round down.
  • gamma(): Gamma function.
  • lbeta(): Log of the beta function.
  • lchoose(): Log of the binomial coefficient ("n choose k").
  • lfactorial(): Log of factorial.
  • lgamma(): Log of the gamma function.
  • log(): Natural logarithm.
  • log10(): Base 10 logarithm.
  • log1p(): \(\log(1 + x)\), computed accurately even when x is close to 0.
  • pmin(): Point-wise minimum of two vectors.
  • pmax(): Point-wise maximum of two vectors.
  • pow(): Powers.
  • round(): Round to nearest integer.
  • sqrt(): Square root.
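In Rcpp each of these is a single sugar call on a NumericVector, e.g. `cumsum(x)` or `pmax(x, y)`. To make clear what a few of them compute, here is a minimal sketch in standard C++ only, so it compiles without R. The `_sketch` function names are hypothetical stand-ins, not part of Rcpp, and R's NA handling is not reproduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical stand-in for sugar's cumsum(x): running totals.
std::vector<double> cumsum_sketch(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    std::partial_sum(x.begin(), x.end(), out.begin());
    return out;
}

// Hypothetical stand-in for sugar's cummax(x): running maximum.
std::vector<double> cummax_sketch(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    std::partial_sum(x.begin(), x.end(), out.begin(),
                     [](double a, double b) { return std::max(a, b); });
    return out;
}

// Hypothetical stand-in for pmax(x, y) with two equal-length vectors:
// element i of the result is the larger of x[i] and y[i].
// (NA/NaN behaviour of R's pmax is NOT reproduced here.)
std::vector<double> pmax_sketch(const std::vector<double>& x,
                                const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> out(x.size());
    std::transform(x.begin(), x.end(), y.begin(), out.begin(),
                   [](double a, double b) { return std::max(a, b); });
    return out;
}

// Why log1p() exists: for tiny x, 1.0 + x rounds to exactly 1.0 in
// double precision, so this naive version returns 0 instead of about x.
double naive_log1p(double x) { return std::log(1.0 + x); }
```

For example, `cumsum_sketch({1, 2, 3, 4})` gives `{1, 3, 6, 10}`, and `naive_log1p(1e-20)` returns 0 while `std::log1p(1e-20)` returns roughly `1e-20`. The same loss of precision is what expm1() avoids for \(e^x - 1\).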

Summary Math Functions

All of these take a numeric vector in and return a single number, except range(), which returns two.

  • max(): Maximum.
  • mean(): Sample arithmetic mean.
  • median(): Median.
  • min(): Minimum.
  • range(): Returns a NumericVector of length 2 that contains the minimum and the maximum.
  • sd(): Sample standard deviation (n - 1 denominator, as in R).
  • sum(): Summation.
  • var(): Sample variance (n - 1 denominator, as in R).
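The arithmetic behind mean(), var(), sd(), and range() can be sketched in standard C++ as below. This is an illustration under the assumption that you only want to see the formulas; in Rcpp you would just call `mean(x)`, `var(x)`, `sd(x)`, or `range(x)`. The `_sketch` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for mean(x).
double mean_sketch(const std::vector<double>& x) {
    assert(!x.empty());
    return std::accumulate(x.begin(), x.end(), 0.0) / x.size();
}

// Hypothetical stand-in for var(x): sum of squared deviations
// from the mean, divided by n - 1 (sample variance).
double var_sketch(const std::vector<double>& x) {
    assert(x.size() >= 2);
    const double m = mean_sketch(x);
    double ss = 0.0;
    for (double xi : x) ss += (xi - m) * (xi - m);
    return ss / (x.size() - 1);
}

// Hypothetical stand-in for sd(x): square root of the sample variance.
double sd_sketch(const std::vector<double>& x) { return std::sqrt(var_sketch(x)); }

// Hypothetical stand-in for range(x): returns (minimum, maximum).
std::pair<double, double> range_sketch(const std::vector<double>& x) {
    assert(!x.empty());
    auto mm = std::minmax_element(x.begin(), x.end());
    return {*mm.first, *mm.second};
}
```

For `{1, 2, 3, 4}` the mean is 2.5 and the squared deviations sum to 5, so the sample variance is 5/3 rather than 5/4.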

Predicate Functions

  • all(): Returns TRUE if all elements are TRUE.

  • any(): Returns TRUE if any element is TRUE.

  • ifelse(): Vectorized if-else statement.

  • is_finite(): Returns TRUE if an element is finite (not infinite, NA, or NaN).

  • is_infinite(): Returns TRUE if an element is infinite.

  • is_na(): Returns TRUE if an element is missing (NA).

  • is_nan(): Returns TRUE if an element is NaN.

  • All of these return logical values stored as int rather than bool (a LogicalVector holds ints), because R's logicals have three states, TRUE, FALSE, and NA, and bool has no way to represent NA.

  • One consequence is that all() and any() do not return a bool, so you cannot use them directly as the condition of an if statement.

  • Instead, wrap all() or any() inside is_true() or is_false() to convert the result to a bool.

    C++

    if (is_true(all(v))) {
      // runs only if every element of v is TRUE (no FALSE, no NA)
    }
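  • Another consequence is that you should not use individual elements of a LogicalVector directly as if conditions, because NA is stored as a nonzero int and therefore converts to the bool value true.

  • Instead, compare each element against TRUE, FALSE, or NA_LOGICAL. Here TRUE and FALSE are constants from R's C headers (available through Rcpp), distinct from C++'s true and false.

  • E.g. if v is a LogicalVector then do:

    C++

    if (v[i] == TRUE) {
      // element is TRUE
    } else if (v[i] == FALSE) {
      // element is FALSE
    } else if (v[i] == NA_LOGICAL) {
      // element is missing
    } else {
      // not reached for a valid logical value
    }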
  • Example:

    C++

    Rcpp::NumericVector x = {1.0, 2.0, 3.0};
    Rcpp::LogicalVector z = x > 2;      // {FALSE, FALSE, TRUE}
    if (z(2) == TRUE) {                 // z(2) is the third element (0-based)
      Rcpp::Rcout << "Hello" << std::endl;
    }
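To see why a bare `if (v[i])` misbehaves, the sketch below models R's logical type the way R stores it: an int where 1 is TRUE, 0 is FALSE, and NA is the smallest int (R's NA_LOGICAL has the value INT_MIN). It is plain C++ so it runs without R, and the constants and functions are hypothetical stand-ins, not Rcpp's API. It also mirrors R's rules that all() is NA when nothing is FALSE but something is NA, and any() is NA when nothing is TRUE but something is NA.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical constants mirroring R's int encoding of logicals.
const int kTrue = 1;
const int kFalse = 0;
const int kNA = INT_MIN;  // same value R uses for NA_LOGICAL

// Each element must be one of the three encodings.
bool is_valid_logical(int e) { return e == kTrue || e == kFalse || e == kNA; }

// Three-valued all(): any FALSE wins, then any NA, otherwise TRUE.
int all_sketch(const std::vector<int>& v) {
    bool saw_na = false;
    for (int e : v) {
        assert(is_valid_logical(e));
        if (e == kFalse) return kFalse;
        if (e == kNA) saw_na = true;
    }
    return saw_na ? kNA : kTrue;
}

// Three-valued any(): any TRUE wins, then any NA, otherwise FALSE.
int any_sketch(const std::vector<int>& v) {
    bool saw_na = false;
    for (int e : v) {
        assert(is_valid_logical(e));
        if (e == kTrue) return kTrue;
        if (e == kNA) saw_na = true;
    }
    return saw_na ? kNA : kFalse;
}

// In the spirit of is_true(): only an actual TRUE converts to true.
bool is_true_sketch(int e) { return e == kTrue; }

// The pitfall: kNA is nonzero, so a bare `if (e)` treats NA as true.
bool naive_truthiness(int e) { return e != 0; }
```

For instance, `all_sketch({kTrue, kNA})` is NA, `naive_truthiness(kNA)` is true, and `is_true_sketch(kNA)` is false, which is why the explicit comparisons above are the safe pattern.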

Utility Functions

  • diff(x): Lagged differences x[i+1] - x[i] (lag 1), so the result is one element shorter than x.
  • match(v, table): Same as R’s match() function. Note that the indexing starts at 1 (R style), not 0, in what is returned.
  • rep(x, n): Repeat a vector n times.
  • rev(x): Return a vector whose elements are in reverse order.
  • sample(Vector x, int size, replace = false, probs = R_NilValue): Sample from a vector. Same as the R function sample().
  • seq(start, end): Returns a vector of consecutive integers from start to end.
  • seq_along(x): Returns a vector of consecutive integers from 1 to the length of a vector (does not start at 0).
  • seq_len(n): Returns a vector of consecutive integers from 1 to n (does not start at 0).
  • setdiff(v1, v2): Returns the unique elements of v1 that are not in v2.
  • setequal(v1, v2): Returns true if the unique elements of v1 are the same as the unique elements of v2.
  • unique(v): Returns a vector of unique values of v.
  • which_max(v): Returns the numerical index of the largest element. This is C-style indices, so starts at 0.
  • which_min(v): Returns the numerical index of the smallest element. This is C-style indices, so starts at 0.
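The index conventions above are easy to mix up: match(), seq_along(), and seq_len() follow R and count from 1, while which_max() and which_min() count from 0. A standard-C++ sketch of diff(), which_max(), and match() that keeps those conventions is below; the `_sketch` names are hypothetical, and returning 0 for "no match" is an assumption of this sketch (R's match() returns NA).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for diff(x): lag-1 differences x[i+1] - x[i].
std::vector<double> diff_sketch(const std::vector<double>& x) {
    std::vector<double> out;
    for (std::size_t i = 1; i < x.size(); ++i) out.push_back(x[i] - x[i - 1]);
    return out;
}

// Hypothetical stand-in for which_max(v): 0-based position of the
// first largest element.
std::size_t which_max_sketch(const std::vector<double>& v) {
    assert(!v.empty());
    return static_cast<std::size_t>(std::max_element(v.begin(), v.end()) - v.begin());
}

// Hypothetical stand-in for match(v, table): for each v[i], the 1-based
// position of its first occurrence in table, or 0 here when absent.
std::vector<int> match_sketch(const std::vector<int>& v,
                              const std::vector<int>& table) {
    std::vector<int> out;
    for (int e : v) {
        auto it = std::find(table.begin(), table.end(), e);
        out.push_back(it == table.end() ? 0 : static_cast<int>(it - table.begin()) + 1);
    }
    return out;
}
```

With these, `which_max_sketch({2, 9, 4})` is 1 (0-based), while `match_sketch({30, 10}, {10, 20, 30})` is `{3, 1}` (1-based), so an index from one family cannot be used directly with the other.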

Statistical Functions

Lists

Exercises:

  1. Use Rcpp Sugar to create a function called msamp() where the user inputs a sample size n and the output is the mean of a random sample of size n from a standard uniform distribution (over \([0,1]\)). E.g.

    R

    set.seed(1)
    msamp(1)
    ## [1] 0.2655
    msamp(10)
    ## [1] 0.5456
    msamp(100)
    ## [1] 0.5229
    msamp(1000)
    ## [1] 0.4957
  2. The multinomial distribution has PMF \[ \frac{n!}{x_1!x_2!\cdots x_k!}p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k} \] Here, \(x_i\) is the number of counts in category \(i\) and \(n\) is the total number of counts, so \(n = x_1 + x_2 + \cdots + x_k\). Also, \(p_i\) is the probability of category \(i\). Use Rcpp Sugar to create a function called dmultinom_cpp() that takes as input a NumericVector x, a NumericVector p, and a bool lg, and returns the log-PMF if lg = true and the PMF if lg = false. For numerical stability, calculate the log values at each step and only exponentiate at the end (this is very typical in numerical computing). You can assume that all values of \(p_i\) are non-zero.

    Hint: We have the relationship that \(\Gamma(n + 1) = n!\). This is needed for doing log-factorial on scalars.

    R

    dmultinom(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), log = TRUE)
    ## [1] -2.294
    dmultinom_cpp(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), lg = TRUE)
    ## [1] -2.294
    dmultinom(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), log = FALSE)
    ## [1] 0.1008
    dmultinom_cpp(x = c(1, 4, 2), p = c(0.1, 0.7, 0.2), lg = FALSE)
    ## [1] 0.1008
  3. Write a function called scale01() that takes as input a NumericVector, subtracts the minimum value from every element, and then divides by the maximum of the resulting vector, so that all observations are scaled to lie between 0 and 1. Use Rcpp Sugar.

    R

    x <- 6:10
    scale01(x)
    ## [1] 0.00 0.25 0.50 0.75 1.00

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.