Learning Objectives

Motivation

{methods} Package

S4 Classes

Inheritance

  • Inheritance is really informal in S3. class is a vector and method dispatch just sequentially goes down the vector. But there is no guarantee that a subclass will have the correct structure of a superclass, or that the class vector is consistent across all members of a class.

  • In S4, use the contains argument in setClass() to explicitly define the superclass that your class is inheriting from.

  • E.g. we can create a class called Employee that inherits from Person.

    setClass(
      Class = "Employee", 
      contains = "Person",
      slots = c(
        boss = "character",
        years = "numeric"
      ),
      prototype = list(
        boss = NA_character_,
        years = NA_real_
      )
    )
  • Now, we can create an Employee object with all of the slots of Person, along with the new slots from Employee.

    david <- new(Class = "Employee", 
                 name = "David Gerard", 
                 age = 34, 
                 boss = "Self", 
                 years = 4)
    david
    ## An object of class "Employee"
    ## Slot "boss":
    ## [1] "Self"
    ## 
    ## Slot "years":
    ## [1] 4
    ## 
    ## Slot "name":
    ## [1] "David Gerard"
    ## 
    ## Slot "age":
    ## [1] 34

Helper Function

  • You should not expect users to use new(). Instead you should create a helper function for them. E.g.

    Employee <- function(name, 
                         age = NA_real_, 
                         boss = NA_character_, 
                         years = NA_real_) {
      age <- as.double(age)
      boss <- as.character(boss)
      years <- as.double(years)
    
      return(new("Employee", name = name, age = age, boss = boss, years = years))
    }
    david <- Employee(name = "David Gerard", age = 34, boss = "Self", years = 4)
    david
    ## An object of class "Employee"
    ## Slot "boss":
    ## [1] "Self"
    ## 
    ## Slot "years":
    ## [1] 4
    ## 
    ## Slot "name":
    ## [1] "David Gerard"
    ## 
    ## Slot "age":
    ## [1] 34
  • Try to make sensible defaults, and try to type-cast as much as you can.

Validator Functions

  • Suppose we want Person to be a vector-type object. So we should constrain the age and name slots to be the same size.

  • This is not currently a constraint

    profs <- new(Class = "Person", name = c("David Gerard", "Zois Boukouvalas"), age = 34)
    profs
    ## An object of class "Person"
    ## Slot "name":
    ## [1] "David Gerard"     "Zois Boukouvalas"
    ## 
    ## Slot "age":
    ## [1] 34
  • You define S4 validators with setValidity().

    • Class is the name of the class to create a validator for.
    • method is the function that should be run on the class.
      • It returns TRUE if all validity checks pass.
      • It returns a character string describing the issues if some of the validity checks fail.
    setValidity(
      Class = "Person", 
      method = function(object) {
        if(length(object@name) != length(object@age)) {
          "@name and @age must be same length"
        } else {
          return(TRUE)
        }
      }
    )
    ## Class "Person" [in ".GlobalEnv"]
    ## 
    ## Slots:
    ##                           
    ## Name:       name       age
    ## Class: character   numeric
    ## 
    ## Known Subclasses: "Employee"
  • Now it is impossible to make @name and @age slots of different lengths

        profs <- new(Class = "Person", name = c("David Gerard", "Zois Boukouvalas"), age = 34)
    ## Error in validObject(.Object): invalid class "Person" object: @name and @age must be same length
  • Note: Validity checks only run automatically with new(), so you can edit the slot to have invalid inputs.

    profs <- new(Class = "Person",
                 name = c("David Gerard", "Zois Boukouvalas"),
                 age = c(34, NA_real_))
    profs@age <- 1:20
  • But you can manually check validity with validObject().

    validObject(profs)
    ## Error in validObject(profs): invalid class "Person" object: @name and @age must be same length
    profs <- new(Class = "Person", 
                 name = c("David Gerard", "Zois Boukouvalas"), 
                 age = c(34, NA_real_))
    validObject(profs)
    ## [1] TRUE
  • Exercise: Create a factor2 S4 class that tries to implement the factor class for S4.

    • Define the class (with appropriate slots and prototypes)
    • Create a helper function. It should be able to take a character vector as input and define the
    • Create a validator. It should check that the the maximum integer is less than or equal to the number of levels.

S4 Generics and Methods

Getting Help

show() method

  • You should always define a show() method. This is used for printing S4 objects.

  • You need to use the same arguments of a generic that already exists. E.g. show needs object

    args(getGeneric("show"))
    ## function (object) 
    ## NULL
  • You usually use the cat() function for printing.

    setMethod(
      f = "show", 
      signature = "Person", 
      definition = function(object) {
        cat(is(object)[[1]], "\n",
            "  Name: ", object@name, "\n",
            "   Age: ", object@age, "\n",
            sep = "")
      }
    )
  • Now we get better printing for Person’s and Employee’s.

    david
    ## Employee
    ##   Name: David Gerard
    ##    Age: 21

Accessors

Coercion

Converting from S3 so S4

Method Dispatch

Multiple inheritance

  • It’s possible to create a class that inherits from two or more classes (i.e. have two superclasses). This is called multiple inheritance.

  • Issue: If the subclass does not not have a method, but both superclasses have methods, then R will choose the method based on alphabetical order. This is dangerous, so we won’t cover it.

Multiple dispatch

  • A method can be determined by the classes of multiple objects in the generic call. This is called multiple dispatch.

  • E.g. how should c(x, y) behave if

    1. x and y are both factors?
    2. x is a factor and y is a character?
    3. x and y are both characters?
  • We will demonstrate this using the Rational class defined next.

Rational Vector Example

Methods for Group Generic Functions

Documenting S4 objects

Some Example Uses of S4 by Others

{Matrix}

  • The {Matrix} package provides classes for many types of matrices:

    • sparse (mostly zeros)
    • triangular (either the upper or lower triangle elements are all 0).]
    • symmetric (upper and lower triangle elements are equal).
    library(Matrix)
  • Many matrix operations can be more efficiently executed given knowledge of these structures.

  • The USCounties matrix is a 3111 by 3111 matrix where an element measures a spatial relationship between counties. It is non-zero only if counties are contiguous.

    data("USCounties", package = "Matrix")
  • This is an S4 object of class dsCMatrix, used for handling sparse matrices (only 0.0094% of the values in this matrix are non-zero).

    sloop::otype(USCounties)
    ## [1] "S4"
    class(USCounties)
    ## [1] "dsCMatrix"
    ## attr(,"package")
    ## [1] "Matrix"
  • Almost every matrix operation you can imagine has been optimized for this sparsity:

    nrow(sloop::s4_methods_class("dsCMatrix"))
    ## [1] 383
  • This makes things like matrix multiplication much faster:

    densemat <- as.matrix(USCounties)
    ones <- rep(1, length.out = ncol(densemat))
    microbenchmark::microbenchmark(
      densemat %*% ones,
      USCounties %*% ones
    )
    ## Unit: microseconds
    ##                 expr     min      lq    mean  median      uq   max neval
    ##    densemat %*% ones 13477.0 18498.5 19398.3 18974.8 19894.7 28210   100
    ##  USCounties %*% ones   114.8   221.4   327.3   279.9   413.7  1082   100
    • This application doesn’t really demonstrate its abilities. It can be even more orders of magnitude faster.

{stats4}

  • One application of S4 is in the {stats4} package for likelihood inference.

  • The likelihood is the kind of like the probability of the data that we observed given some parameter values. (not exactly for continuous data)

  • E.g., let’s say we observed heights (in inches)

    x <- c(72, 66, 74, 64, 70)

    and let’s assume that these data were generated from a normal distribution.

  • Then, given any two parameters of the mean (mu) and standard deviation (sigma), the likelihood of these parameters at each observation would be from dnorm()

    dnorm(x = x, mean = mu, sd = sigma)
  • If the individuals that these heights are from are independent (e.g. they aren’t family members), then the likelihood of these parameters given all of the data would be

    prod(dnorm(x = x, mean = mu, sd = sigma))
  • We can calculate this likelihood for a few values of mu and sigma

    mu <- 72
    sigma <- 3
    prod(dnorm(x = x, mean = mu, sd = sigma))
    ## [1] 1.031e-07
    mu <- 70
    sigma <- 4
    prod(dnorm(x = x, mean = mu, sd = sigma))
    ## [1] 1.04e-06
  • Typically, the parameters are not known (mu and sigma are unknown) and our goal is to estimate them when we only see x.

  • One way to do this is to find the mu and sigma that maximize this likelihood. The resulting estimators are called the maximum likelihood estimators (MLE’s). This is one of the most common methods for estimating parameters from a statistical model.

  • Intuition: the MLE’s are the parameters that make our observed data as probable to have observed as possible.

  • Typically, the likelihood produces very small numbers, so folks maximize the log-likelihood, which results in the same MLE’s since \(\log()\) is a monotone increasing function.

    ll <- sum(dnorm(x = x, mean = mu, sd = sigma, log = TRUE)) ## log likelihood
    like <- prod(dnorm(x = x, mean = mu, sd = sigma)) ## likelihood
    
    ## they are the same
    log(like)
    ## [1] -13.78
    ll
    ## [1] -13.78
  • The {stats4} package provides a bunch of S4 objects and methods for dealing with MLE’s.

    library(stats4)
  • To use the {stats4} package, first create a function that takes as input your parameters nothing else. It assumes that all data comes from the parent environment. It should output the negative log-likelihood.

    #' @param mu The mean
    #' @param sigma The standard deviation
    neg_ll <- function(mu, sigma) {
      return(-sum(dnorm(x = x, mean = mu, sd = sigma, log = TRUE)))
    }
  • Then, use the mle() function. Include some starting values, and place some limits on constrained parameters.

    mlout <- stats4::mle(minuslogl = neg_ll, 
                         start = c(mu = 72, sigma = 1), 
                         lower = c(mu = -Inf, sigma = 0),
                         upper = c(mu = Inf, sigma = Inf))
  • This object is S4.

    sloop::otype(mlout)
    ## [1] "S4"
    class(mlout)
    ## [1] "mle"
    ## attr(,"package")
    ## [1] "stats4"
  • What generics are available for the objects of type mle?

    sloop::s4_methods_class("mle")
    ## # A tibble: 9 × 4
    ##   generic class visible source
    ##   <chr>   <chr> <lgl>   <chr> 
    ## 1 coef    mle   TRUE    stats4
    ## 2 confint mle   TRUE    stats4
    ## 3 logLik  mle   TRUE    stats4
    ## 4 nobs    mle   TRUE    stats4
    ## 5 profile mle   TRUE    stats4
    ## 6 show    mle   TRUE    stats4
    ## 7 summary mle   TRUE    stats4
    ## 8 update  mle   TRUE    stats4
    ## 9 vcov    mle   TRUE    stats4
  • coef() will get you the parameter estimates

    coef(mlout)
    ##     mu  sigma 
    ## 69.200  3.709
  • vcov() will you you the estimated variance/covariance matrix of the MLE’s (for their sampling distribution).

    vcov(mlout)
    ##              mu     sigma
    ## mu    2.752e+00 1.085e-07
    ## sigma 1.085e-07 1.376e+00
  • So the standard errors are from

    sqrt(diag(vcov(mlout)))
    ##    mu sigma 
    ## 1.659 1.173
  • confint() will get you confidence intervals

    confint(mlout)
    ## Profiling...
    ##        2.5 % 97.5 %
    ## mu    65.210 73.190
    ## sigma  2.219  8.083
  • In the normal case, these estimates are the same as the sample mean and the sample variance (not adjusting for degrees of freedom).

    mean(x)
    ## [1] 69.2
    sqrt(var(x) * (length(x) - 1) / length(x))
    ## [1] 3.709
  • But the MLE is general to many more complicated applications.

  • Exercise: The number of heads in \(n\) coin flips is binomially distributed with size \(n\) and success probability \(p\), where \(p\) is the probability that a single flip turns up heads. The likelihood, given that x heads turned up, is

    dbinom(x = x, size = n, prob = p)

    When \(x = 11\) and \(n = 31\), calculate the MLE of \(p\).

New Functions


National Science Foundation Logo American University Logo Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.