Learning Objectives

Motivation

Formula Creation

Formula Syntax

{lme4}

  • Mixed models allow for model parameters to be treated as random variables (e.g. random slopes or intercepts).

  • You do this to control for correlation between samples. E.g.

    • Many observations collected on the same individuals.
    • Observations are a part of a hierarchy. E.g. data collected on students who belong to schools who belong to districts.
  • Lots of folks prefer treating effects as random (as opposed to fixed) when they are interested as the population as a whole more than specific values of a covariate.

  • E.g. in a study on sleep deprivation, Researchers measured reaction times for individuals after so many days of sleep deprivation.

    library(ggplot2)
    data("sleepstudy", package = "lme4")
    ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
      facet_wrap(. ~ Subject) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE)

  • Each subject has different slopes and intercepts.

  • We don’t care about the slopes and intercepts for each subject, but we want to account for correlation within subjects.

  • So we might consider the slope and intercept of each subject to be random.

  • Folks don’t just include fixed effects for subject because it would use a lot more degrees of freedom (this is a cynical view).

  • To specify just random intercepts (different for each subject), do

    lm1 <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
  • To specify random slopes (different slope on Day for each subject), do

    lm2 <- lme4::lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
  • Let

    • \(y_i\) be the reaction time for observation \(i\).
    • \(x_i\) be the day for observation \(i\).
    • \(z_{ij}\) be an indicator for subject \(j\).
  • Then lm1 fit the model \[ y_i = \beta_0 + \beta_1x_i + \sum_{j=1}^{18}\alpha_{0j}z_{ij} + \epsilon_i\\ \alpha_{0j} \overset{iid}{\sim} N(0, \sigma^2)\\ \epsilon_i \overset{iid}{\sim} N(0, \tau^2) \]

  • lm2 fit the model \[ y_i = \beta_0 + \beta_1x_i + \sum_{j=1}^{18}\alpha_{0j}z_{ij} + \sum_{j=1}^{18}\alpha_{1j}x_iz_{ij} + \epsilon_i\\ (\alpha_{0j},\alpha_{1j}) \overset{iid}{\sim} N_{2}(0, \Sigma)\\ \epsilon_i \overset{iid}{\sim} N(0, \tau^2) \]

  • The idea is that we specify the random effects with the parentheses. The left of the | is the explanatory variable to model a relationship with the response, to the right of the pipe is the grouping factor that is assumed to be random.

  • E.g., in lm1 the the \(y\)-intercept was different for each subject.

  • E.g., in lm2 the the slope of Day and \(y\)-intercept was different for each subject.

  • Here are the common formulations (Table 2 from {lme4} vignette):

    Formula Meaning
    y ~ (1 | g) Random intercept with fixed mean.
    y ~ (1 | g1/g2) or y ~ (1 | g1) + (1 | g1:g2) Intercept varying among g1 and g2 within g1.
    y ~ (1 | g1) + (1 | g2) Intercept varying among g1 and g2.
    y ~ x + (x | g) Correlated random intercept and slope
    • y is the response, x is the explanatory, and g g1 g1 are grouping variables.

Working with Formulas

Using formulas in a function

  1. Include formula and data arguments.
  2. Use model.frame()
  3. Extract the response vector (model.response()) and design matrix (model.matrix()) from the model frame.
  4. Do what you want with those entities.

{rlang} Formula Manipulation

New Functions


National Science Foundation Logo American University Logo Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.