Learning Objectives

What is simple linear regression?

  1. Equation for line: \(y = \beta_0 + \beta_1 x\)

     

  2. Have cloud of points

  3. Fit line to cloud of points

  4. Infer slope from fitted line

     

  5. Inference:

    1. Test if the slope is 0.
    2. Confidence interval on the slope.
    3. Interpret the sign/magnitude of the slope.
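
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a full analysis: the data points are made up, and the critical value 2.306 (the 0.975 quantile of a \(t\) distribution with 8 degrees of freedom) is hard-coded for this sample size of 10 rather than computed.

```python
import math

# Made-up cloud of points, purely for illustration (assumption).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Fit the line: least-squares estimates of slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Residual variance (n - 2 degrees of freedom) and standard error of the slope.
residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(r ** 2 for r in residuals) / (n - 2)
se_beta1 = math.sqrt(sigma2_hat / sxx)

# Test statistic for the null hypothesis that the slope is 0.
t_stat = beta1_hat / se_beta1

# 95% confidence interval for the slope.
# Assumption: 2.306 is the 0.975 t quantile for n - 2 = 8 degrees of freedom.
T_CRIT = 2.306
ci = (beta1_hat - T_CRIT * se_beta1, beta1_hat + T_CRIT * se_beta1)

print(f"slope = {beta1_hat:.3f}, intercept = {beta0_hat:.3f}")
print(f"t-statistic = {t_stat:.1f}")
print(f"95% CI for slope = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Comparing `abs(t_stat)` with `T_CRIT` gives the test at the 5% level, and the sign and size of `beta1_hat` are what get interpreted.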

What is multiple linear regression?

  1. Equation for a 2-d plane: \[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]

    • When \(x_1\) is fixed (not changing), \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\) is the equation for a line with slope \(\beta_2\) and \(y\)-intercept \(\beta_0 + \beta_1 x_1\).

    • When \(x_2\) is fixed (not changing), \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\) is the equation for a line with slope \(\beta_1\) and \(y\)-intercept \(\beta_0 + \beta_2 x_2\).

    • So a plane can be interpreted as a line when you fix all predictors but one.

  2. Have a cloud of points.

  3. Fit plane to cloud of points.

  4. Infer slopes from fitted plane.

  5. Inference:

    1. Test if slopes are 0
    2. Confidence intervals on slopes.
    3. Interpret sign/magnitude of slopes.
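
As a rough sketch of fitting a plane, the code below solves the least-squares normal equations \((X^\top X)\beta = X^\top y\) in plain Python. The helper names `solve_linear_system` and `fit_least_squares` are hypothetical, and the data are made up to lie exactly on the plane \(y = 1 + 2x_1 - 3x_2\), so the fit should recover those coefficients up to rounding. In practice one would likely use a statistics package instead.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_least_squares(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up points lying exactly on y = 1 + 2*x1 - 3*x2 (assumption).
x1 = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
x2 = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]

# Each row is (1, x1, x2); the leading 1 carries the intercept beta_0.
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
b0, b1, b2 = fit_least_squares(rows, y)

# Fixing x1 turns the plane into a line in x2:
# slope b2 and y-intercept b0 + b1 * x1.
x1_fixed = 2.0
line_intercept = b0 + b1 * x1_fixed
line_slope = b2

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"at x1 = {x1_fixed}: y = {line_intercept:.3f} + ({line_slope:.3f}) * x2")
```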

Steps of a Regression Analysis

  • The above procedures assume that:

    1. The cloud of points roughly follows a line (or plane).
    2. All predictors (the \(x\)’s) are associated with the response (the \(y\)). We might have many predictors and we need to choose which ones to include.
  • We typically need to transform the data or try out a few models.

  • Steps:

What can you use it for?

  • Detecting trends.
    • Trends are easy to see with two variables but harder with more, so we need something more sophisticated.
    • Linear regression allows us to say “folks that have bigger x have, on average, bigger y”.
  • Control for other variables.
    • “Folks that have the same z but bigger x have, on average, bigger y.”
  • Prediction
    • Most machine learning tasks in the real world are “small data”.
    • The fancy ML methods have many parameters that require lots of data to estimate.
    • Linear regression is often the best you can do in small data tasks.
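
To illustrate what "controlling for" another variable can change, here is a small sketch with made-up data in which \(y = 1 - 3z + x\) exactly. It compares the slope of \(y\) on \(x\) alone with the slope of \(x\) when \(z\) is also in the model, using the standard closed-form least-squares solution for two predictors. The variable names and data are assumptions chosen so that the two slopes have opposite signs.

```python
# Made-up data (assumption): x tends to be larger when z is larger,
# and y = 1 - 3*z + x exactly.
z = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
x = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
y = [1 - 3 * zi + xi for zi, xi in zip(z, x)]


def centered_cross_sum(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


sxx = centered_cross_sum(x, x)
szz = centered_cross_sum(z, z)
sxz = centered_cross_sum(x, z)
sxy = centered_cross_sum(x, y)
szy = centered_cross_sum(z, y)

# Slope of y on x alone (z ignored).
marginal_slope = sxy / sxx

# Slope of x in the regression of y on both x and z
# (closed-form least-squares solution for two predictors).
adjusted_slope = (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

print(f"ignoring z:      slope of x = {marginal_slope:.2f}")
print(f"controlling z:   slope of x = {adjusted_slope:.2f}")
```

In this constructed example, folks with bigger x have, on average, smaller y overall, yet folks with the same z but bigger x have bigger y.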

Generality

  • Many statistical procedures are special cases of (or approximations to) linear regression.

  • Understanding linear regression really well will give you a deeper understanding of statistics in general.

  • Procedures that are special cases of linear regression, or can be well approximated by linear regression:

    • One- and two-sample \(t\)-tests.
    • ANOVA
    • Correlation tests
    • Rank tests
    • Chi-square tests
    • Many others
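
As one concrete instance, the pooled-variance two-sample \(t\)-test is what you get from a simple linear regression of the response on a 0/1 group indicator: the fitted slope is the difference in group means, and the slope's \(t\)-statistic equals the two-sample \(t\)-statistic. The sketch below checks this numerically on made-up data (the group values are assumptions, not from any real study).

```python
import math

# Made-up measurements for two groups (assumption).
group_a = [5.1, 4.9, 6.0, 5.5, 5.2]
group_b = [6.8, 7.1, 6.5, 7.4, 6.9]


def mean(values):
    return sum(values) / len(values)


# Classical pooled-variance two-sample t-statistic.
na, nb = len(group_a), len(group_b)
mean_a, mean_b = mean(group_a), mean(group_b)
ss_within = (sum((v - mean_a) ** 2 for v in group_a)
             + sum((v - mean_b) ** 2 for v in group_b))
pooled_var = ss_within / (na + nb - 2)
t_pooled = (mean_b - mean_a) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Same data as a regression: x = 0 for group A, x = 1 for group B.
x = [0.0] * na + [1.0] * nb
y = group_a + group_b
n = len(y)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# t-statistic for the null hypothesis that the slope is 0.
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(rss / (n - 2) / sxx)
t_reg = slope / se_slope

print(f"difference in means = {mean_b - mean_a:.4f}, slope = {slope:.4f}")
print(f"two-sample t = {t_pooled:.4f}, regression t = {t_reg:.4f}")
```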