Learning Objectives

Subsetting an Atomic Vector

  1. Integer Subsetting:
    • Put integers in brackets and it will extract those elements. R starts counting at 1.

      x <- c(8, 1.2, 33, 14)
      x[1]
      ## [1] 8
      x[c(1, 3)]
      ## [1]  8 33
      iset <- c(1, 3)
      x[iset]
      ## [1]  8 33
    • This can be used for sorting, since order() returns the integers that put x in increasing order.

      order(x)
      ## [1] 2 1 4 3
      x[order(x)]
      ## [1]  1.2  8.0 14.0 33.0
    • You can use duplicate integers to extract elements more than once.

      x[c(2, 2, 2)]
      ## [1] 1.2 1.2 1.2
  2. Negative Integer Subsetting:
    • Negative integers return all elements except those at the given positions.

      x[-1]
      ## [1]  1.2 33.0 14.0
      x[c(-1, -3)]
      ## [1]  1.2 14.0
      x[-c(1, 3)]
      ## [1]  1.2 14.0
  3. Logical Vector Subsetting:
    • Elements are returned wherever the logical vector is TRUE.

      x[c(TRUE, FALSE, TRUE, FALSE)]
      ## [1]  8 33
  4. No Subsetting:
    • Empty brackets will return the original object.

      x[]
      ## [1]  8.0  1.2 33.0 14.0
  5. Zero Subsetting:
    • Using 0 in a bracket will return a zero-length vector.

      x[0]
      ## numeric(0)
  6. Names Subsetting:
    • If a vector has names, then you can subset using those names in quotes.

      names(x) <- c("a", "b", "c", "d")
      x["a"]
      ## a 
      ## 8
      x[c("a", "c")]
      ##  a  c 
      ##  8 33
      x[c("a", "a")]
      ## a a 
      ## 8 8
    • If you know what names you want to remove, use setdiff().

      setdiff(names(x), "a")
      ## [1] "b" "c" "d"
      x[setdiff(names(x), "a")]
      ##    b    c    d 
      ##  1.2 33.0 14.0
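
In practice, logical subsetting is usually driven by a comparison rather than a hand-typed vector of TRUEs and FALSEs. The following is a minimal sketch of that pattern, reusing the values of x from the examples above (redefined without names so the block runs on its own). It also shows which(), which converts a logical vector into the integer positions of its TRUE values, and what happens when an integer index runs past the end of the vector.

```r
## Same values as the running example
x <- c(8, 1.2, 33, 14)

## A comparison produces a logical vector the same length as x
x > 5
## [1]  TRUE FALSE  TRUE  TRUE

## Logical subsetting: keep the elements where the condition is TRUE
x[x > 5]
## [1]  8 33 14

## which() gives the integer positions of the TRUEs
which(x > 5)
## [1] 1 3 4
x[which(x > 5)]
## [1]  8 33 14

## Positions past the end of the vector return NA
x[5]
## [1] NA
```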

Subsetting Matrices

List Subsetting

Data Frame Subsetting

Hadley’s Advanced R Exercises

  1. Fix each of the following common data frame subsetting errors:

    mtcars[mtcars$cyl = 4, ]
    mtcars[-1:4, ]
    mtcars[mtcars$cyl <= 5]
    mtcars[mtcars$cyl == 4 | 6, ]
  2. Why does the following code yield five missing values? (Hint: why is it different from x[NA_real_]?)

    x <- 1:5
    x[NA]
    ## [1] NA NA NA NA NA
  3. What does upper.tri() return? How does subsetting a matrix with it work?

    x <- outer(1:5, 1:5, FUN = "*")
    x[upper.tri(x)]
    ##  [1]  2  3  6  4  8 12  5 10 15 20
  4. Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]?

  5. An lm object is a list-like object. Given a linear model, e.g., mod <- lm(mpg ~ wt, data = mtcars), extract the residual degrees of freedom. Then extract the R squared from the model summary (summary(mod)).

Subassignment

Applications

Lookup Tables

  • Store the translations in a named vector, then subset that vector by name using a character vector of keys.

    x <- c("m", "f", "u", "f", "f", "m", "m")
    lookup <- c(m = "Male", f = "Female", u = NA)
    lookup[x]
    ##        m        f        u        f        f        m        m 
    ##   "Male" "Female"       NA "Female" "Female"   "Male"   "Male"
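
The result above keeps the keys (m, f, u) as names, which is often unwanted when the translated values go into a new column. A small sketch of one way to drop them with unname(); the variable name sex is only illustrative and does not come from the notes.

```r
x <- c("m", "f", "u", "f", "f", "m", "m")
lookup <- c(m = "Male", f = "Female", u = NA)

## unname() drops the names carried over from the lookup vector
sex <- unname(lookup[x])

## The values are unchanged, but the names are gone
names(sex)
## NULL
length(sex) == length(x)
## [1] TRUE
```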

Bootstrapping

  • Resampling approaches are often used in Statistics to obtain standard errors, confidence intervals, and p-values.

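  • Resampling: You draw a new sample of the same size from the observed data, with replacement, so some observations appear more than once and others not at all.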

  • We typically resample entire rows of data frames (though not always; we’ll have a homework about this).

    ## Create fake data
    df <- data.frame(x = 1:10)
    df$y <- df$x * 2 + rnorm(nrow(df))
    
    ## obtain indices of rows to sample
    ind <- sample(seq_len(nrow(df)), replace = TRUE)
    
    df_samp <- df[ind, ]
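
A single resample is not useful by itself; the idea is to repeat it many times and look at how a statistic varies across resamples. The sketch below is one possible way to do that, not the only one: it repeats the row resampling above with replicate() and uses the standard deviation of the fitted slopes as a bootstrap standard error. The seed, the number of replicates B, and the choice of the lm() slope as the statistic are all assumptions made for illustration.

```r
set.seed(1)  ## arbitrary seed, only so the sketch is reproducible

## Same fake data as above
df <- data.frame(x = 1:10)
df$y <- df$x * 2 + rnorm(nrow(df))

B <- 200  ## number of bootstrap replicates (arbitrary choice)

## For each replicate: resample rows, refit the model, keep the slope
slopes <- replicate(B, {
  ind <- sample(seq_len(nrow(df)), replace = TRUE)
  df_samp <- df[ind, ]
  coef(lm(y ~ x, data = df_samp))[["x"]]
})

## Bootstrap standard error of the slope
sd(slopes)
```

The only subsetting step is df[ind, ]: because ind contains repeated integers, the same row can be extracted more than once, exactly as with x[c(2, 2, 2)] for atomic vectors.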

Exercises

These are just meant to buff up your Base R skills. Consider the data from the {Sleuth3} package that contains information on sex and salary at a bank.

    library(Sleuth3)
    data("case0102")
    sal <- case0102

  1. What is the salary of the person in the 51st row? Use two different subsetting strategies to get this.

  2. What is the mean salary of Males?

  3. How many Females are in the data?

  4. How many Females make over $6000?


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.