There are various styles of programming that folks use.
You are mostly used to procedural programming where you list out a sequence of steps that are carried out in succession.
mean_vec <- rep(NA_real_, length.out = length(mtcars))
names(mean_vec) <- names(mtcars)
for (i in seq_along(mtcars)) {
  mean_vec[[i]] <- mean(mtcars[[i]])
}
mean_vec
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
You have also been exposed to functional programming where you compose functions with other functions.
sapply(mtcars, mean)
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
purrr::map_dbl(mtcars, mean) ## tidyverse version
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
Object oriented programming (OOP) is a different style of programming than you are used to, centered around objects with data and functions attached to them and their class.
R has three native object oriented programming systems (S3, S4, and RC for “reference classes”), and many third-party packages have made their own object oriented systems ({R6} being the most popular).
These systems are listed in increasing order of complexity, with S3 being “baby” OOP, S4 being “YA” OOP, and RC and R6 being “big boy” OOP.
If you are interested in extending {ggplot2}, then you will learn about another OOP system specific to {ggplot2}: ggproto.
E.g.: To calculate the column means in S3 OOP, we would probably create a generic function for column means.
col_means <- function(x, ...) {
  UseMethod("col_means")
}
and then create a specific method for the data.frame class
col_means.data.frame <- function(x) {
  sapply(x, mean)
}
Finally, we would call the generic function on an object of class data.frame.
col_means(mtcars)
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
E.g.: Calculating the column means in S4 OOP is very similar, just more formal:
setOldClass(Classes = "data.frame")
setGeneric(name = "col_means_s4", def = function(x) standardGeneric("col_means_s4"))
## [1] "col_means_s4"
setMethod(f = "col_means_s4",
  signature = "data.frame",
  definition = function(x) {
    sapply(x, mean)
  }
)
col_means_s4(mtcars)
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
E.g.: To calculate the column means in R6 OOP, we would probably create a new class that has the $col_means() method that we could call.
datFrame <- R6::R6Class(classname = "datFrame", public = list(
  df = NULL,
  initialize = function(df) {
    stopifnot(is.data.frame(df))
    self$df <- df
  },
  col_means = function() {
    sapply(self$df, mean)
  }
  )
)
mtcars_df <- datFrame$new(df = mtcars)
mtcars_df$col_means()
## mpg cyl disp hp drat wt qsec vs
## 20.0906 6.1875 230.7219 146.6875 3.5966 3.2172 17.8487 0.4375
## am gear carb
## 0.4062 3.6875 2.8125
Because most R programmers do not come from an OOP background, you should be coding mostly in S3 and S4 when using OOP. We’ll spend most of our time on these.
S3 and S4 use generic function OOP where the same function name is evaluated differently based on the class of the object.
E.g. that allows the output of summary() to differ between numeric vectors and factors.
x <- sample(1:10, size = 100, replace = TRUE)
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 3.0 6.0 5.6 8.0 10.0
y <- factor(x)
summary(y)
## 1 2 3 4 5 6 7 8 9 10
## 5 10 13 7 11 12 15 13 8 6
R6 and RC use encapsulated OOP where objects are the center of everything, holding fields (data) and methods (functions) that operate on those values. These are closest to what you would be used to if you are coming from an OOP language. Try not to use them.
E.g. in R we apply a function, like mean(), to a vector, like x. But an encapsulated object oriented programming system would have the function mean() attached to the vector x. That’s one difference between R and Python.
R
x <- c(19, 22, 31)
mean(x) ## apply mean to x
## [1] 24
Python
import numpy as np
x = np.array([19, 22, 31])
x.mean() ## mean belongs to x
## 24.0
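To see the encapsulated style without leaving R, here is a minimal sketch using reference classes (RC), one of the native systems mentioned above. The class name `NumVec`, the field `values`, and the method `get_mean()` are made up for this illustration.

```r
library(methods) ## RC lives in the methods package

## Hypothetical RC class: the data (field) and the function (method)
## both belong to the object.
NumVec <- setRefClass("NumVec",
  fields = list(values = "numeric"),
  methods = list(
    get_mean = function() {
      mean(values)
    }
  )
)

v <- NumVec$new(values = c(19, 22, 31))
v$get_mean() ## the method is called on the object, as in the Python example
## [1] 24
```

This is only meant to show what "the method belongs to the object" looks like in R; the advice to prefer S3 and S4 still stands.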
S3 allows you to use functions like print() and summary() and plot() on outputs of your functions. You can also define your own “generics.”
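As a rough sketch of what this looks like in practice, the function below returns an object with its own S3 class, and a print() method controls how that object is displayed. The names `col_summary()` and the class `"col_summary"` are hypothetical, chosen only for this example.

```r
## Hypothetical constructor: returns a list tagged with the S3 class "col_summary"
col_summary <- function(df) {
  stopifnot(is.data.frame(df))
  out <- list(means = sapply(df, mean), n = nrow(df))
  class(out) <- "col_summary"
  out
}

## print() method for the new class; R calls it whenever the object is printed
print.col_summary <- function(x, ...) {
  cat("Column means based on", x$n, "rows\n")
  print(round(x$means, digits = 2))
  invisible(x)
}

res <- col_summary(mtcars)
res ## auto-printing dispatches to print.col_summary()
```

Returning `invisible(x)` from a print method is a common convention so that the object can still be used in a pipeline after printing.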
S4 is similar to S3 but is more formal and strict. S4 is important to understand if you want to use or contribute to Bioconductor.
Polymorphism: Use the same function name for different types of input, but have the function evaluate differently based on the types of input.
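A small sketch of polymorphism with an S3 generic: one function name, different behavior depending on the class of the input. The generic `describe()` and its methods are hypothetical names used only here.

```r
## Hypothetical generic: the same name is used for every kind of input
describe <- function(x, ...) {
  UseMethod("describe")
}

## One method per class; each evaluates differently
describe.numeric <- function(x, ...) {
  paste("numeric vector with mean", mean(x))
}
describe.factor <- function(x, ...) {
  paste("factor with", nlevels(x), "levels")
}
## Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("object of type", typeof(x))
}

describe(c(19, 22, 31))
## [1] "numeric vector with mean 24"
describe(factor(c("a", "b", "a")))
## [1] "factor with 2 levels"
describe("hello")
## [1] "object of type character"
```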
An object is a specific instance of a class. E.g. below, x is an object of class factor.
x <- factor(c(119, 22, 31))
class(x)
## [1] "factor"
A function for a specific class is a method.
E.g. col_means() is a method for our R6 class above.
E.g. print.factor() is the print method for factor objects.
A field is data that belongs to an object. In our R6 example, we had the df field.
Classes are defined in a hierarchy. So if a method does not exist in one class it is searched for in the parent class. It is said that the child class inherits the behavior of the parent class.
E.g. tibbles inherit the behavior of data frames.
class(tibble::tibble(a = 1))
## [1] "tbl_df" "tbl" "data.frame"
The process of searching through an object’s classes, in order, to find the method to call is called method dispatch.
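Inheritance and dispatch can be sketched with a toy S3 example. The generic `greet()` and the classes `"child"` and `"parent"` are hypothetical; the class vector is searched left to right, and the default method is the last resort.

```r
greet <- function(x, ...) {
  UseMethod("greet")
}
greet.parent <- function(x, ...) "hello from the parent method"
greet.default <- function(x, ...) "hello from the default method"

## An object whose class vector is c("child", "parent")
kid <- structure(list(), class = c("child", "parent"))

## No greet.child() exists yet, so dispatch falls through to greet.parent()
before <- greet(kid)
before
## [1] "hello from the parent method"

## Once a child method exists it is found first; NextMethod() hands off
## to the next class in the vector, here "parent"
greet.child <- function(x, ...) {
  paste("hello from the child method, then:", NextMethod())
}
after <- greet(kid)
after
## [1] "hello from the child method, then: hello from the parent method"

## Objects with neither class end up at the default method
greet(1:3)
## [1] "hello from the default method"
```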
{sloop}
The {sloop} package is an interface for exploring OOP systems.
sloop::otype() allows you to see if the system is S3, S4, R6, etc.
sloop::otype(mtcars) ## Most R stuff is in S3.
## [1] "S3"
data("USCounties", package = "Matrix") ## Efficient matrix computations package
sloop::otype(USCounties)
## [1] "S4"
pb <- progress::progress_bar$new() ## progress bars for for-loops
sloop::otype(pb)
## [1] "R6"
S (the precursor to R) was developed first without an OOP system. So its only objects were “base types”. But these don’t have basic OOP functionality like polymorphism, inheritance, etc.
R users often call base types “objects” even though they aren’t OOP objects.
x <- 1:10
sloop::otype(x)
## [1] "base"
In R, an OO object has a class attribute and a base type does not.
x <- 1:10
attr(x, "class")
## NULL
y <- factor(x)
attr(y, "class")
## [1] "factor"
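The same distinction can be checked with base R alone. The sketch below uses is.object(), inherits(), and unclass(); the variable `z` is introduced only for this illustration.

```r
x <- 1:10
y <- factor(x)

is.object(x) ## base type: no class attribute
## [1] FALSE
is.object(y) ## OO object: has a class attribute
## [1] TRUE
inherits(y, "factor")
## [1] TRUE

## unclass() strips the class attribute, leaving the underlying base type
z <- unclass(y)
attr(z, "class")
## NULL
is.object(z)
## [1] FALSE
```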
If an object has no class attribute, class() returns its implicit class. For many base types (like the integer vector x) this is the same as the result of typeof().
class(x)
## [1] "integer"
typeof(x)
## [1] "integer"
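One caveat worth checking for yourself: the implicit class is not always identical to typeof(). A few base R cases where the two differ are sketched below; the variable `m` exists only for this example.

```r
class(1)  ## implicit class of a double vector
## [1] "numeric"
typeof(1)
## [1] "double"

m <- matrix(1:4, nrow = 2)
class(m)  ## depends on the dim attribute; includes "matrix"
typeof(m)
## [1] "integer"

class(sum)  ## all functions have implicit class "function"
## [1] "function"
typeof(sum)
## [1] "builtin"
```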
Every object, including OO objects, has a base type that can be seen by typeof().
typeof(y)
## [1] "integer"
typeof(mtcars)
## [1] "list"
typeof(USCounties)
## [1] "S4"
typeof(pb)
## [1] "environment"
There are 25 base types. From Hadley’s list, the important ones are:
Vector: NULL, logical, integer, double, character, list
typeof(NULL)
## [1] "NULL"
typeof(TRUE)
## [1] "logical"
typeof(1L)
## [1] "integer"
typeof(1)
## [1] "double"
typeof("1")
## [1] "character"
typeof(list(1))
## [1] "list"
Functions: closure (regular R functions), special (internal R functions), builtin (“primitive” functions in the base namespace that were built using C)
typeof(mean)
## [1] "closure"
typeof(`if`)
## [1] "special"
typeof(sum)
## [1] "builtin"
Environments: environment
typeof(rlang::global_env())
## [1] "environment"
S4 types: S4
typeof(USCounties)
## [1] "S4"
Language types (used in metaprogramming): symbol, language, pairlist, and expression.
typeof(quote(a))
## [1] "symbol"
typeof(quote(a + 1))
## [1] "language"
typeof(formals(mean))
## [1] "pairlist"
typeof(expression(a))
## [1] "expression"
Exercise: What’s the (i) type, (ii) OOP system, and (iii) class of the following objects?
x <- lubridate::make_date(year = c(1990, 2022), month = c(1, 2), day = c(30, 22))
y <- matrix(NA_real_, nrow = 10, ncol = 2)
z <- tibble::tibble(a = 1:3)
aa <- lm(mpg ~ wt, data = mtcars)
bb <- t.test(mpg ~ am, data = mtcars)
cc <- rTensor::as.tensor(array(1:30, dim = c(2, 3, 5)))
Exercise: Why do we get different results from summary() with the following code?
a <- lm(mpg ~ wt, data = mtcars)
b <- t.test(mpg ~ am, data = mtcars)
summary(a)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.543 -2.365 -0.125 1.410 6.873
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.285 1.878 19.86 < 2e-16
## wt -5.344 0.559 -9.56 1.3e-10
##
## Residual standard error: 3.05 on 30 degrees of freedom
## Multiple R-squared: 0.753, Adjusted R-squared: 0.745
## F-statistic: 91.4 on 1 and 30 DF, p-value: 1.29e-10
summary(b)
## Length Class Mode
## statistic 1 -none- numeric
## parameter 1 -none- numeric
## p.value 1 -none- numeric
## conf.int 2 -none- numeric
## estimate 2 -none- numeric
## null.value 1 -none- numeric
## stderr 1 -none- numeric
## alternative 1 -none- character
## method 1 -none- character
## data.name 1 -none- character
Exercise: From the previous exercise, if we remove the class from a and b, what happens to the summary() call? What does this tell you about the summary() methods of the htest and lm classes?