A random variable is a variable whose value is a numerical outcome of a random process. We denote random variables with letters, like \(Y\).
In practice, this “random process” is sampling a unit from a population of units and observing that unit’s value of the variable. E.g., if we sample birth weights of babies born in the United States, then birth weight is a random variable.
The mean of a random variable is the average value of a very large sample of individuals. The notation for the mean of a random variable is \(E[Y]\).
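The "very large sample" idea can be checked by simulation. The sketch below assumes, purely for illustration, a normal population with mean 1 and standard deviation 1 (the population and the name y are made up, not real data). The sample average should land close to 1.

```r
set.seed(1)                               # make the simulation reproducible
y <- rnorm(n = 100000, mean = 1, sd = 1)  # large sample from the assumed population
mean(y)                                   # sample average, approximates E[Y] = 1
```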
Properties:
The variance of a random variable is the average squared deviation from the mean of this random variable. It measures how spread out the values are in a population. The notation for the variance of a random variable is \(var(Y)\). Specifically \[ var(Y) = E[(Y - E[Y])^2] \]
Properties:
The standard deviation of a random variable is the square root of its variance. \[ sd(Y) = \sqrt{var(Y)} \]
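As a rough check of these two definitions, the sketch below computes the average squared deviation by hand on simulated data. The population (normal with mean 1 and standard deviation 2) and the names y and v are assumptions chosen for illustration.

```r
set.seed(1)                               # make the simulation reproducible
y <- rnorm(n = 100000, mean = 1, sd = 2)  # assumed population: var(Y) = 4, sd(Y) = 2
v <- mean((y - mean(y))^2)                # average squared deviation from the mean
v                                         # close to 4
sqrt(v)                                   # square root of the variance, close to 2
var(y)                                    # R's var() divides by n - 1, so it differs very slightly from v
```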
The distribution of a random variable is the possible values of a random variable and how often it takes those values.
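A density describes the distribution of a quantitative variable. You can think of it as approximating a histogram. It is a curve where the area under the curve between any two points is approximately the proportion of units with values between those two points, and the total area under the curve is 1.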
The density of birth weights in America:
The distributions of many variables in Statistics are approximately normal.
Normal densities with different means.
Normal densities with different standard deviations.
Density Function (height of curve, NOT probability of a value).
dnorm(x = 2, mean = 1, sd = 1)
## [1] 0.242
Random Generation (generate samples from a given normal distribution).
samp <- rnorm(n = 1000, mean = 1, sd = 1)
head(samp)
## [1] 0.03807 0.70747 1.25879 -0.15213 1.19578 1.03012
Cumulative Distribution Function (probability of being less than or equal to some value).
pnorm(q = 2, mean = 1, sd = 1)
## [1] 0.8413
Quantile function (find value that has a given probability of being less than or equal to it).
qnorm(p = 0.8413, mean = 1, sd = 1)
## [1] 2
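The density, cumulative distribution, and quantile functions are tied together, and a short sketch may make that concrete. It reuses the mean 1 and standard deviation 1 from the calls above. The second density call (with sd = 0.1) and the names p and area are illustrative additions.

```r
# A density height can exceed 1, so it cannot be a probability
dnorm(x = 1, mean = 1, sd = 0.1)

# qnorm() undoes pnorm()
p <- pnorm(q = 2, mean = 1, sd = 1)   # P(Y <= 2)
qnorm(p = p, mean = 1, sd = 1)        # recovers 2

# The area under the density curve to the left of 2 matches pnorm()
area <- integrate(dnorm, lower = -Inf, upper = 2, mean = 1, sd = 1)$value
area
```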
Exercise: Use rnorm() to generate 10,000 random draws from a normal distribution with mean 5 and standard deviation 2. What proportion are less than 3? Can you think up a way to approximate this proportion using a different function?
Exercise: In Hong Kong, human male height is approximately normally distributed with mean 171.5 cm and standard deviation 5.5 cm. What proportion of the Hong Kong male population is between 170 cm and 180 cm tall?
A property of the normal distribution is that if \(X \sim N(\mu, \sigma^2)\) and \(Z = (X - \mu) / \sigma\), then \(Z \sim N(0, 1)\).
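One practical consequence is that a probability for \(X\) can be computed from the standard normal after standardizing. The sketch below checks this for a single value. The numbers (mean 10, standard deviation 3, cutoff 13) and the names mu, sigma, x, and z are made up for illustration.

```r
mu <- 10                              # assumed mean of X
sigma <- 3                            # assumed standard deviation of X
x <- 13                               # value of interest
z <- (x - mu) / sigma                 # standardized value, here 1
pnorm(q = x, mean = mu, sd = sigma)   # P(X <= 13)
pnorm(q = z)                          # P(Z <= 1) under N(0, 1), the same probability
```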
Exercise: Use rnorm() and qqplot() to demonstrate this property. That is, simulate 1000 values of \(X\) with some mean different than 0 and some variance different than 1. Then transform those \(X\) values to \(Z\). Then simulate some other variable \(W\) from \(N(0, 1)\). Use qqplot() to show that \(W\) and \(Z\) follow the same distribution.
The \(t\)-distribution shows up a lot in Statistics.
\(t\)-distributions with different degrees of freedom:
Density, distribution, quantile, and random generation functions also exist for the \(t\)-distribution.
dt()
pt()
qt()
rt()
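These functions work like their normal counterparts, except that they take a degrees-of-freedom argument, df. A minimal sketch follows. The choice of 5 degrees of freedom and the value 2 are arbitrary, and the last two lines illustrate the general fact that a \(t\)-distribution with many degrees of freedom is close to \(N(0, 1)\).

```r
dt(x = 2, df = 5)                    # height of the t density at 2
pt(q = 2, df = 5)                    # P(T <= 2)
qt(p = pt(q = 2, df = 5), df = 5)    # recovers 2
set.seed(1)                          # make the draws reproducible
head(rt(n = 1000, df = 5))           # first few of 1000 random draws

# With many degrees of freedom, the two probabilities nearly agree
pt(q = 2, df = 1000)
pnorm(q = 2)
```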
The covariance between two random variables, \(X\) and \(Y\), is a measure of the strength of the linear association between these variables. It is defined as \[ cov(X, Y) = E[(X - E[X])(Y-E[Y])] \]
Covariance is related to correlation by \[ cor(X, Y) = \frac{cov(X, Y)}{sd(X)sd(Y)} \]
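The relationship between covariance and correlation can be checked numerically with the sample versions cov(), sd(), and cor(). In the sketch below, the simulated variables x and y, and the linear relationship between them, are assumptions made up for illustration.

```r
set.seed(1)                         # make the simulation reproducible
x <- rnorm(n = 1000)                # simulated X
y <- 2 * x + rnorm(n = 1000)        # simulated Y: linear in X plus noise
cov(x, y)                           # sample covariance
r <- cov(x, y) / (sd(x) * sd(y))    # covariance rescaled by both standard deviations
r
cor(x, y)                           # matches r
```

Because the rescaling divides out the units of both variables, the correlation always lies between -1 and 1, while the covariance changes whenever either variable is rescaled.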