{ggplot2}
The {ggplot2} package is very powerful, so I will show only the most important and basic plots needed for data analysis.
Before using the plotting functions from {ggplot2} in a new R session, always load the {ggplot2} package first.
library(ggplot2)
In this vignette, we’ll also make some variable transformations,
so we will need the {dplyr}
package.
library(dplyr)
I will use the mpg dataset to demonstrate the plots.
data(mpg, package = "ggplot2")
glimpse(mpg)
## Rows: 234
## Columns: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
## $ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
## $ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
## $ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
## $ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
## $ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
## $ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
## $ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
## $ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
## $ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
## $ class <chr> "compact", "compact", "compact", "compact", "compact", "c…
ggplot()
The first function you use in making a plot is always ggplot().
It takes two main arguments:
data: The data frame that holds the variables you want to plot.
mapping: The “aesthetic map”. An aesthetic map says which variables go on the \(x\)-axis, which go on the \(y\)-axis, and which are represented by color, size, point shape, and so on.
You place all aesthetic maps inside an aes() function.
For example, here we map hwy to the \(x\)-axis and give each value of drv a different color.
ggplot(data = mpg, mapping = aes(x = hwy, color = drv))
This function just sets the data and the aesthetic mapping, but it won’t produce any useful plot by itself.
You add additional functions to the plot to state the type of plot you want.
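To see that ggplot() alone only sets up the data and the mapping, one option is to inspect the layers stored in the plot object. This is a minimal sketch: the names p_base and p_layered are my own, geom_density() is just one possible layer, and looking at the layers element assumes {ggplot2} keeps storing plot objects this way.

```r
library(ggplot2)

# ggplot() alone stores the data and the aesthetic map, but no layers
p_base <- ggplot(data = mpg, mapping = aes(x = hwy, color = drv))
length(p_base$layers)

# Adding a geom gives the plot something to draw
p_layered <- p_base + geom_density()
length(p_layered$layers)

# Printing the layered plot draws one density curve per value of drv
p_layered
```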
Histogram: Use the geom_histogram() function.
ggplot(data = mpg, mapping = aes(x = hwy)) +
geom_histogram()
Make the bin lines black and the fill white, and change the number of bins.
ggplot(data = mpg, mapping = aes(x = hwy)) +
geom_histogram(bins = 10, color = "black", fill = "white")
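If you would rather control the width of each bin than the number of bins, geom_histogram() also accepts a binwidth argument. A small sketch, where the width of 2 miles per gallon is an arbitrary choice and p_hist is a name I made up:

```r
library(ggplot2)

# Each bin spans 2 mpg; binwidth is used in place of bins
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(binwidth = 2, color = "black", fill = "white")
p_hist
```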
Exercise: Load in the estate data (see here for a description) and make a histogram of price with 20 bins. Make the bins red.
Barplot: Use geom_bar().
ggplot(data = mpg, mapping = aes(x = drv)) +
geom_bar()
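geom_bar() counts the rows in each category for you. If the counts are already computed, for example with count() from {dplyr}, geom_col() plots them directly. A sketch, where drv_counts and p_col are names I made up:

```r
library(ggplot2)
library(dplyr)

# One row per drive type, with the number of cars in column n
drv_counts <- count(mpg, drv)

# geom_col() uses the y aesthetic as the bar height
p_col <- ggplot(data = drv_counts, mapping = aes(x = drv, y = n)) +
  geom_col()
p_col
```

This should give the same picture as the geom_bar() call on the raw data.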
Exercise: Which variables from the estate data are appropriately plotted using a bar plot?
Plot them.
Scatterplot: Use geom_point().
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()
Jitter points to account for overlapping points: use geom_jitter() instead of geom_point().
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_jitter()
Add a loess smoother by adding geom_smooth().
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
geom_smooth(se = FALSE)
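geom_smooth() picks a smoothing method automatically, and for a dataset of this size that is loess. You can ask for a different fit through the method argument. As a sketch, this swaps in a straight least-squares line; p_lm is just a name I chose:

```r
library(ggplot2)

# method = "lm" fits a linear regression line instead of a loess curve
p_lm <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
p_lm
```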
Exercise: Using the estate
data,
make a scatterplot of number of bedrooms versus number of bathrooms.
Adjust for any overplotting.
Boxplot: Use geom_boxplot().
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
Exercise: Using the estate data, plot sales price versus style. (Hint: you first need to convert style to a factor using as.factor().)
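The same kind of conversion can be tried on mpg, where cyl is stored as an integer. This is a sketch using mutate() from {dplyr}; mpg_cyl and p_box are names I chose for the transformed data and the plot:

```r
library(ggplot2)
library(dplyr)

# Convert cyl from integer to factor so each value gets its own box
mpg_cyl <- mutate(mpg, cyl = as.factor(cyl))

p_box <- ggplot(data = mpg_cyl, mapping = aes(x = cyl, y = hwy)) +
  geom_boxplot()
p_box
```

Without the conversion, geom_boxplot() treats cyl as a continuous variable and does not split the data into one box per cylinder count.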
Color code a scatterplot by a categorical variable and add a legend.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = drv)) +
geom_jitter()
Exercise: Using the estate
data,
create a boxplot of price versus ac, color coding by pool.
Add a scale_*() call to change the legend name:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = drv)) +
geom_jitter() +
scale_color_discrete(name = "New Name1")
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, shape = drv)) +
geom_jitter() +
scale_shape_discrete(name = "New Name2")
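Another way to rename a legend is labs(), which sets the title for a given aesthetic. A sketch, with "Drive type" as a placeholder title and p_named as a made-up name:

```r
library(ggplot2)

# labs() takes aesthetic = "title" pairs; here it renames the color legend
p_named <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = drv)) +
  geom_jitter() +
  labs(color = "Drive type")
p_named
```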
You can facet by a categorical variable by adding a facet_grid() or facet_wrap() function.
In facet_grid(), the variable to the left of the tilde (“~”) indexes the row facets, and the variable to the right of the tilde indexes the column facets. Using a dot (“.”) in place of a variable means that there will be only one row/column facet.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
facet_grid(. ~ drv)
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
facet_grid(drv ~ .)
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
facet_grid(fl ~ drv)
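The examples above all use facet_grid(). facet_wrap() instead takes a single variable and wraps its panels into a grid, which helps when the variable has many levels. A sketch using class; the choice of three columns is arbitrary and p_wrap is a name I made up:

```r
library(ggplot2)

# One panel per vehicle class, wrapped into rows of three panels
p_wrap <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, ncol = 3)
p_wrap
```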
Exercise: Using the estate
data,
plot price versus area, faceting by ac, color coding by pool.
Add a theme_*()
function to change the theme:
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
theme_classic()