1. What is ggplot2?

“ggplot” (technically “ggplot2”) is an R package* that facilitates elegant design of graphics. Even if you are brand new to R, you might have heard about “ggplot”–in fact, for some people it might be the main reason they want to learn R.

ggplot2 is much more ambitious and in some ways much more challenging than most other packages because it creates a new “grammar” of graphics, and it requires you to learn some new syntax. But with practice, this syntax will start to make sense, and it can help you make excellent quality figures. In addition, there are now many extentions packages that allow you to do even more with the ggplot grammar (e.g., make maps with ggmap or display networks with ggraph, etc.–see a gallery of extensions here)

ggplot2 is part of the “tidyverse” suite of packages. There is a separate module on other major aspects of tidyverse, such as tidyr and dplyr.

Super Useful References:

ggplot2 website: https://r-graph-gallery.com/ggplot2-package.html

The ggplot2 book (free online version): https://ggplot2-book.org/index.html

The online ‘tidyverse’ book: https://r4ds.had.co.nz/data-visualisation.html


*** What is a “package” in R? ***

R packages are essentially a set of custom functions that R users have created and compiled, along with help files and vignettes, etc. Many of them are archived at CRAN–The Comprehensive R Archive Network–and available to install from the R console using the function install.packages(). There are still many other packages that users have not archived but are available from other sources, such as github. “Installing” the package means that the package is downloaded onto your computer. When you are ready to use them, you will have to load the package onto the environment by running the function library() or require().


2. Installing and loading the package

One can install each package separately, but you can also just install all “tidyverse” packages simply by running this command:

install.packages("ggplot2")

Note that this simply downloads the packages onto your computer. You only have to do this once on a given computer.

You now have the package downloaded on your computer, but to actually use it, you have to load the package. We can load the entire tidyverse package (or, if you prefer, you can just load the tidyr package).

library(ggplot2)

3. The basics of the ggplot2 syntax

3.1. Components of a graphic

ggplot2 uses what is called layered grammar of graphics

We can break down the layers of any graphic to different components (see this pdf for full explanation):

  1. The data

  2. Mapping: how the variables in the data are converted to “aesthetics” of the figure

  3. The geom, or geometric object: the type of visual object you want to make

  4. Scaling: i.e., how different values of variables are represented

  5. Faceting: i.e., representing subsets of data as subplots

3.2. The basic workflow of ggplot2

  • First specify the data using the ggplot() function.

  • Add “aesthetic mapping” (i.e., specify the visual parameters of the graphic) using aes(). This can be set within the ggplot() function if you want the aesthetic to apply as default to all layers you are going to define, or within the geom_ function if you want different layers to have different aesthetics.

  • Define specific plot components using additional geom_ functions (such as geom_points()). Note that you literally add these components using +

  • Layer on any other components with additional geom_ or stat_ functions

    • These layers can include summary stats (e.g., means, medians, counts, etc… or any other stats that you can calculate via custom functions).
  • Define scaling of variables if needed (e.g., color palette)

  • Make adjustments via scales, axes, legends, etc.


4. Building a simple scatterplot, step-by-step

Scatterplots are used to display the relationships between two continuous variables.

In the “basics of plots” module, we created a scatterplot of sepal lengths and widths from the iris dataset that looked like this:

colorset=rainbow(3) #create a palette of 3 colors
pt.cols=colorset[as.numeric(iris$Species)] #This is now a vector of colors for each point
plot(Sepal.Width~Sepal.Length, data=iris, xlab="Sepal Length", ylab="Sepal Width", las=1, pch=19, col=pt.cols)

Here, we will go through step-by-step on how to recreate this figure, but in ggplot2

step 1: Define the data and aesthetics.

This will only create a blank plot

ggplot(data=iris, mapping=aes(x=Sepal.Length, y=Sepal.Width))

step 2: add scatter plot using geom_point()

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point()

step 3: Change point size by defining additional parameters within the geom_ function.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point(size=2)

step 4: Color the points by species by defining it in the aesthetics (aes() argument)

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2)

step 5: Define the color scaling

In the base R example plot above, we used a rainbow(3) palette to generate 3 color values. We can do that here using a scale_color_discrete() function. Note: there are lots of different scale_color_ functions, and it might take you a while to get familiar with them.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3))

step 6: set up a simple “theme”

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() 

step 7: additional adjustments in theme to remove grid lines

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

step 8: Edit the x- and y-axis labels.

Right now, the labels say “Sepal.Length” and “Sepal.Width”. Let’s change the periods into spaces:

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  xlab("Sepal Length") +
  ylab("Sepal Width")

step 9: Remove legend

… ok, most of the time, you probably should have a legend. But, it will be helpful for you to learn how to play around with it. There are several ways to do this, but one way is to edit the legend.position argument in the theme() function.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = "none") +
  xlab("Sepal Length") +
  ylab("Sepal Width")

Alternative syntax:

Just to note: You can also move the aesthetic mapping to the geom_point() function rather than the ggplot() function. It doesn’t make any difference in this example because you have only one geom function. But it might make a difference when you are doing more complex visualizations.

ggplot(iris) +
  geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species), size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = "none") +
  xlab("Sepal Length") +
  ylab("Sepal Width")


5. Saving ggplot outputs

5.1. The ggsave() function

You can export the last plot you made using the function ggsave(). Enter the file name you want to save it as, including the file extension.

ggsave("scatterplot.png")

You will find the file in your Rproject folder.

5.2. Adjusting the file dimensions

You might find that you want to adjust the width and height of the plot. You can set this in inches or whatever other unit (see ?ggsave() for details).

ggsave("scatterplot.png", width=8, height=4, units="in")

5.3. Best practice: save the plot as an object, and then save it.

A better way is to save the plot as an object, and then save it. Here, we will assign the plot with the legend as p and then save it.

p=ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  xlab("Sepal Length") +
  ylab("Sepal Width")

#display the plot
p

#save the plot
ggsave("scatterplot_w_legend.pdf", width=8, height=4, units="in")

6. Some other aesthetic options

Here is a vignette for other aesthetic specifications: https://ggplot2.tidyverse.org/articles/ggplot2-specs.html

Here is the “themes” section in the ggplot2 book: https://ggplot2-book.org/polishing.html