1. What is ggplot2?

“ggplot” (technically “ggplot2”) is an R package* that facilitates elegant design of graphics. Even if you are brand new to R, you might have heard about “ggplot”–in fact, for some people it might be the main reason they want to learn R.

ggplot2 is more ambitious, and in some ways more challenging, than most other packages because it implements its own “grammar” of graphics, which requires you to learn some new syntax. With practice, this syntax will start to make sense, and it can help you make high-quality figures. In addition, there are now many extension packages that let you do even more with the ggplot grammar (e.g., make maps with ggmap or display networks with ggraph; see a gallery of extensions here).

ggplot2 is part of the “tidyverse” suite of packages. There is a separate module on other major aspects of tidyverse, such as tidyr and dplyr.

Super Useful References:

ggplot2 page at the R Graph Gallery: https://r-graph-gallery.com/ggplot2-package.html

The ggplot2 book (free online version): https://ggplot2-book.org/index.html

The online ‘tidyverse’ book: https://r4ds.had.co.nz/data-visualisation.html


*** What is a “package” in R? ***

R packages are essentially sets of custom functions that R users have created and bundled together with help files, vignettes, etc. Many of them are archived at CRAN (the Comprehensive R Archive Network) and can be installed from the R console using the function install.packages(). Many other packages are not archived on CRAN but are available from other sources, such as GitHub. “Installing” a package means downloading it onto your computer. When you are ready to use it, you have to load the package into your R session by running the function library() or require().


2. Installing and loading the package

You can install all of the “tidyverse” packages at once by running install.packages("tidyverse"), but you can also install each package separately. Here, we install just ggplot2:

install.packages("ggplot2")

Note that this simply downloads the package onto your computer. You only have to do this once on a given computer.

You now have the package downloaded on your computer, but to actually use it, you have to load it in each new R session. Here, we load just the ggplot2 package (if you have installed the entire tidyverse, you can instead run library(tidyverse), which loads ggplot2 along with the other core tidyverse packages).

library(ggplot2)
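
Because installation is only needed once per computer while loading is needed in every session, a common pattern is to install conditionally and then load. The following is a minimal sketch of that pattern; it assumes a CRAN mirror is reachable if the package turns out to be missing. requireNamespace() is a base R function that reports whether a package is installed without attaching it.

```r
# Install ggplot2 only if it is not already installed on this computer
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package for the current R session
library(ggplot2)

# Confirm that the package is attached, and check which version you have
"package:ggplot2" %in% search()
packageVersion("ggplot2")
```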

3. The basics of the ggplot2 syntax

3.1. Components of a graphic

ggplot2 uses what is called a layered grammar of graphics.

We can break down any graphic into a set of components (see this pdf for a full explanation):

  1. The data

  2. Mapping: how the variables in the data are converted to “aesthetics” of the figure

  3. The geom, or geometric object: the type of visual object you want to make

  4. Statistical transformations: e.g., fit lines

  5. Scaling: i.e., how different values of variables are represented

  6. Non-data elements: e.g., grid lines, axis labels, title, etc.

  7. Faceting: i.e., representing subsets of data as subplots
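
To make the seven components above concrete, here is one possible plot in which each line corresponds to one component. This is only an illustrative sketch: the axis limits and the title are arbitrary choices made for this example, and the individual functions are explained in the sections that follow.

```r
library(ggplot2)

p <- ggplot(data = iris,                                         # 1. the data
            mapping = aes(x = Sepal.Length, y = Sepal.Width)) +  # 2. mapping
  geom_point() +                                                 # 3. geom
  stat_smooth(method = "lm", formula = y ~ x) +                  # 4. statistical transformation
  scale_x_continuous(limits = c(4, 8)) +                         # 5. scaling
  labs(title = "Iris sepal dimensions") +                        # 6. non-data element
  facet_wrap(~Species)                                           # 7. faceting

# Display the plot
p
```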

3.2. The basic workflow of ggplot2

  • First specify the data using the ggplot() function.

  • Add “aesthetic mapping” (i.e., specify the visual parameters of the graphic) using aes(). This can be set within the ggplot() function if you want the aesthetic to apply as default to all layers you are going to define, or within the geom_ function if you want different layers to have different aesthetics.

  • Define specific plot components using geom_ functions (such as geom_point()). Note that you literally add these components using +.

  • Layer on any other components with additional geom_ or stat_ functions

    • These layers can include summary stats (e.g., means, medians, counts, or any other stats that you can calculate via custom functions).

  • Define scaling of variables if needed (e.g., color palette)

  • Make adjustments via scales, axes, legends, etc.
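
The workflow above can be sketched by building a plot one piece at a time and storing the intermediate result in an object. This example is an assumption-laden illustration rather than part of the scatterplot built below: it plots sepal length by species and uses stat_summary() to add the mean of each species as a summary-stat layer. The transparency, point size and color are arbitrary.

```r
library(ggplot2)

# Start with the data and the default aesthetic mapping
p <- ggplot(iris, aes(x = Species, y = Sepal.Length))

# Add a geom layer showing the raw observations
p <- p + geom_point(alpha = 0.3)

# Add a stat layer that draws the mean of each species as a large red point
p <- p + stat_summary(fun = mean, geom = "point", size = 4, color = "red")

# Display the plot
p
```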


4. Building a simple scatterplot, step-by-step

Scatterplots are used to display the relationships between two continuous variables.

In the “basics of plots” module, we created a scatterplot of sepal lengths and widths from the iris dataset that looked like this:

colorset=rainbow(3) #create a palette of 3 colors
pt.cols=colorset[as.numeric(iris$Species)] #This is now a vector of colors for each point
plot(Sepal.Width~Sepal.Length, data=iris, xlab="Sepal Length", ylab="Sepal Width", las=1, pch=19, col=pt.cols)

Here, we will go step by step through how to recreate this figure in ggplot2.

step 1: Define the data and aesthetics.

On its own, this will only create a blank plot (axes but no data), because we have not added a geom layer yet.

ggplot(data=iris, mapping=aes(x=Sepal.Length, y=Sepal.Width))

step 2: add scatter plot using geom_point()

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point()

step 3: Change point size by defining additional parameters within the geom_ function.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point(size=2)

step 4: Color the points by species by defining it in the aesthetics (aes() argument)

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2)
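
Steps 3 and 4 illustrate the difference between setting and mapping an aesthetic. As a short sketch of that distinction: a value that depends on a variable in the data goes inside aes(), while a fixed value that applies to every point goes outside aes(), as an ordinary argument of the geom_ function. The color "slateblue" here is an arbitrary choice.

```r
library(ggplot2)

# Mapping: the color depends on a variable, so it goes inside aes()
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2)

# Setting: one fixed color for all points, so it goes outside aes()
p_fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 2, color = "slateblue")

p_mapped
p_fixed
```

Only the mapped version produces a legend, because only there does color carry information about the data.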

step 5: Define the color scaling

In the base R example plot above, we used a rainbow(3) palette to generate 3 color values. We can do the same here by passing those colors to the type argument of the scale_color_discrete() function. Note: there are lots of different scale_color_ functions, and it might take you a while to get familiar with them.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3))

You can also assign your own color palette using scale_color_manual().

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_manual(values=c("tomato", "slateblue", "gold"))
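
With an unnamed vector, the colors are assigned to the species in the order of the factor levels. If you want to make the assignment explicit, scale_color_manual() also accepts a named vector. The sketch below assumes the names match the levels of iris$Species exactly; the object name species_cols is just an illustrative choice.

```r
library(ggplot2)

# Illustrative palette: the names must match the levels of iris$Species
species_cols <- c(setosa = "tomato", versicolor = "slateblue", virginica = "gold")

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  scale_color_manual(values = species_cols)

p
```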

You can learn all about the color scales in ggplot2 here: https://ggplot2-book.org/scales-colour.html

step 6: Edit the x- and y-axis labels.

Right now, the labels say “Sepal.Length” and “Sepal.Width”. Let’s change the periods into spaces:

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  xlab("Sepal Length") +
  ylab("Sepal Width")
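
As an alternative to separate xlab() and ylab() calls, the labs() function can set several labels in one place, including the legend title and the plot title. This is a sketch of that option; the legend title and plot title wording are made up for this example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  labs(x = "Sepal Length",
       y = "Sepal Width",
       color = "Iris species",                          # legend title (illustrative wording)
       title = "Sepal dimensions in the iris dataset")  # plot title (illustrative wording)

p
```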

step 7: Changing the plot “theme”

The gray background with white grid lines is the signature look of ggplot. But for publications, you might want a more traditional background. You can change the background using “themes”. There are some built-in alternative themes in ggplot that you can call, such as theme_bw().

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  theme_bw() 
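
theme_bw() is only one of the built-in themes. A quick way to compare a few of them is to store the plot without a theme and add each theme in turn, as sketched below. The base_size argument controls the base font size; the value 14 is an arbitrary choice for this example.

```r
library(ggplot2)

# Store the plot without a theme so that different themes can be added to it
base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2)

base_plot + theme_classic()           # axis lines, no grid lines
base_plot + theme_minimal()           # grid lines, no background annotations
base_plot + theme_bw(base_size = 14)  # theme_bw() with larger text throughout
```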

step 8: additional adjustments in theme:

You can also customize themes further using the theme() function.

However, to do this, you need to learn another class of ggplot functions called “elements”. These functions control the non-data parts of the plot, such as text, lines and backgrounds.

  • element_text() is used to change text

  • element_line() is used to change lines

  • element_blank() is used to remove an element.

For example, I can use element_blank() to remove the grid lines.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())

I can use element_text() to change the size and color of the text at the tick marks.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.text=element_text(size=16, color="purple"))
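
The examples above use element_blank() and element_text(). For completeness, here is a sketch that uses all three element functions in a single theme() call, including element_line() to restyle the major grid lines rather than remove them. The specific colors, line type and text size are arbitrary.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  theme_bw() +
  theme(
    panel.grid.major = element_line(color = "gray80", linetype = "dashed"),  # restyle lines
    panel.grid.minor = element_blank(),                                       # remove lines
    axis.title = element_text(size = 14, face = "bold")                       # restyle text
  )

p
```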

For a complete guide on ggplot themes, go here: https://ggplot2-book.org/themes.html

step 9: Work with the legend

Most of the time, you probably should have a legend, but it is helpful to learn how to adjust or remove it. There are several ways to do this; one is to set the legend.position argument in the theme() function. Here, setting it to "none" removes the legend.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = "none") +
  xlab("Sepal Length") +
  ylab("Sepal Width")
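
legend.position does more than remove the legend. As a small sketch, it also accepts "top", "bottom", "left" and "right" to move the legend around the plot; "bottom" is used here purely as an example.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  theme_bw() +
  theme(legend.position = "bottom") +  # place the legend below the plot
  xlab("Sepal Length") +
  ylab("Sepal Width")

p
```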

Alternative syntax:

Just to note: You can also move the aesthetic mapping to the geom_point() function rather than the ggplot() function. It doesn’t make any difference in this example because you have only one geom function. But it might make a difference when you are doing more complex visualizations.

ggplot(iris) +
  geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species), size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = "none") +
  xlab("Sepal Length") +
  ylab("Sepal Width")
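
To see a case where the placement of aes() does make a difference, consider a plot with two layers. The sketch below uses geom_smooth(), which is introduced in the next section, with formula = y ~ x written out only to avoid the console message. When color is mapped in ggplot(), both layers inherit it; when it is mapped only in geom_point(), the smoothing layer does not know about species.

```r
library(ggplot2)

# Color mapped in ggplot(): both layers inherit it, so one line is fit per species
p_global <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Color mapped only in geom_point(): the smoothing layer ignores species,
# so a single line is fit through all of the points
p_local <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)

p_global
p_local
```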

5. Adding a regression line

Let’s go back to the basics and add a linear regression line through the iris sepal data. To start with, we will just add one line for all points. We can do this with geom_smooth().

Let’s first try it without specifying a method.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point() +
  geom_smooth() + 
  xlab("Sepal Length") +
  ylab("Sepal Width")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

You can see that it adds a “smooth” line (using a “loess” regression–which stands for locally estimated scatterplot smoothing).

But what we usually want to do is fit a linear regression line:

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point() +
  geom_smooth(method="lm") +
  xlab("Sepal Length") +
  ylab("Sepal Width")
## `geom_smooth()` using formula = 'y ~ x'

You can see that, when we fit a regression line to the whole dataset, we see almost no relationship. This is because we are pooling the data from all three species.
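
You can check this numerically with base R's lm() function. The sketch below is a side calculation, not part of the plotting workflow: it compares the slope of the pooled regression with the slopes of separate regressions for each species. The pooled slope is slightly negative, while the slope within each species is positive.

```r
# Slope of the pooled regression (all species together)
pooled <- lm(Sepal.Width ~ Sepal.Length, data = iris)
coef(pooled)["Sepal.Length"]

# Slope of a separate regression for each species
by_species <- sapply(split(iris, iris$Species), function(d) {
  coef(lm(Sepal.Width ~ Sepal.Length, data = d))["Sepal.Length"]
})
by_species
```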

We can separate the species by using Species as a grouping variable:

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, group=Species)) +
  geom_point() +
  geom_smooth(method="lm") +
  xlab("Sepal Length") +
  ylab("Sepal Width")
## `geom_smooth()` using formula = 'y ~ x'

… but it’s probably more useful to have them be in different colors:

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point() +
  geom_smooth(method="lm") +
  xlab("Sepal Length") +
  ylab("Sepal Width")
## `geom_smooth()` using formula = 'y ~ x'

Make it prettier with custom colors and formatting adjustments.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  geom_smooth(method="lm") +
  scale_color_manual(values=c("tomato", "slateblue", "gold")) +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
## `geom_smooth()` using formula = 'y ~ x'

6. Faceting

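You can also easily create multi-panel plots using facet_wrap() or facet_grid(). In short, facet_wrap() takes a sequence of panels and wraps it into rows and columns, whereas facet_grid() lays the panels out in a grid defined by row and/or column variables.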

For example, let’s plot the relationships between sepal length and width for each of the three species separately.

ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point(size=2) +
  geom_smooth(method="lm") +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  facet_wrap(~Species)
## `geom_smooth()` using formula = 'y ~ x'
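
Two variations on the faceted plot above are sketched here. The first lets each panel have its own x-axis range through the scales argument of facet_wrap(); the second uses facet_grid() to stack the species as rows. Both are illustrative options rather than recommendations for this dataset.

```r
library(ggplot2)

# facet_wrap() with a separate x-axis range for each panel
p_wrap <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~Species, scales = "free_x")

# facet_grid() with one row of the grid per species
p_grid <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 2) +
  facet_grid(rows = vars(Species))

p_wrap
p_grid
```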


7. Saving ggplot outputs

7.1. The ggsave() function

You can export the last plot you made using the function ggsave(). Enter the file name you want to save it as, including the file extension.

ggsave("scatterplot.png")

You will find the file in your current working directory (this is your R project folder if you are working in an RStudio project).

7.2. Adjusting the file dimensions

You might want to adjust the width and height of the plot. You can set these in inches or in another unit, such as centimeters (see ?ggsave for details).

ggsave("scatterplot.png", width=8, height=4, units="in")

7.3. Best practice: save the plot as an object, and then save it.

A better way is to store the plot as an object and then save that object. Here, we will assign the plot with the legend to p and then save it.

p=ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
  geom_point(size=2) +
  scale_color_discrete(type=rainbow(3)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  xlab("Sepal Length") +
  ylab("Sepal Width")

#display the plot
p

#save the plot (plot=p names the plot explicitly rather than relying on the last plot displayed)
ggsave("scatterplot_w_legend.pdf", plot=p, width=8, height=4, units="in")
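
The main advantage of storing the plot as an object is that you can tell ggsave() exactly which plot to write, regardless of what was displayed last. The sketch below shows this with the plot argument of ggsave(). The output location is an assumption made for this example: it writes to R's temporary folder via tempdir(), which you would replace with your own path.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  theme_bw()

# Illustrative output location: a temporary folder; replace with your own path
out_file <- file.path(tempdir(), "scatterplot_w_legend.pdf")

# plot = p names the plot explicitly instead of relying on the last plot displayed
ggsave(out_file, plot = p, width = 8, height = 4, units = "in")

# Check that the file was written
file.exists(out_file)
```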

8. Some other aesthetic options

Here is a vignette for other aesthetic specifications: https://ggplot2.tidyverse.org/articles/ggplot2-specs.html

Here is the “themes” section in the ggplot2 book: https://ggplot2-book.org/polishing.html