1 The gRammar of gRaphics with ggplot2

In this section, we’ll discuss the Grammar of Graphics, developed by Leland Wilkinson and implemented in R by Hadley Wickham in the ggplot2 package. We’ll see how it applies to a scatterplot with and without a regression line. These ideas will then be extended in Part 2 of the workshop.

1.1 The Grammar of Graphics

  • What are the variables here?
  • What is the observational unit?
    • i.e., what is the THING being measured?
  • How are the variables mapped to aesthetics?

What is a statistical graphic?

A mapping of data variables to aes()thetic attributes of geom_etric objects.
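
To make the definition concrete, here is a minimal sketch that builds a plot from its three ingredients and then inspects them. The data frame `toy` and its columns are hypothetical, invented only for illustration, and the sketch assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical data, invented for illustration only
toy <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 58, 69, 80),
  group  = c("a", "a", "b", "b")
)

# data variables -> aesthetic attributes -> geometric object
p <- ggplot(data = toy, mapping = aes(x = height, y = weight, color = group)) +
  geom_point()

# The plot object records each ingredient separately
names(p$mapping)              # which aesthetics were mapped
class(p$layers[[1]]$geom)[1]  # which geometric object draws them
```

Printing `p` draws the plot; the last two lines only report what was mapped and which geom was used.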


1.2 Back to basics

Consider the following data in tidy format:

library(tibble)  # tibble() replaces the deprecated data_frame()

simple_ex <-
  tibble(
    A = c(1980, 1990, 2000, 2010),
    B = c(1, 2, 4, 5),
    C = c(3, 2, 1, 2),
    D = c("low", "low", "high", "high")
  )
simple_ex
  • Sketch the graphics below on paper, where the x-axis is variable A and the y-axis is variable B:
  1. A scatterplot
  2. A scatterplot with a fitted least-squares regression line

Intermediate folks:

    1. A scatterplot where the color of the points corresponds to D and the size of the points corresponds to C
    2. Only a regression line of color “goldenrod” (no points and no error bounds)

  1. A scatterplot
library(ggplot2)  # provides ggplot(), aes(), and the geom_*() layers
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_point()

  2. A scatterplot with a fitted least-squares regression line
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) + 
  geom_point() +
  geom_smooth(method = "lm")
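
With method = "lm", geom_smooth() fits the line using lm(). As a rough check, the same least-squares fit can be computed directly. The sketch below rebuilds simple_ex with base R's data.frame() so that it runs on its own; the name `fit` is only illustrative.

```r
# Same A and B values as simple_ex above, rebuilt so this block is self-contained
simple_ex <- data.frame(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 4, 5)
)

# Least-squares regression of B on A
fit <- lm(B ~ A, data = simple_ex)
coef(fit)
# slope for A is 0.14, intercept is -276.3
```

The slope says that B rises by about 0.14 for each one-unit increase in A, which is the tilt of the line drawn by geom_smooth().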

Intermediate

  1. A scatterplot where the color of the points corresponds to D and the size of the points corresponds to C
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) + 
  geom_point(mapping = aes(color = D, size = C))

  2. Only show a regression line of color “goldenrod” (no points and also no error bounds)
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) + 
  geom_smooth(method = "lm", se = FALSE, color = "goldenrod")
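
The two intermediate plots differ in where the color is specified: inside aes() the color is mapped from a data variable, while outside aes() it is set to one fixed value. The sketch below contrasts the two on points. It rebuilds simple_ex with base R so that it runs on its own, and the object names p_mapped and p_set are only illustrative.

```r
library(ggplot2)

# Same A, B, and D values as simple_ex above
simple_ex <- data.frame(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 4, 5),
  D = c("low", "low", "high", "high")
)

# Mapped: color depends on the value of D, so a legend is drawn
p_mapped <- ggplot(simple_ex, aes(x = A, y = B)) +
  geom_point(aes(color = D))

# Set: every point gets the same fixed color, and there is no legend
p_set <- ggplot(simple_ex, aes(x = A, y = B)) +
  geom_point(color = "goldenrod")

p_mapped
p_set
```

A common slip is writing aes(color = "goldenrod"), which treats the string as a one-level data variable rather than as a color name.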


1.3 Your Task

Recreate the gapminder plot shown at the beginning of this workshop (and below) using ggplot2 and the gapminder data frame in the gapminder package. The Data Visualization Cheat Sheet from RStudio may be helpful.

Note: To focus on only the rows of the data frame corresponding to 1992, we use the filter function from dplyr, which we will discuss in Part 3 of this workshop/book.

library(gapminder)
library(dplyr)
library(ggplot2)
gap1992 <- gapminder %>% filter(year == 1992)

# Space for your answer here.