5 More on Factors
5.1 More on manipulating factors
The following section is optional, though intermediate users may find it helpful. You should probably go on to Part 3 and learn some dplyr basics before starting here, since we use mutate(), and it’s important to understand what it’s doing.
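You’ll need the forcats package to use these functions. It is installed with the tidyverse, and since tidyverse 1.2.0 it is also attached by library(tidyverse). The explicit library(forcats) call below is only required with older versions and is harmless otherwise.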
library(tidyverse)
library(forcats)
#load the pets data again
load("data/pets.rda")
5.2 Let’s build a bar plot of weights
To start out, let’s plot the weight for each pet as a bar graph.
ggplot(data = pets, aes(x = id, y = weight)) + geom_bar(stat = "identity")
5.3 Sort by another variable (intermediate)
Let’s sort the bar plot by weight. We can do this by calling fct_reorder() inside mutate() to redefine id so that its categories are ordered by weight.
Based on this visualization, what can we conclude about the weights of each type of animal? Which kind of animal weighs the most?
pets %>% mutate(id = fct_reorder(id, weight)) %>%
ggplot(aes(x = id, y = weight)) +
geom_bar(stat = "identity")
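To see what fct_reorder() does to the factor itself, here is a minimal sketch on a small made-up data frame. The toy data and its column values are assumptions for illustration and are not taken from the pets data set.

```r
library(forcats)

# Hypothetical toy data (not the pets data set)
toy <- data.frame(
  id = c("a", "b", "c"),
  weight = c(30, 10, 20)
)

# Reorder the levels of id by ascending weight
by_weight <- fct_reorder(toy$id, toy$weight)
levels(by_weight)
# "b" "c" "a"

# .desc = TRUE orders the levels from heaviest to lightest
by_weight_desc <- fct_reorder(toy$id, toy$weight, .desc = TRUE)
levels(by_weight_desc)
# "a" "c" "b"
```

Only the level order changes; the values in the column stay the same, which is why the bars move but the data do not.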
5.4 Plot in Reverse Alphabetical Order
Often, you want to plot things in reverse alphabetical order. This is useful because heatmaps and similar plots are often drawn from the bottom up. You can use fct_rev() to do this. It reverses the current order of the levels, which is alphabetical by default.
library(forcats)
pets %>% mutate(id = fct_rev(id)) %>%
ggplot(aes(x = id, y = weight)) +
geom_bar(stat = "identity")
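As a quick check of what fct_rev() changes, the sketch below applies it to a small made-up factor (the fruit names are purely illustrative).

```r
library(forcats)

# Hypothetical factor; factor() sorts levels alphabetically by default
f <- factor(c("apple", "banana", "cherry"))
levels(f)
# "apple" "banana" "cherry"

# fct_rev() reverses the level order but leaves the values alone
f_rev <- fct_rev(f)
levels(f_rev)
# "cherry" "banana" "apple"
```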
5.5 Sort by frequency
Going back to our pets data, sometimes we want to sort our count data by frequency. We can use fct_infreq() to do that. It orders the levels from most frequent to least frequent.
How would we plot these in ascending order?
pets %>% mutate(name = fct_infreq(name)) %>%
ggplot(aes(x = name)) + geom_bar()
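One possible answer to the question above is to combine fct_infreq() with fct_rev(). The sketch below uses a made-up vector of animal types, not the pets data, to show the resulting level order.

```r
library(forcats)

# Hypothetical values: cat appears 3 times, dog 2 times, gerbil once
animal <- factor(c("cat", "dog", "cat", "gerbil", "cat", "dog"))

# Most frequent level first
most_first <- fct_infreq(animal)
levels(most_first)
# "cat" "dog" "gerbil"

# Reverse that order to get the least frequent level first
least_first <- fct_rev(fct_infreq(animal))
levels(least_first)
# "gerbil" "dog" "cat"
```

In a pipeline, the same idea would be mutate(name = fct_rev(fct_infreq(name))) before the call to ggplot().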
5.6 Recode levels of a factor
Sometimes we want to rename the levels of a factor. Often the data may have obscure categories (such as abbreviations), and we want to be clear in our visualization.
As a silly example, let’s change the names of the levels to the Latin genus names for each animal. Note we didn’t change the name of gerbil. What is the result?
pets %>% mutate(genus = fct_recode(animal, canis = "dog", felis = "cat")) %>%
ggplot(aes(x = genus)) + geom_bar()
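To inspect the result without plotting, you can look at the levels directly. This is a minimal sketch on a made-up factor that mirrors the three animal types mentioned above. The vector itself is an assumption, not the pets data.

```r
library(forcats)

# Hypothetical factor with the same three animal types
animal <- factor(c("dog", "cat", "gerbil", "dog"))
levels(animal)
# "cat" "dog" "gerbil"

# fct_recode() takes new_name = "old_name" pairs
genus <- fct_recode(animal, canis = "dog", felis = "cat")

# Levels not mentioned (gerbil) are kept unchanged
levels(genus)
# "felis" "canis" "gerbil"
```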
5.7 Group levels of a factor together
Sometimes, your categories are too granular. It might make sense to aggregate some categories together. You can use fct_collapse() to do this.
pets %>%
  mutate(alphabet = fct_collapse(name,
    A = c("Apples"),
    F = c("Fido"),
    M = c("Morris", "Mr Bowser"),
    L = c("Lady Sheba"),
    H = c("Hubert"),
    W = c("Winky")
  )) %>%
  ggplot(aes(x = alphabet)) + geom_bar()
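The effect of fct_collapse() is easiest to see by counting values before and after. The sketch below uses a short made-up vector that borrows four of the names from the example above. It is not the actual pets data.

```r
library(forcats)

# Hypothetical vector using a few of the pet names from the example
name <- factor(c("Morris", "Mr Bowser", "Fido", "Lady Sheba"))

# Merge the two names starting with M into one level;
# levels that are not mentioned are left as they are
collapsed <- fct_collapse(name, M = c("Morris", "Mr Bowser"))

nlevels(name)       # 4 levels before collapsing
nlevels(collapsed)  # 3 levels after collapsing
table(collapsed)    # M now has a count of 2
```

Listing single-name groups, as the example above does for A, F, L, H and W, simply renames those levels.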
5.7.1 forcats does way more!
Reference page: http://forcats.tidyverse.org/