dplyr: joins and pivots

Lecture 9

Dr. Elijah Meyer

NC State University
ST 295 - Spring 2025

2025-02-06

Checklist

– Have you cloned today’s AE repo?

– Are you keeping up with prepare material?

– Homework-2 is live! Due Feb 10th at 11:59

  -- Late window Feb 11th at 11:59

– Quiz-4 released today at noon!

Warm up

What would left_join(x,y) produce?

full_join()?

library(dplyr)  # provides tibble() and the join functions

x <- tibble(
  value = c(1, 2, 3),
  xcol = c("x1", "x2", "x3")
)
y <- tibble(
  value = c(1, 2, 4),
  ycol = c("y1", "y2", "y4")
)
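One way to check your answer is to run both joins. The sketch below rebuilds the warm-up tibbles so it runs on its own; the result names left_result and full_result are chosen here for illustration.

```r
library(dplyr)

x <- tibble(
  value = c(1, 2, 3),
  xcol = c("x1", "x2", "x3")
)
y <- tibble(
  value = c(1, 2, 4),
  ycol = c("y1", "y2", "y4")
)

# left_join() keeps every row of x; a row with no match in y gets NA in ycol
left_result <- left_join(x, y, by = "value")

# full_join() keeps every row from both x and y, filling gaps with NA
full_result <- full_join(x, y, by = "value")

print(left_result)
print(full_result)
```

The left join has 3 rows (value 3 has no ycol), while the full join has 4 rows because value 4 from y is kept as well.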

Warm up

Read the following code as a sentence

penguins |>
  filter(
    species == "Adelie",
    island == "Torgersen",
    !is.na(body_mass_g)
  ) |>
  mutate(size = if_else(body_mass_g > 4000, "large", "small")) |>
  select(body_mass_g, size)

AE

– if_else() vs case_when()

– joining by > 1 variable

– pivots
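As a rough preview of the first AE topic, the sketch below contrasts if_else() (two outcomes) with case_when() (several outcomes). The masses tibble, its values, and the 5000 g cutoff are made up for illustration and are not from the AE.

```r
library(dplyr)

# Hypothetical body masses in grams, made up for illustration
masses <- tibble(body_mass_g = c(3200, 4100, 5600))

sized <- masses |>
  mutate(
    # if_else(): one condition, exactly two outcomes
    size_two = if_else(body_mass_g > 4000, "large", "small"),
    # case_when(): conditions are checked in order; the first TRUE one wins
    size_three = case_when(
      body_mass_g > 5000 ~ "large",
      body_mass_g > 4000 ~ "medium",
      TRUE ~ "small"
    )
  )

print(sized)
```

Because case_when() stops at the first matching condition, the order of the conditions matters: 5600 is labeled "large" even though it is also greater than 4000.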

Joins Summary

– There are many ways to join data

– Let the join criteria choose the function for you

– Data sets are joined by a “key”

– The key(s) default to common names across data sets unless specified

– Can join on variables with different names by mapping them with = inside the by argument: by = c("variable1" = "variable2")
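The last two points can be sketched as follows. The students, scores, and targets tibbles are hypothetical and made up for illustration. The first join matches keys with different names; the second joins by more than one variable.

```r
library(dplyr)

# Hypothetical tables, made up for illustration
students <- tibble(
  student_id = c(1, 2, 3),
  name = c("Ana", "Ben", "Cy")
)
scores <- tibble(
  id = c(1, 2, 2),
  term = c("fall", "fall", "spring"),
  score = c(90, 85, 88)
)
targets <- tibble(
  id = c(1, 2),
  term = c("fall", "spring"),
  target = c(80, 85)
)

# Keys with different names: student_id in students matches id in scores
by_name <- left_join(students, scores, by = c("student_id" = "id"))

# Joining by more than one variable: rows match only when id AND term agree
by_two <- left_join(scores, targets, by = c("id", "term"))

print(by_name)
print(by_two)
```

In by_name, the key column keeps the name from the first table (student_id). In by_two, the row for id 2 in the fall term gets NA for target because targets has no row with that id and term combination.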

Forms of Data

Data Format (Wide vs Long)

Wide data contains values that do not repeat in the first column

Long data contains values that do repeat in the first column
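To make the distinction concrete, here is the same small data set written both ways. The student names and quiz scores are hypothetical, made up for illustration.

```r
library(dplyr)

# Wide: one row per student, one column per quiz
wide <- tibble(
  student = c("Ana", "Ben"),
  quiz_1 = c(8, 6),
  quiz_2 = c(9, 7)
)

# Long: one row per student-quiz pair, so student names repeat
long <- tibble(
  student = c("Ana", "Ana", "Ben", "Ben"),
  quiz = c("quiz_1", "quiz_2", "quiz_1", "quiz_2"),
  score = c(8, 9, 6, 7)
)

# anyDuplicated() returns 0 when no value in the vector repeats
wide_repeats <- anyDuplicated(wide$student) > 0
long_repeats <- anyDuplicated(long$student) > 0

print(wide_repeats)
print(long_repeats)
```

Both tibbles hold the same four scores; only the layout differs.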

Data Format (Wide vs Long)

– Which have we typically used to create plots in this class?

Tidy Data

There are three interrelated rules that make a dataset tidy:

  • Each variable is a column; each column is a variable.

  • Each observation is a row; each row is an observation.

  • Each value is a cell; each cell is a single value.

Motivation

– Sometimes, data are not in this format…

pivots

pivot_longer
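A minimal sketch of pivot_longer() from tidyr, assuming a hypothetical wide tibble of quiz scores (made up for illustration). The new column names "quiz" and "score" are also chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per quiz
wide <- tibble(
  student = c("Ana", "Ben"),
  quiz_1 = c(8, 6),
  quiz_2 = c(9, 7)
)

long <- wide |>
  pivot_longer(
    cols = c(quiz_1, quiz_2),  # columns to stack
    names_to = "quiz",         # old column names go into this new column
    values_to = "score"        # old cell values go into this new column
  )

print(long)
```

The result has one row per student-quiz pair: 2 students times 2 quizzes gives 4 rows.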

pivot_wider

pivot_wider

– Making tables for quick comparison / display purposes

names_from

values_from
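A minimal sketch of how names_from and values_from work in pivot_wider(), assuming a hypothetical long tibble of quiz scores (made up for illustration).

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one row per student-quiz pair
long <- tibble(
  student = c("Ana", "Ana", "Ben", "Ben"),
  quiz = c("quiz_1", "quiz_2", "quiz_1", "quiz_2"),
  score = c(8, 9, 6, 7)
)

wide <- long |>
  pivot_wider(
    names_from = quiz,    # values in this column become new column names
    values_from = score   # values in this column fill the new cells
  )

print(wide)
```

The result has one row per student with a column for each quiz, which is the layout that works well for a quick side-by-side comparison.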

AE