dplyr: data wrangling

Lecture 6

Dr. Elijah Meyer

NC State University
ST 295 - Spring 2025

2025-01-28

Checklist

– Have you cloned today’s AE repo?

– Are you keeping up with the prepare material?

  > This includes making sure you can render a document to a PDF!

Video recordings

This class does not have classroom capture.

We will Zoom record the screen, and I will post the videos to Moodle.

Note: I reserve the right to stop posting the videos if people stop showing up to class…

Homework-1

Homework 1 is due tonight at 11:59 pm.

Please make sure that you:

– select pages

– check your PDF to make sure code does not run off the page

– reach out if you have any questions! You can submit your homework as many times as you would like.

How can we break apart this plot by island?

facet_wrap

We can use the function facet_wrap() to “break apart” a plot by another variable.

The syntax is facet_wrap(~variable.name), and it is added to your ggplot as another layer with +.
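As a minimal sketch of the idea, the code below facets a scatter plot by island. It assumes the penguins data from the palmerpenguins package and picks flipper length and body mass as the plotted variables; the plot on the slide may use different ones.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: the penguins data comes from this package

# Scatter plot of two numeric variables (chosen here only for illustration)
p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE) +   # na.rm = TRUE drops rows with missing values quietly
  facet_wrap(~island)          # one panel per island

p
```

Each island gets its own panel, and by default the panels share the same x and y axes, so they are easy to compare.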

What is the difference between fill and color?

Warm-up

fill fills in the geometric shape. color outlines the geometric shape.

The one exception is geometric shapes that do not have a separate border, such as the default scatter plot points. These are colored with color, and fill has no effect on them.
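The difference is easiest to see side by side. This is a rough sketch, again assuming the penguins data from the palmerpenguins package; the variables are chosen only for illustration.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: the penguins data comes from this package

# fill: the inside of each bar is colored by species
p_fill <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar()

# color: only the outline of each bar is colored by species
# (the inside is set to white so the outlines are easy to see)
p_color <- ggplot(penguins, aes(x = island, color = species)) +
  geom_bar(fill = "white")

# Default points have no separate border, so color is what colors them
p_points <- ggplot(penguins,
                   aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(na.rm = TRUE)

p_fill
p_color
p_points
```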

Warm-up

What’s the difference between geom_point() and geom_jitter()?

Can you stack geoms on top of each other?
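One way to explore both questions is to build the plots and compare them. The sketch below assumes the penguins data from the palmerpenguins package; the jitter width and transparency values are arbitrary choices.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: the penguins data comes from this package

base <- ggplot(penguins, aes(x = species, y = body_mass_g))

# geom_point() draws each point exactly where the data says it is,
# so points with the same species line up in a single column
p_point <- base + geom_point(na.rm = TRUE)

# geom_jitter() adds a small amount of random noise to each position;
# height = 0 keeps the body mass values unchanged
p_jitter <- base + geom_jitter(width = 0.2, height = 0, na.rm = TRUE)

# Geoms can be stacked: each + adds another layer on top of the previous one
p_stacked <- base +
  geom_boxplot(outlier.shape = NA, na.rm = TRUE) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5, na.rm = TRUE)

p_point
p_jitter
p_stacked
```

Layers are drawn in the order they are added, so the jittered points sit on top of the boxplots here.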

Data wrangling

Why is data wrangling important?

Data wrangling

From the reading:

Often you’ll need to create some new variables or summaries to answer your questions with your data, or maybe you just want to rename the variables or reorder the observations to make the data a little easier to work with.

dplyr package

The dplyr package in R is a collection of functions that help users manipulate data frames. It’s a core part of the tidyverse, a group of R packages designed to work together. Some of the functions we will use:

– filter()

– mutate()

– count()

– and more!
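Here is a short sketch of what each of these three functions does. It assumes the penguins data from the palmerpenguins package; the new column name body_mass_kg is made up for this example.

```r
library(dplyr)
library(palmerpenguins)  # assumption: the penguins data comes from this package

# filter(): keep only the rows that meet a condition
dream <- penguins |>
  filter(island == "Dream")

# mutate(): add a new column computed from existing ones
# (body_mass_kg is a hypothetical name chosen for this example)
with_kg <- penguins |>
  mutate(body_mass_kg = body_mass_g / 1000)

# count(): count how many rows fall in each group
species_counts <- penguins |>
  count(species)

dream
with_kg
species_counts
```

Each function takes a data frame as its first argument and returns a new data frame, which is why they chain together with the pipe. None of them change penguins itself unless you assign the result back to it.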

ae-05