Data Import + Merge Conflicts

Lecture 15

Dr. Elijah Meyer

NC State University
ST 295 - Spring 2025

2025-02-25

Announcements

Great work on exam 1! Grades will come out Friday (~ish)

No quiz this week (look for one next week)

Be on the lookout for statistics experience assignment instructions

Be on the lookout for project assignment instructions

Goals

– Creating + exporting new data sets

– Practice with the tidyverse for data import

– Practice with readxl for data import from Excel
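As a rough preview of these goals, here is a minimal sketch of exporting a data set and importing it again. The tibble `degrees` and the temporary file path are made up for illustration, and the Excel file is the example workbook that ships with readxl, not a course file.

```r
library(readr)
library(readxl)
library(tibble)

# Hypothetical small data set created by hand
degrees <- tibble(
  year   = c("2011", "2012"),
  degree = c("AB", "BS"),
  n      = c(2, 9)
)

# Export to a temporary CSV file (a stand-in for a path in your project)
csv_path <- tempfile(fileext = ".csv")
write_csv(degrees, csv_path)

# Import it again; without col_types, read_csv() would guess that year is numeric
degrees_back <- read_csv(
  csv_path,
  col_types = cols(year = col_character(), n = col_double())
)

# Import the first sheet of an example Excel workbook bundled with readxl
excel_data <- read_excel(readxl_example("datasets.xlsx"))
```

Setting `col_types` is optional, but it keeps `year` as text so that the re-imported data matches what was exported.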

Warm up

# A tibble: 56 × 3
   year  degree     n
   <chr> <chr>  <dbl>
 1 2011  AB         2
 2 2011  AB2        0
 3 2011  BS         5
 4 2011  BS2        2
 5 2012  AB         2
 6 2012  AB2        1
 7 2012  BS         9
 8 2012  BS2        6
 9 2013  AB         4
10 2013  AB2        0
# ℹ 46 more rows

Warm up

What goes in the _____?

statsci |>
  pivot_wider(
    names_from = ____, 
    values_from = _____
  )
# A tibble: 4 × 5
  degree `2011` `2012` `2013` `2014`
  <chr>   <dbl>  <dbl>  <dbl>  <dbl>
1 AB          2      2      4      1
2 AB2         0      1      0      0
3 BS          5      9      4     13
4 BS2         2      6      1      0

Note: This is just a subset (first 5 cols) of the entire data set

Warm up

statsci |>
  pivot_wider(
    names_from = year, # values that become the new column names
    values_from = n # values that fill those new columns
  )

Warm up

How do we go back?

statsci <- statsci_wide |>
  pivot_longer(
    cols = _____,
    names_to = ______,
    values_to = ______
  )

Warm up

statsci_wide |>
  pivot_longer(
    cols = !degree, # pivot every column except degree
    names_to = "year", # name of the new column that holds the old column names
    values_to = "n" # name of the new column that holds the values
  )
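To try the round trip without the full data set, here is a small sketch that rebuilds the first eight rows of the warm-up output by hand. The object names `statsci_small`, `statsci_small_wide`, and `statsci_small_long` are made up for this example.

```r
library(tidyr)
library(tibble)

# First eight rows of the warm-up output, typed in by hand
statsci_small <- tribble(
  ~year,  ~degree, ~n,
  "2011", "AB",    2,
  "2011", "AB2",   0,
  "2011", "BS",    5,
  "2011", "BS2",   2,
  "2012", "AB",    2,
  "2012", "AB2",   1,
  "2012", "BS",    9,
  "2012", "BS2",   6
)

# Long -> wide: one row per degree, one column per year
statsci_small_wide <- statsci_small |>
  pivot_wider(names_from = year, values_from = n)

# Wide -> long: back to one row per degree-year combination
statsci_small_long <- statsci_small_wide |>
  pivot_longer(cols = !degree, names_to = "year", values_to = "n")
```

The long result has the same eight degree-year combinations as the starting data, but the rows are now ordered by degree rather than by year, and the columns appear as degree, year, n.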

Warm up

Extension

We want two lines, one for each group. The default for geom_smooth() is not sufficient here: because we set color = "black" inside geom_smooth(), that layer no longer splits the data using the color = mapping in our aes() above, so on its own it would run a single line through all of our points.

So, we need to specify the grouping explicitly with group = in the aes() function!

How this works

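library(tidyverse)
library(Lahman) # assumed source of the Batting data

Batting |>
  filter(G > 150) |> # keep rows with more than 150 games
  mutate(era = if_else(yearID <= 1950, "Pre1951", "Post1951")) |>
  ggplot(
    aes(x = yearID, y = R, color = era, group = era)
  ) +
  geom_point() +
  geom_smooth(method = "lm", color = "black", se = FALSE) # one black line per era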
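If you want to check the effect of group = yourself, the sketch below uses the built-in mtcars data as a stand-in for Batting. The column `transmission` is created only for this illustration and plays the role of era. It counts how many separate lines the smoothing layer would draw with and without the group mapping.

```r
library(ggplot2)

# Stand-in data: a two-level grouping column in place of era
cars <- transform(
  mtcars,
  transmission = ifelse(am == 1, "manual", "automatic")
)

# With group = transmission, the smooth layer fits one line per group
grouped <- ggplot(
  cars,
  aes(x = wt, y = mpg, color = transmission, group = transmission)
) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "black", se = FALSE)

# Same plot without the group mapping, for comparison
ungrouped <- ggplot(cars, aes(x = wt, y = mpg, color = transmission)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "black", se = FALSE)

# Layer 2 is the smooth layer; count the distinct groups it was computed for
n_lines_grouped   <- length(unique(layer_data(grouped, 2)$group))
n_lines_ungrouped <- length(unique(layer_data(ungrouped, 2)$group))
```

Comparing `n_lines_grouped` and `n_lines_ungrouped` shows whether the fixed color removed the per-group split in your installed ggplot2 version.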

Questions?

AE Import

Resolving merge conflicts

Working in teams includes using a shared GitHub repo. Sometimes things will go swimmingly, and sometimes you’ll run into merge conflicts.

What is a merge conflict?

A merge conflict occurs when a version control system, like Git, is unable to automatically combine changes from different branches during a merge operation because they have conflicting edits to the same part of a file.

In short: if you make changes to your document and then pull changes from your repo that affect the same part of the file, you will get a merge conflict.

Goals

Today we will

– Go over how to avoid merge conflicts

– Go over how to resolve merge conflicts

– Go over the “last ditch effort” method (which we try to avoid, but it’s used in practice)

How to avoid merge conflicts

Pull

How to avoid merge conflicts

When you open a project, Pull down any changes that your colleague could have made and pushed up.

When you are finished working, Render, Commit, and Push all changes up to the repo, so you don’t have uncommitted changes to your document that may cause a merge conflict later!

How it happens: Demo

If a collaborator has made a change to your repo on GitHub that you haven’t incorporated into your local work, GitHub will stop you from pushing to the repo because this could overwrite your collaborator’s work!

  • So you need to explicitly “merge” your collaborator’s work before you can push.

  • If your changes and your collaborator’s changes are in different files or in different parts of the same file, Git merges the work for you automatically when you *pull*.

  • If you both changed the same part of a file, Git will produce a **merge conflict** because it doesn’t know which change you want to keep and which change you want to overwrite.

What it looks like

Git will put conflict markers in your code that look like:

<<<<<<< HEAD 

See also: [dplyr documentation](https://dplyr.tidyverse.org/)   

======= 

See also [ggplot2 documentation](https://ggplot2.tidyverse.org/)  

>>>>>>> some1alpha2numeric3string4

The ======= line separates your changes (top) from their changes (bottom).

Note that on top you see the word HEAD, which indicates that these are your changes.
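To make the structure of the markers concrete, here is a small base R sketch that writes a conflicted file to a temporary location, finds the three marker lines, and keeps both versions. The surrounding lines ("## Resources", "Thanks for reading!") and the temporary file are made up, and the code assumes the file has exactly one conflict. In practice you would usually resolve the conflict by editing the file by hand; this is only meant to show which lines belong to whom.

```r
# A made-up file containing one conflict, using the markers shown above
conflicted <- c(
  "## Resources",
  "<<<<<<< HEAD",
  "See also: [dplyr documentation](https://dplyr.tidyverse.org/)",
  "=======",
  "See also [ggplot2 documentation](https://ggplot2.tidyverse.org/)",
  ">>>>>>> some1alpha2numeric3string4",
  "Thanks for reading!"
)
path <- tempfile(fileext = ".qmd")
writeLines(conflicted, path)

lines <- readLines(path)

# Locate the three marker lines (assumes a single conflict in the file)
start  <- grep("^<<<<<<< ", lines)
middle <- grep("^=======[[:space:]]*$", lines)
end    <- grep("^>>>>>>> ", lines)

# Lines between the markers: your version on top, theirs below
yours  <- lines[(start + 1):(middle - 1)]
theirs <- lines[(middle + 1):(end - 1)]

# One possible resolution: keep both versions and drop the marker lines
resolved <- c(
  lines[seq_len(start - 1)],
  yours,
  theirs,
  lines[seq(end + 1, length(lines))]
)
writeLines(resolved, path)
```

Whichever version you keep, the file is only resolved once all three marker lines are gone. After that you can commit and push as usual.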