Data Import - Solutions

Packages

We will use the following two packages in this application exercise.

  • tidyverse: For data import, wrangling, and visualization.
  • readxl: For importing data from Excel.
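
The code in the rest of this exercise calls functions from both packages, so they need to be loaded first. A minimal sketch, assuming both packages are already installed:

```r
# Assumes tidyverse and readxl are installed
library(tidyverse)  # provides read_csv(), write_csv(), and filter()
library(readxl)     # provides read_excel()
```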

Data creation

Nobel winners

A Nobel laureate is a person or organization that receives a Nobel Prize. The term “laureate” comes from the laurel wreaths that were given to victors in ancient Greece. We are going to load in and work with a Nobel laureate data set.

  • Demo: Load the data from the data folder and assign it to nobel. Confirm that this new object appears in your Environment tab. Take a look at some of the variables in this data set. Specifically, we will be working with the category variable, which records the category of each prize.
nobel <- read_csv("data/nobel.csv")
  • Your turn: Split the data into two data sets: Nobel laureates in STEM fields (category should be Physics, Medicine, Chemistry, or Economics) and Nobel laureates in non-STEM fields. Name these two new objects nobel_stem and nobel_nonstem.
# define STEM fields
stem_fields <- c("Physics", "Medicine", "Chemistry", "Economics")

# STEM laureates
nobel_stem <- nobel |>
  filter(category %in% stem_fields)

# non-STEM laureates
nobel_nonstem <- nobel |>
  filter(!(category %in% stem_fields))
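
One way to sanity-check a split like this is to confirm that the two subsets together account for every row of the original data. The sketch below uses a small made-up tibble called toy (hypothetical, not the real nobel data) so that it runs on its own:

```r
library(tidyverse)

# Hypothetical toy data standing in for nobel; the values are illustrative only
toy <- tibble(
  id = 1:6,
  category = c("Physics", "Peace", "Literature", "Chemistry", "Economics", "Medicine")
)
stem_fields <- c("Physics", "Medicine", "Chemistry", "Economics")

toy_stem <- toy |>
  filter(category %in% stem_fields)

toy_nonstem <- toy |>
  filter(!(category %in% stem_fields))

# Every row should land in exactly one of the two subsets
split_is_complete <- nrow(toy_stem) + nrow(toy_nonstem) == nrow(toy)
split_is_complete
```

Because %in% returns FALSE rather than NA for a missing value, rows with a missing category would end up in the non-STEM subset and the row counts would still add up.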

Export these data

Pull up the help file for the function write_csv().
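
A help file can be opened from the console with ? or help(). A minimal sketch, assuming the readr package (part of the tidyverse) is installed:

```r
library(readr)

# Open the documentation page for write_csv()
?write_csv

# Equivalent call that names the package explicitly
help("write_csv", package = "readr")

# Print the function's arguments without opening the help page
args(write_csv)
```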

In short, write_csv() can do a TON! Typically, we finish everything we need to do to our data and then export it. Let’s export the R objects as .csv files. Specifically, let’s save them in our data folder.

write_csv(nobel_stem, "data/nobel-stem.csv")
write_csv(nobel_nonstem, "data/nobel-non.csv")
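
To confirm that an export worked, one option is to read the file back in and compare it with the original object. The sketch below writes a made-up tibble (toy, hypothetical) to a temporary file rather than to the data folder, so it can be run without the nobel data:

```r
library(readr)

# Hypothetical toy data; the temporary path is used only for illustration
toy <- tibble::tibble(
  id = 1:3,
  category = c("Physics", "Peace", "Chemistry")
)
path <- tempfile(fileext = ".csv")

write_csv(toy, path)
toy_back <- read_csv(path, show_col_types = FALSE)

# Same dimensions and column names after the round trip
same_dim <- identical(dim(toy_back), dim(toy))
same_names <- identical(names(toy_back), names(toy))
c(same_dim, same_names)
```

Column types can change on the way back in (here the integer id column is read as a double), so this check compares dimensions and names rather than testing for an identical object.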

Sales

Sales data are stored in an Excel file that looks like the following:

  • Demo: Read in the Excel file called sales.xlsx from the data/ folder such that it looks like the following. Call it sales.

# skip the first three rows and supply our own column names
sales <- read_excel("data/sales.xlsx",
                    skip = 3,
                    col_names = c("id", "n"))
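
The same combination of skip and col_names can be tried without sales.xlsx by using deaths.xlsx, an example workbook that ships with readxl. The sketch below assumes that file has four rows of notes above a header in row 5, followed by ten data rows and six columns. The column names in new_names are made up for illustration:

```r
library(readxl)

# Path to an example workbook bundled with readxl
deaths_path <- readxl_example("deaths.xlsx")

# Hypothetical replacement names, one per column
new_names <- c("name", "profession", "age", "has_kids", "date_birth", "date_death")

# Skip the notes and the original header row, then read only the data rows
deaths <- read_excel(deaths_path,
                     skip = 5,
                     n_max = 10,
                     col_names = new_names)

deaths
```

When col_names is a character vector, read_excel() treats the first row it reads as data, which is why the original header row has to be included in skip.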