library(tidyverse)
library(tidymodels)
library(kableExtra)
fish <- read_csv("data/fish.csv")

Modeling fish
For this application exercise, we will work with data on fish. The dataset, called fish, contains measurements of two common fish species sold at fish markets.
The data dictionary is below:
| variable | description |
|---|---|
| species | Species name of fish |
| weight | Weight, in grams |
| length_vertical | Vertical length, in cm |
| length_diagonal | Diagonal length, in cm |
| length_cross | Cross length, in cm |
| height | Height, in cm |
| width | Diagonal width, in cm |
Visualizing the model
We’re going to investigate the relationship between the weights and heights of fish. Specifically, we are interested in whether a fish’s height helps us better understand its weight.
Define the following terms below.
Response variable: the variable we are trying to understand or predict (here, weight). It goes on the y-axis.
Explanatory variable: a variable that helps us explain what is going on with the response variable (here, height). It goes on the x-axis.
- Demo: Create an appropriate plot to investigate this relationship. Add appropriate labels to the plot, and fit a line.
fish |>
  ggplot(
    aes(x = height, y = weight)
  ) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

`geom_smooth()` using formula = 'y ~ x'

Residual
Please run the code below:
fish_hw_fit <- linear_reg() |> # this is the model
  fit(weight ~ height, data = fish) # we will talk about this in a different question

fish_hw_aug <- augment(fish_hw_fit$fit)

ggplot(fish_hw_aug, aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "lightgrey") +
  geom_segment(aes(xend = height, yend = .fitted), color = "gray") +
  geom_point(aes(y = .fitted), shape = "circle open") +
  theme_minimal() +
  labs(
    title = "Weights vs. heights of fish",
    subtitle = "Residuals",
    x = "Height (cm)",
    y = "Weight (g)"
  )

`geom_smooth()` using formula = 'y ~ x'

Practice: Using technical terms, how is R picking the “best line” to fit?
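R picks the line that minimizes the residual sum of squares (RSS): the sum of the squared vertical distances between each observed weight and the weight the line predicts for that fish.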
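To make the idea of "minimizing RSS" concrete, here is a minimal sketch in base R. It does not use the fish data: the `toy` data frame and the helper `rss()` are made up for illustration. It compares the RSS of the least-squares line with the RSS of two slightly different lines.

```r
# Hypothetical toy data (NOT the fish dataset): heights in cm, weights in grams
toy <- data.frame(
  height = c(5, 6, 7, 8, 9, 10),
  weight = c(110, 160, 190, 260, 290, 350)
)

# Hypothetical helper: RSS of the line weight = intercept + slope * height
rss <- function(intercept, slope, data) {
  fitted <- intercept + slope * data$height
  sum((data$weight - fitted)^2)
}

# Least-squares fit
toy_fit <- lm(weight ~ height, data = toy)
b0 <- coef(toy_fit)[["(Intercept)"]]
b1 <- coef(toy_fit)[["height"]]

rss_best    <- rss(b0, b1, toy)       # the least-squares line
rss_steeper <- rss(b0, b1 + 5, toy)   # same intercept, steeper slope
rss_shifted <- rss(b0 + 10, b1, toy)  # same slope, shifted up by 10 g

# Compare the three values; the least-squares line should have the smallest RSS
c(best = rss_best, steeper = rss_steeper, shifted = rss_shifted)
```

Any other intercept and slope you try should give a larger RSS than the least-squares line, which is what "best" means here.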
- What types of questions can this plot help answer?
*1. Prediction: For a given height, what's my predicted weight?*
*2. Relationship: What is the relationship between height and weight?*
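As a rough sketch of the prediction question, the code below fits the same kind of model on a small made-up dataset and asks for the predicted weight at one height. The `toy` values and the 7.5 cm height are hypothetical; with the real data you would use `fish` in place of `toy`.

```r
library(tidymodels)

# Hypothetical toy data (NOT the fish dataset): heights in cm, weights in grams
toy <- tibble(
  height = c(5, 6, 7, 8, 9, 10),
  weight = c(110, 160, 190, 260, 290, 350)
)

# Same model specification and formula style used in this exercise
toy_fit <- linear_reg() |>
  fit(weight ~ height, data = toy)

# A new fish with a hypothetical height of 7.5 cm
new_fish <- tibble(height = 7.5)

# predict() on a fitted parsnip model returns a tibble with a .pred column
pred <- predict(toy_fit, new_data = new_fish)
pred
```

The relationship question is answered by the slope of the fitted line, which the model summary reports.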
Model fitting
- Demo: Fit a model to predict fish weights from their heights. Comment the code below.
fish_hw_fit <- linear_reg() |> # tell R that I want to do linear regression
  fit(weight ~ height, data = fish) # formula is y ~ x; data = the data frame to use

Model summary
- Demo: Display the model summary. Next, show how you can extract these values from the model output. Hint: pull up the help file for pull().
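One possible way to approach this demo is sketched below, using `tidy()` to display the summary as a data frame and `pull()` to extract a column from it. The `toy` data are made up so the example runs on its own; with the real data you would call `tidy(fish_hw_fit)` instead.

```r
library(tidymodels)

# Hypothetical toy data (NOT the fish dataset): heights in cm, weights in grams
toy <- tibble(
  height = c(5, 6, 7, 8, 9, 10),
  weight = c(110, 160, 190, 260, 290, 350)
)

toy_fit <- linear_reg() |>
  fit(weight ~ height, data = toy)

# Model summary as a tibble: one row per term (intercept and height)
toy_tidy <- tidy(toy_fit)
toy_tidy

# pull() extracts a single column as a plain vector
estimates <- toy_tidy |>
  pull(estimate)

# Keep only the height row first to get just the slope
slope <- toy_tidy |>
  filter(term == "height") |>
  pull(estimate)
slope
```

Because `tidy()` returns an ordinary tibble, the usual dplyr verbs such as `filter()` and `pull()` work on the model output.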
