Modeling fish

For this application exercise, we will work with data on fish. The dataset, called fish, contains measurements of two common fish species sold at fish markets.

The data dictionary is below:

variable          description
species           Species name of fish
weight            Weight, in grams
length_vertical   Vertical length, in cm
length_diagonal   Diagonal length, in cm
length_cross      Cross length, in cm
height            Height, in cm
width             Diagonal width, in cm

Visualizing the model

We’re going to investigate the relationship between the weights and heights of fish. Specifically, we are interested in whether a fish’s height helps us better understand its weight.

Define the following terms below.

Response variable: the variable we are trying to predict or understand (here, weight); it goes on the y-axis

Explanatory variable: a variable that helps explain or predict the response variable (here, height); it goes on the x-axis

  • Demo: Create an appropriate plot to investigate this relationship. Add appropriate labels to the plot, and fit a line.
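fish |>
  ggplot(
    aes(x = height, y = weight)
  ) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) + # least-squares line, no confidence band
  labs(
    title = "Weights vs. heights of fish",
    x = "Height (cm)",
    y = "Weight (g)"
  )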
`geom_smooth()` using formula = 'y ~ x'

Residual

Please run the code below:

fish_hw_fit <- linear_reg() |> # this is the model
  fit(weight ~ height, data = fish) # we will talk about this in a different question

fish_hw_aug <- augment(fish_hw_fit$fit)

ggplot(fish_hw_aug, aes(x = height, y = weight)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE, color = "lightgrey") +  
  geom_segment(aes(xend = height, yend = .fitted), color = "gray") +  
  geom_point(aes(y = .fitted), shape = "circle open") + 
  theme_minimal() +
  labs(
    title = "Weights vs. heights of fish",
    subtitle = "Residuals",
    x = "Height (cm)",
    y = "Weight (g)"
  )
`geom_smooth()` using formula = 'y ~ x'

Practice: Using technical terms, how is R picking the “best line” to fit?

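R picks the line that minimizes the residual sum of squares (RSS): the sum of the squared vertical distances between each observed weight and the weight the line predicts for that fish.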
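To make the least-squares idea concrete, here is a minimal sketch that computes the RSS by hand and checks that moving the fitted line makes the RSS larger. The `toy` data frame and the `rss()` helper are made up for illustration (they are not part of the fish data or of any package), and the sketch uses base R's lm() rather than the tidymodels interface only so that it runs on its own.

```r
# Hypothetical toy data (NOT the fish data): heights in cm, weights in g
toy <- data.frame(
  height = c(5, 7, 9, 11, 13, 15),
  weight = c(40, 110, 160, 300, 390, 520)
)

# Hypothetical helper: RSS for a candidate line weight = b0 + b1 * height
rss <- function(b0, b1, data = toy) {
  residuals <- data$weight - (b0 + b1 * data$height)
  sum(residuals^2)
}

# Least-squares fit; coef() returns the intercept and the slope
ls_fit <- lm(weight ~ height, data = toy)
b <- unname(coef(ls_fit))

# RSS of the least-squares line
rss_best <- rss(b[1], b[2])

# RSS after shifting the line up, or making it steeper
rss_shifted <- rss(b[1] + 10, b[2])
rss_steeper <- rss(b[1], b[2] + 1)

c(best = rss_best, shifted = rss_shifted, steeper = rss_steeper)
```

Both altered lines should give a larger RSS than the least-squares line, which is what "best" means here.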

  • What types of questions can this plot help answer?

    *1. Prediction: For a given height, what's my predicted weight?*
    *2. Relationship: What is the relationship between height and weight?*
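Both kinds of question can be answered from a fitted line. The sketch below is one way to do it, assuming a made-up `toy` data frame (not the fish data) and base R's lm() and predict() instead of the tidymodels workflow used in this exercise; the height of 10 cm is an arbitrary example value.

```r
# Hypothetical toy data (NOT the fish data): heights in cm, weights in g
toy <- data.frame(
  height = c(5, 7, 9, 11, 13, 15),
  weight = c(40, 110, 160, 300, 390, 520)
)

toy_fit <- lm(weight ~ height, data = toy)

# Relationship: the slope is the change in predicted weight (g)
# for each additional cm of height
slope <- unname(coef(toy_fit)["height"])

# Prediction: predicted weight for a fish with a height of 10 cm
new_fish <- data.frame(height = 10)
predicted_weight <- unname(predict(toy_fit, newdata = new_fish))

# The prediction is simply intercept + slope * height
by_hand <- unname(coef(toy_fit)["(Intercept)"]) + slope * 10

c(slope = slope, predicted = predicted_weight, by_hand = by_hand)
```

With a parsnip fit such as fish_hw_fit, the analogous call would be predict() with a `new_data` argument rather than `newdata`.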

Model fitting

  • Demo: Fit a model to predict fish weights from their heights. Comment the code below.
fish_hw_fit <- linear_reg() |> # tell R that I want to do linear regression
  fit(weight ~ height, data = fish) # formula is response ~ explanatory (y ~ x); data = the data frame to fit on

Model summary

  • Demo: Display the model summary. Next, show how you can extract these values from the model output. Hint: pull up the help file for pull().
tidy(fish_hw_fit) |>
  filter(term == "(Intercept)") |>
  pull(estimate)
[1] -288.4152
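As a cross-check on values extracted with tidy() and pull(), the same estimates can be read from the underlying lm object with base R's coef(). The sketch below assumes a made-up `toy` data frame (not the fish data) so that it runs on its own; for the model above, the underlying lm object would be fish_hw_fit$fit.

```r
# Hypothetical toy data (NOT the fish data): heights in cm, weights in g
toy <- data.frame(
  height = c(5, 7, 9, 11, 13, 15),
  weight = c(40, 110, 160, 300, 390, 520)
)

toy_fit <- lm(weight ~ height, data = toy)

# coef() returns a named numeric vector, one entry per model term
coefs <- coef(toy_fit)

# Double brackets return the bare number, much as pull() returns a bare vector
intercept <- coefs[["(Intercept)"]]
slope <- coefs[["height"]]

# coef(summary()) gives the full table: estimate, std. error, t value, p-value
coef_table <- coef(summary(toy_fit))
slope_from_table <- coef_table["height", "Estimate"]

c(intercept = intercept, slope = slope)
```

Swapping "(Intercept)" for "height" in the filter() step above extracts the slope in the same way.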