Evaluating Regression Models

Packages

library(openintro)  # also attaches the airports, cherryblossom, and usdata data packages
library(tidyverse)
library(tidymodels)

Iris data

Up through this point, we have gone through SLR and MLR. You may be asking yourself: how do we choose between them? There are a lot of different ways to justify a choice of model. In this activity, we are going to justify a model based on its predictive performance.

The iris data set comes pre-loaded in R. Pull up the help file (?iris) to see what it’s all about. In this case, we are interested in sepal length (Sepal.Length) and will fit a variety of models to investigate it.

The goal of this activity is to choose a model that “performs the best” when predicting new observations.

Last time

iris_full <- iris |>
  mutate(id = 1:nrow(iris)) # add an id column so we can anti_join later

set.seed(123) # sample_n() does not have a seed argument... so set the seed first!

# randomly assign 80% of the rows to the training set
train <- iris_full |>
  sample_n(0.8 * nrow(iris_full), replace = FALSE)

# the remaining 20% (rows whose id is not in train) form the test set
test <- iris_full |>
  anti_join(train, by = "id")
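
As an aside, the rsample package (loaded with tidymodels) has helpers that accomplish the same kind of 80/20 split; a quick sketch (it will not select exactly the same rows as sample_n() above, even with the same seed):

set.seed(123)
iris_split <- initial_split(iris_full, prop = 0.8) # 80% training, 20% testing
train2 <- training(iris_split)
test2  <- testing(iris_split)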

model1 <- linear_reg() |> # linear regression (lm engine by default)
  fit(Sepal.Length ~ Sepal.Width, data = train) # y ~ x

pred <- predict(model1, new_data = data.frame(Sepal.Width = test$Sepal.Width))

data <- data.frame(pred, test$Sepal.Length) # columns .pred (y-hat) and test.Sepal.Length (y)

mae(data, truth = test.Sepal.Length, estimate = .pred)
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 mae     standard       0.631
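
As another aside, yardstick can compute several regression metrics at once with metric_set(); for example, to get RMSE and R-squared alongside the MAE for the same predictions:

reg_metrics <- metric_set(mae, rmse, rsq) # bundle the metrics into one function
reg_metrics(data, truth = test.Sepal.Length, estimate = .pred)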

Let’s compare models

Repeat the steps above, but now fit an interaction model with the explanatory variables Petal.Width and Sepal.Width. Calculate the MAE. How does it compare to your SLR model?

model2 <- linear_reg() |> # interaction model
  fit(Sepal.Length ~ Sepal.Width * Petal.Width, data = train)

pred2 <- predict(model2, new_data = data.frame(Sepal.Width = test$Sepal.Width,
                                               Petal.Width = test$Petal.Width))

data2 <- data.frame(pred2, test$Sepal.Length)

mae(data2, truth = test.Sepal.Length, estimate = .pred) # name truth and estimate so they don't get swapped
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 mae     standard       0.275

Make a decision

Model 2 (test MAE of about 0.275) does a better job of predicting sepal length than model 1 (test MAE of about 0.631).

Adjusted R-squared

What if we use adjusted R-squared instead? Do we come to the same conclusions?

Maybe? Adjusted R-squared and mean absolute error (MAE) are both used to evaluate regression models, but they focus on different aspects of model performance: adjusted R-squared measures how well the model explains the variance in y (after accounting for the number of variables in the model), while MAE measures the average magnitude of the prediction errors. Keep in mind that adjusted R-squared is computed on the same data used to fit the model, whereas the MAE above was computed on held-out test data.
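
To see where the penalty comes from, here is a quick sketch that computes adjusted R-squared by hand for the interaction model (fit with lm() below) and compares it to the value reported by summary():

# adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), where p = number of predictors
fit <- lm(Sepal.Length ~ Sepal.Width * Petal.Width, data = iris)
r2 <- summary(fit)$r.squared
n <- nrow(iris) # 150 observations
p <- 3          # Sepal.Width, Petal.Width, and their interaction
1 - (1 - r2) * (n - 1) / (n - p - 1) # matches summary(fit)$adj.r.squared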

Can adjusted R-squared go down?

Yes, we know this to be true: unlike R-squared, adjusted R-squared can decrease when an added term does not improve the fit enough to justify the extra parameter. Is that the case here? Compare model 2 with the SLR model and an MLR model with just an additive effect.

model1 <- lm(Sepal.Length ~ Sepal.Width, data = iris) # SLR, refit with lm() on the full iris data

summary(model1)$adj.r.squared
[1] 0.007159294
model2 <- lm(Sepal.Length ~ Sepal.Width * Petal.Width, data = iris) # interaction

summary(model2)$adj.r.squared # about 0.702
[1] 0.7021525
model3 <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris) # additive

summary(model3)$adj.r.squared # about 0.703
[1] 0.7032539

Interaction: the relationship between sepal length (y) and sepal width (x1) depends on petal width (x2).

Additive: the relationship between sepal length (y) and sepal width (x1) does NOT depend on petal width (x2).
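
One way to see the difference concretely is to look at the estimated coefficients: in the interaction model, the Sepal.Width:Petal.Width coefficient is how much the sepal-width slope changes per one-unit increase in petal width, while the additive model leaves that term out. A quick sketch using broom’s tidy() on the lm fits above:

tidy(model2) # includes a Sepal.Width:Petal.Width row
tidy(model3) # no interaction row: one Sepal.Width slope for every petal width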

Which model would you choose here?

The additive model.

What is Occam’s Razor (also known as the law of parsimony)? The simpler model is easier to interpret.

Pick the simpler model if there isn’t a big difference in performance between the two.

Are any of these models appropriate?

Let’s go to the slides!

Two things need to hold true for these models to be trustworthy:

– Independence: checked by thinking about how the data were collected and how observations relate to each other (we need independence of the individual observations, NOT of the variables)

– Linearity: look at the residuals vs. fitted plot and check for a random, cloud-like scatter with no pattern

Let’s define these terms and discuss how to check them below.

# base R residual plots, using the lm() fits from above

plot(model1, which = 1) # residuals vs fitted for the SLR model

plot(model2, which = 1) # residuals vs fitted for the interaction model
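
If you prefer the tidyverse toolkit, the same residuals vs fitted check can be built with broom’s augment() and ggplot2; a quick sketch:

augment(model1) |> # adds .fitted and .resid columns from the lm fit
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted values", y = "Residuals",
       title = "Look for a random, cloud-like scatter")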