Lecture 25
NC State University
ST 295 - Spring 2025
2025-04-15
– No more HWs
– No more Quizzes
– Peer Review due tonight (11:59pm)
> My feedback will get to you by April 17th
– Final write up (report) & presentation due April 25th (11:59pm)
> Moved deadline back a couple days
> Kept review day + changed presentation format
– Final Exam (8:30am) on April 24th
> Can bring a note sheet
> Cumulative with very heavy emphasis on Unit 2
– Statistics experience due April 22nd (11:59pm)
– The What, Why, and How of Logistic Regression
– How to fit these types of models in R
– How to calculate probabilities using these models
Similar to linear regression… but
Modeling tool when our response is categorical
– This type of model is called a generalized linear model
We want to fit an S curve (and not a straight line)…
where we model the probability of success as a function of our explanatory variable(s)
But linear regression fits a straight line… so we need to do something to fit that S curve…
– Bernoulli Distribution
2 outcomes: Success (p) or Failure (1-p)
\(y_i\) ~ Bern(p)
We can use our explanatory variable(s) to model p
Note: We use \(p_i\) for estimated probabilities
What values can probability take on?
Probabilities can only take on values in the interval [0, 1]
Need: a model that constrains the estimated probabilities (our response) to stay on the correct scale, [0, 1]
\(p = \widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots\) (bad)
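To see why the straight-line version is "bad", here is a minimal sketch on a tiny made-up data set (the values of `x` and `y` are assumptions for illustration, not course data). It fits an ordinary linear regression to a 0/1 response and checks the range of the fitted values.

```r
# Made-up data: the response is 0 for x <= 0 and 1 for x > 0
x <- -5:5
y <- as.integer(x > 0)

# Fit a straight line to the 0/1 response
linear_fit <- lm(y ~ x)

# Fitted "probabilities" from the straight line
linear_pred <- predict(linear_fit)

# The smallest fitted value is below 0 and the largest is above 1
# (roughly -0.23 and 1.14 for this data)
range(linear_pred)
```

A fitted "probability" below 0 or above 1 has no meaning, which is the issue the transformation is meant to fix.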
So we apply a “non-linear” transformation to the left side to fix the issue!
The transformation is called a logit link transformation
\(\text{logit}(p) = \ln\left(\frac{p}{1-p}\right)\)
\(\widehat{\ln\left(\frac{p}{1-p}\right)} = \widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots\) (good)
\(\ln\left(\frac{p}{1-p}\right)\) is called the logit link function, and can take on values from \(-\infty\) to \(\infty\)
\(\ln\left(\frac{p}{1-p}\right)\) represents the log odds of a success
p stands for probability
Modeling the log odds with this logit link keeps the estimated p between 0 and 1, because inverting the logit always returns a value in that range
Which is exactly what we want!
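A quick sketch of the logit in R may help here. The helper name `logit` and the example probabilities are assumptions chosen for illustration; `qlogis()` is base R's built-in version of the same function.

```r
# Hypothetical helper: log odds of a success for a probability p
logit <- function(p) log(p / (1 - p))

# A few example probabilities between 0 and 1
p <- c(0.01, 0.25, 0.5, 0.75, 0.99)

# Log odds: negative when p < 0.5, zero at p = 0.5, positive when p > 0.5
log_odds <- logit(p)
log_odds

# Base R's qlogis() computes the same quantity
all.equal(log_odds, qlogis(p))
```

As p gets closer to 0 or 1, the log odds move toward \(-\infty\) or \(\infty\), so the left side of the model is no longer stuck on [0, 1].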
\(\widehat{\ln\left(\frac{p}{1-p}\right)} = \widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots\)
– How do we take the inverse of a natural log?
– Taking the inverse of the logit function will map arbitrary real values back to the range [0, 1]
\[\widehat{\ln\left(\frac{p}{1-p}\right)} = \widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots\]
– Let's take the inverse of the logit function
– Demo Together
\[\hat{p} = \frac{e^{\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots}}{1 + e^{\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots}}\]
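One way to write out the steps of the demo is sketched here. The symbol \(\eta\) is shorthand introduced only for this sketch (it is not used elsewhere in these slides) and stands for the fitted right-hand side, \(\eta = \widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots\). The inverse of a natural log is exponentiation, so we exponentiate both sides and then solve for \(\hat{p}\):

\[
\begin{aligned}
\ln\left(\frac{\hat{p}}{1-\hat{p}}\right) &= \eta \\
\frac{\hat{p}}{1-\hat{p}} &= e^{\eta} \\
\hat{p} &= e^{\eta}\,(1-\hat{p}) \\
\hat{p}\,(1 + e^{\eta}) &= e^{\eta} \\
\hat{p} &= \frac{e^{\eta}}{1 + e^{\eta}}
\end{aligned}
\]

Since \(e^{\eta}\) is always positive, the last fraction is always greater than 0 and less than 1.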
Example Figure:
A full data set is broken up into two parts:
– Training Data Set - the initial dataset that you fit your model on
– Testing Data Set - the dataset that you test your model on
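library(tidymodels)  # loads rsample, which provides initial_split(), training(), testing()
set.seed(295)        # assumed seed (any fixed value works) so the random split is reproducible
# Randomly assign 80% of the rows to training and the remaining 20% to testing
split <- initial_split(data.set, prop = 0.80)
train_data <- training(split)
test_data <- testing(split)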
If our model is doing well… we would want it to …
– predict a success when we actually observe a success
– predict a failure when we actually observe a failure
– Given that the email was actually spam, what is the probability that our model predicted the email to be spam? (Sensitivity)
– Given that the email was not spam, what is the probability that our model predicted the email to not be spam? (Specificity)
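These two conditional probabilities can be computed by simple counting. The sketch below uses nine made-up emails (the vectors `actual` and `predicted` are assumptions for illustration, with 1 = spam and 0 = not spam).

```r
# Made-up labels for 9 emails: 1 = spam, 0 = not spam
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1)

# Sensitivity: of the emails that really are spam, what share did we predict as spam?
sensitivity <- sum(predicted == 1 & actual == 1) / sum(actual == 1)

# Specificity: of the emails that really are not spam, what share did we predict as not spam?
specificity <- sum(predicted == 0 & actual == 0) / sum(actual == 0)

sensitivity  # 3 of 4 spam emails caught: 0.75
specificity  # 4 of 5 non-spam emails correctly passed: 0.8
```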
– Fit a model on the training data set (just like linear regression)
– Calculate predictions using x from the testing data set (just like linear regression)
– Compare y from the testing data set vs the predictions (just like linear regression)
> But now instead of MAE, we are going to look at specificity and sensitivity
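The steps above can be sketched from start to finish in base R. Everything specific here is an assumption for illustration: the data are simulated, the seed and the true coefficients are arbitrary, `sample()` stands in for `initial_split()` so the sketch needs no extra packages, and a predicted probability above 0.5 is treated as a predicted success.

```r
set.seed(295)  # arbitrary seed so the simulated data and split are reproducible

# Simulate a binary response whose log odds are a straight line in x
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
sim <- data.frame(x = x, y = y)

# 80/20 split into training and testing rows
train_rows <- sample(n, size = 0.8 * n)
train_data <- sim[train_rows, ]
test_data  <- sim[-train_rows, ]

# 1. Fit the logistic regression on the training data
fit <- glm(y ~ x, data = train_data, family = binomial)

# 2. Predict probabilities of success for the testing data
p_hat <- predict(fit, newdata = test_data, type = "response")

# Turn probabilities into predicted classes (0.5 cutoff is an assumption)
pred_class <- as.integer(p_hat > 0.5)

# 3. Compare predictions with the observed y in the testing data
sensitivity <- sum(pred_class == 1 & test_data$y == 1) / sum(test_data$y == 1)
specificity <- sum(pred_class == 0 & test_data$y == 0) / sum(test_data$y == 0)
c(sensitivity = sensitivity, specificity = specificity)
```

A different cutoff than 0.5 would trade sensitivity against specificity, so the cutoff is a choice rather than a fixed part of the model.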
With a binary categorical response variable, we use the logit link to model the log odds of a success
\(\widehat{\ln\left(\frac{p}{1-p}\right)} = \widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots\)
We can use the same model to estimate the probability of a success
\[\hat{p} = \frac{e^{\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots}}{1 + e^{\widehat{\beta}_0 + \widehat{\beta}_1 X_1 + \dots}}\]
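As a small sketch of both formulas in R, the example below uses the built-in `mtcars` data as a stand-in (an assumption, not a course data set), with transmission type `am` (0 or 1) as the response and weight `wt` as the explanatory variable. The value `new_wt` is also just an illustrative choice.

```r
# Fit a logistic regression: family = binomial uses the logit link by default
fit <- glm(am ~ wt, data = mtcars, family = binomial)
b <- coef(fit)

# Estimated log odds of a success for a car with wt = 3
new_wt <- 3
log_odds <- b[["(Intercept)"]] + b[["wt"]] * new_wt

# Estimated probability of a success, using the inverse logit by hand
p_manual <- exp(log_odds) / (1 + exp(log_odds))

# The same probability from predict(); type = "response" returns probabilities
p_predict <- predict(fit, newdata = data.frame(wt = new_wt), type = "response")

# The two calculations agree
all.equal(p_manual, unname(p_predict))
```

Leaving out `type = "response"` makes `predict()` return the log odds instead of the probability.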