Why Is Logistic Regression Considered a Linear Model?

Learn why logistic regression is considered a linear model.

We'll cover the following

Logistic regression as a linear model
- What is a linear model?
Sigmoid and logit functions
Logistic regression is a linear model

Logistic regression as a linear model

We mentioned previously that logistic regression is considered a linear model, while we were exploring whether the relationship between features and response resembled a linear relationship. Recall that we plotted the groupby/mean of the EDUCATION feature in the “Data Exploration” chapter, and of the PAY_1 feature in this chapter, to see whether the default rates across values of these features exhibited a linear trend. While this is a good way to get a quick approximation of how “linear or not” these features may be, here we formalize the notion of why logistic regression is a linear model.

What is a linear model?

A model is considered linear if the transformation of features that is used to calculate the prediction is a linear combination of the features. The possibilities for a linear combination are that each feature can be multiplied by a numerical constant, these terms can be added together, and an additional constant can be added. For example, in a simple model with two features, $X_1$ and $X_2$ , a linear combination would take the following form:

\text{Linear combination of } X_1 \text{ and } X_2 = θ_0 + θ_1X_1 + θ_2X_2

The constants $θ_i$ can be any number, positive, negative, or zero, for i = 0, 1, and 2 (although if a coefficient is 0, this removes a feature from the linear combination). A familiar example of a linear transformation of one variable is a straight line with the equation y = mx + b. In this case, $θ_0 = b$ and $θ_1 = m$ . $θ_0$ is called the intercept of a linear combination, which should be familiar from algebra.
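As a minimal sketch of the linear combination described above, the following Python function multiplies each feature by a constant, adds the terms together, and adds the intercept. The function name `linear_combination` and the coefficient values are hypothetical, chosen only for illustration.

```python
def linear_combination(theta, features):
    """Return theta[0] + theta[1]*features[0] + theta[2]*features[1] + ...

    theta holds the intercept followed by one coefficient per feature.
    """
    if len(theta) != len(features) + 1:
        raise ValueError("theta needs one intercept plus one coefficient per feature")
    intercept, coefficients = theta[0], theta[1:]
    return intercept + sum(c * x for c, x in zip(coefficients, features))


# Hypothetical coefficients: theta_0 = 1.0, theta_1 = 2.0, theta_2 = -0.5
theta = [1.0, 2.0, -0.5]
result = linear_combination(theta, [3.0, 4.0])
print(result)  # 1.0 + 2.0*3.0 - 0.5*4.0 = 5.0

# The straight line y = m*x + b is the one-feature case: theta_0 = b, theta_1 = m
m, b = 2.0, 1.0
y = linear_combination([b, m], [3.0])
print(y)  # 1.0 + 2.0*3.0 = 7.0
```

Setting a coefficient to 0 in `theta` drops the corresponding feature from the result, which matches the remark about zero coefficients above.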

What kinds of things are “not allowed” in linear transformations? Any other mathematical expressions besides what was just described, such as the following:

- Multiplying a feature by itself; for example, $X_1^2$ or $X_1^3$ . These are called polynomial terms.
- Multiplying features together; for example, $X_1X_2$ . These are called interactions.
- Applying non-linear transformations to features; for example, log and square root.
- Other complex mathematical functions.
- “If then” types of statements. For example, “if $X_1 > a$ , then $y = b$ .”

However, while these transformations are not part of the basic formulation of a linear combination, they can be added to a linear model by engineering features; for example, by defining a new feature, $X_3 = X_1^2$ .
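The idea of an engineered feature can be sketched in a few lines of Python. Here a squared term is computed first and then treated as just another feature, so the model remains a linear combination of its inputs even though its output curves as a function of $X_1$. The helper names and coefficient values are hypothetical.

```python
def add_squared_feature(x1):
    """Engineer a new feature X_3 = X_1^2 from an existing feature value."""
    return x1 ** 2


# Hypothetical coefficients for the intercept, X_1, and the engineered X_3
theta_0, theta_1, theta_3 = 1.0, 2.0, 0.5


def predict(x1):
    """Linear combination of the features X_1 and X_3 (where X_3 = X_1^2)."""
    x3 = add_squared_feature(x1)
    return theta_0 + theta_1 * x1 + theta_3 * x3


outputs = [predict(x1) for x1 in [0.0, 1.0, 2.0, 3.0]]
print(outputs)  # [1.0, 3.5, 7.0, 11.5]

# Successive differences are not constant, so the output is curved in X_1,
# yet the formula is still "constant times feature, summed, plus intercept".
differences = [b - a for a, b in zip(outputs, outputs[1:])]
print(differences)  # [2.5, 3.5, 4.5]
```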

Sigmoid and `logit` functions

Earlier, we learned that the predictions of logistic regression, which take the form of probabilities, are made using the sigmoid function. Taking another look here, we see that this function is clearly non-linear:

\text{sigmoid}(X) = \frac{1}{1+e^{-X}}
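A quick numerical check makes the non-linearity visible. The sketch below implements the sigmoid with Python's standard `math` module and evaluates it at equally spaced inputs; the input values are arbitrary choices for illustration.

```python
import math


def sigmoid(x):
    """Sigmoid function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Equally spaced inputs, two units apart
inputs = [0.0, 2.0, 4.0, 6.0]
values = [sigmoid(x) for x in inputs]

# For a linear function, equal steps in the input give equal steps in the output.
# For the sigmoid, the steps shrink as the input grows.
steps = [b - a for a, b in zip(values, values[1:])]

for x, v in zip(inputs, values):
    print(f"sigmoid({x}) = {v:.4f}")
print([round(s, 4) for s in steps])
```

Each successive step in the output is smaller than the one before it, which could not happen if the function were a straight line.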

Why, then, is logistic regression considered a linear model? It turns out that the answer to this question lies in a different formulation of the sigmoid equation, called the logit function. We can derive the logit function by solving the sigmoid function for $X$ ; in other words, finding the inverse of the sigmoid function. First, we set the sigmoid equal to $p$ , which we interpret as the probability of observing the positive class, then solve for $X$ as shown in the following:

p = \frac{1}{1+e^{-X}}

1+e^{-X} = \frac{1}{p}

e^{-X} = \frac{1}{p}-1

e^{-X} = \frac{1-p}{p}

e^{X} = \frac{p}{1-p}

X = \log\left(\frac{p}{1-p}\right)

Here, we’ve used some laws of exponents and logs to solve for $X$ . You may also see logit expressed as follows:

X = \log\left(\frac{p}{q}\right)

In this expression, the probability of failure, $q$ , is expressed in terms of the probability of success, $p$ : $q = 1 - p$ , because the two probabilities sum to 1. Even though in our case credit default would probably be considered a failure in the sense of real-world outcomes, the positive outcome (response variable = 1 in a binary problem) is conventionally considered “success” in mathematical terminology. The logit function is also called the log odds, because it is the natural logarithm of the odds, $p/q$ . Odds may be familiar from the world of gambling, via phrases such as “the odds are 2 to 1 that team $a$ will defeat team $b$ .”
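The relationship between probability, odds, and log odds can be checked numerically. The sketch below assumes the "2 to 1" phrase means odds of 2 in favor of team $a$, which corresponds to $p = 2/3$; it also confirms that the logit undoes the sigmoid. The function names are illustrative choices.

```python
import math


def sigmoid(x):
    """Sigmoid function: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds: natural logarithm of p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Assumed reading of "odds are 2 to 1": p = 2/3 and q = 1/3
p = 2.0 / 3.0
q = 1.0 - p
odds = p / q
log_odds = logit(p)
print(f"odds = {odds:.4f}, log odds = {log_odds:.4f}")

# The logit is the inverse of the sigmoid: logit(sigmoid(x)) recovers x
xs = [-3.0, -0.5, 0.0, 1.5, 4.0]
round_trip = [logit(sigmoid(x)) for x in xs]
print([round(r, 6) for r in round_trip])
```

Note that `logit` is only defined for probabilities strictly between 0 and 1; at exactly 0 or 1 the logarithm or the division fails.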

Logistic regression is a linear model

In general, what we’ve called capital $X$ in these manipulations can stand for a linear combination of all the features. For example, this would be $X = θ_0 + θ_1X_1 + θ_2X_2$ in our simple case of two features. Logistic regression is considered a linear model because the features included in $X$ are, in fact, only subject to a linear combination when the response variable is considered to be the log odds. This is an alternative way of formulating the problem, as compared to the sigmoid equation. Putting the pieces together, the features $X_1, X_2,…, X_j$ look like this in the sigmoid equation version of logistic regression:

p = \frac{1}{1+e^{-(θ_0 + θ_1X_1 + θ_2X_2 + \dots + θ_jX_j)}}

But they look like this in the log odds version, which is why logistic regression is called a linear model:

θ_0 + θ_1X_1 + θ_2X_2 + \dots + θ_jX_j = \log\left(\frac{p}{q}\right)

Because of this way of looking at logistic regression, ideally, the features of a logistic regression model would be linear in the log odds of the response variable. We will see what is meant by this in the next lesson.
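As a small preview, the two formulations can be compared side by side. The sketch below uses hypothetical coefficients for a two-feature model and increases $X_1$ in unit steps while holding $X_2$ fixed. This is an illustration under assumed values, not a fitted model.

```python
import math


def sigmoid(z):
    """Sigmoid function: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds: natural logarithm of p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients theta_0, theta_1, theta_2
theta = [-1.0, 0.8, 0.3]


def predict_probability(x1, x2):
    """Sigmoid version of logistic regression for two features."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return sigmoid(z)


x2 = 1.0  # held fixed
probabilities = [predict_probability(x1, x2) for x1 in (0.0, 1.0, 2.0, 3.0)]
log_odds = [logit(p) for p in probabilities]

# Change per unit increase in X_1, on each scale
probability_steps = [b - a for a, b in zip(probabilities, probabilities[1:])]
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]

print([round(s, 4) for s in probability_steps])  # steps differ from one another
print([round(s, 4) for s in log_odds_steps])     # every step is about theta_1 = 0.8
```

On the log odds scale, each unit increase in $X_1$ adds the same amount, $θ_1$, which is the behavior of a linear model. On the probability scale the change depends on where you start.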

Logistic regression is part of a broader class of statistical models called Generalized Linear Models (GLMs). GLMs are connected to the fundamental concept of ordinary linear regression, which may have one feature (that is, the line of best fit, y = mx + b, for a single feature, x) or more than one in multiple linear regression. The mathematical connection between GLMs and linear regression is the link function. The link function of logistic regression is the logit function we just learned about.
