Gradient Descent to Find Optimal Parameter Values

Learn how finding the parameter values of a logistic regression model with a log-loss cost can be framed as an optimization problem.

Optimization problem for logistic regression

The problem of finding the parameter values (coefficients and intercept) for a logistic regression model using a log-loss cost boils down to a problem of optimization: we would like to find the set of parameters that results in the minimum cost, because costs are higher for worse predictions. In other words, we want the set of parameters that is the “least wrong” on average over all of the training samples. This process is done for you automatically by the fit method of the logistic regression model in scikit-learn. There are different solution techniques for finding the set of parameters with the lowest cost, and you can choose which one you would like to use with the solver keyword when you are instantiating the model class. All of these methods work somewhat differently. However, they are all based on the concept of gradient descent.
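For instance, here is a minimal sketch of selecting a solver at instantiation time; the training data names (X_train, y_train) are placeholders rather than variables from this lesson:

from sklearn.linear_model import LogisticRegression

# Choose the optimization technique with the solver keyword;
# 'liblinear' is one of several options scikit-learn supports.
lr_model = LogisticRegression(solver='liblinear')

# fit searches for the coefficients and intercept with the lowest
# log-loss cost on the training data (placeholder names here).
lr_model.fit(X_train, y_train)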

Understanding gradient descent

The gradient descent process starts with an initial guess. For logistic regression, the choice of initial guess is not especially important, and you don’t need to make it manually; the solver handles it for you. However, for more advanced machine learning algorithms such as deep neural networks, the selection of initial parameter values requires more attention.

For the sake of illustration, we will consider a problem where there is only one parameter to estimate. We’ll look at the value of a hypothetical cost function, y = f(x) = x^2 - 2x, and devise a gradient descent procedure to find the value of the parameter, x, for which the cost, y, is the lowest. Here, we choose some x values, create a function that returns the value of the cost function, and look at the value of the cost function over this range of parameters.

The code to do this is as follows:

import numpy as np

# 81 evenly spaced parameter values from -3 to 5
X_poly = np.linspace(-3, 5, 81)
print(X_poly[:5], '...', X_poly[-5:])

Here is the output of the print statement:

[-3. -2.9 -2.8 -2.7 -2.6] ... [4.6 4.7 4.8 4.9 5. ]

The remaining code snippet is as follows:

import matplotlib.pyplot as plt

def cost_function(X):
    # Hypothetical cost: X * (X - 2) = X**2 - 2*X
    return X * (X - 2)

# Evaluate the cost at every candidate parameter value and plot it
y_poly = cost_function(X_poly)
plt.plot(X_poly, y_poly)
plt.xlabel('Parameter value')
plt.ylabel('Cost function')
plt.title('Error surface')

The resulting plot is the error surface: a parabola whose minimum cost of -1 occurs at the parameter value x = 1.

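To make the descent procedure concrete, here is a minimal sketch of gradient descent on this one-dimensional cost function; it is an illustration, not the actual scikit-learn solver. The derivative of f(x) = x^2 - 2x is f'(x) = 2x - 2, and the starting point and learning rate below are arbitrary choices:

# Derivative (gradient) of the cost function f(x) = x**2 - 2*x
def gradient(x):
    return 2 * x - 2

x = 4.0              # arbitrary initial guess
learning_rate = 0.1  # arbitrary step size

# Repeatedly step opposite the gradient, toward lower cost
for _ in range(25):
    x = x - learning_rate * gradient(x)

print(x, cost_function(x))  # cost_function is defined above

After 25 steps, x lands very close to 1, the parameter value at the bottom of the error surface, where the cost reaches its minimum of -1.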