Introduction: Logistic Regression and Feature Extraction

Get introduced to the topics of this chapter: evaluating features for their usefulness in modeling, and understanding logistic regression as a linear model.

Overview

This chapter teaches you how to evaluate features quickly and efficiently, so that you know which ones are likely to be most important for a machine learning model. Once we have a feel for this, we'll explore the inner workings of logistic regression so you can continue your journey toward mastering this fundamental technique. After reading this chapter, you will be able to make a correlation plot of many features and a response variable, and to interpret logistic regression as a linear model.

In the previous chapter, we developed a few example machine learning models using scikit-learn, to get familiar with how it works. However, the features we used, EDUCATION and LIMIT_BAL, were not chosen in a systematic way.

Assessing feature importance

In this chapter, we will start to develop techniques that can be used to assess features for their usefulness in modeling. This will enable you to make a quick pass over all candidate features and get an idea of which are likely to be most important. For the most promising features, we will see how to create visual summaries that serve as useful communication tools.
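One common way to make such a quick pass, and the idea behind a correlation plot of features and a response variable, is to compute the Pearson correlation between each candidate feature and the response. The sketch below does this by hand in plain Python so you can see what is being measured. The feature names and values are hypothetical toy data, and in practice you would more likely reach for a library routine such as pandas' `DataFrame.corr`.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances, each left unscaled (the 1/n factors cancel in r).
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: three candidate features and a binary response.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "feature_b": [5.0, 3.0, 4.0, 2.0, 6.0, 1.0],
    "feature_c": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
}
response = [0, 0, 0, 1, 1, 1]

# Quick pass: rank features by the magnitude of their correlation with the response.
ranking = sorted(
    ((name, pearson_r(values, response)) for name, values in features.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

With this toy data, `feature_a`, which rises along with the response, ranks first, and `feature_c`, which has no linear relationship to it, ranks last. Keep in mind that Pearson correlation only captures linear relationships, so a low value does not rule out a feature that matters in some non-linear way.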

Examining logistic regression

Next, we will begin our detailed examination of logistic regression. We'll learn why logistic regression is considered a linear model, even though its formulation involves a non-linear function (the sigmoid). We'll learn what a decision boundary is and see that, as a key consequence of this linearity, the decision boundary of logistic regression is linear, which can make it difficult to classify the response variable accurately when the classes cannot be separated by a straight line. Along the way, we'll become more familiar with Python by using list comprehensions and writing functions.
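To see why logistic regression counts as a linear model, note that it passes a linear combination of the features through the non-linear sigmoid function. The sketch below, in plain Python with hypothetical coefficients and points, uses small functions and list comprehensions to show that thresholding the predicted probability at 0.5 (assumed here, though it is the usual default) gives exactly the same predictions as checking the sign of the linear combination.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function, mapping any real z into the interval (0, 1)."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def linear_score(x, weights, intercept):
    """Linear combination w1*x1 + w2*x2 + ... + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + intercept


def predict_proba(x, weights, intercept):
    """Logistic regression probability: the sigmoid applied to a linear score."""
    return sigmoid(linear_score(x, weights, intercept))


# Hypothetical coefficients for a model with two features.
weights = [2.0, -1.0]
intercept = -1.0

# Hypothetical points to classify, each given as [x1, x2].
points = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [0.5, 2.0], [3.0, 4.0]]

# Threshold the predicted probability at 0.5 ...
by_proba = [int(predict_proba(p, weights, intercept) >= 0.5) for p in points]
# ... which matches checking the sign of the linear score,
# because sigmoid(z) >= 0.5 exactly when z >= 0.
by_score = [int(linear_score(p, weights, intercept) >= 0) for p in points]

print(by_proba)
print(by_score)
```

Both lists come out as `[0, 1, 1, 0, 1]`. The decision boundary is where the linear score equals zero, here `2*x1 - x2 - 1 = 0`, which is a straight line: points below it are predicted as class 1 and points above it as class 0. If the two classes in a dataset cannot be split by any such line, no choice of coefficients will classify them all correctly.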
