Exercise: Visualizing the Feature and Response Variable Relationship

Learn how to visualize the relationship between the features and response variable.

Relationship between features and response variable

In this exercise, you will extend your knowledge of the Matplotlib plotting functions used earlier in this course and learn how to customize graphics to answer specific questions about the data. Along the way, you will create visualizations of how the PAY_1 and LIMIT_BAL features relate to the response variable, which may support the hypotheses you formed about these features. You will do this by becoming more familiar with the Matplotlib Application Programming Interface (API), that is, the syntax you use to interact with Matplotlib. Perform the following steps to complete the exercise:

Calculate a baseline for the response variable, namely the default rate across the whole dataset, using pandas’ mean():
```
overall_default_rate = df['default payment next month'].mean()
overall_default_rate 
```
The output of this should be the following:
```
# 0.2217971797179718 
```
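Why does mean() give a default rate? The response is coded as 0 (no default) or 1 (default), so its mean is the number of 1s divided by the number of rows, i.e. the proportion of accounts that defaulted. The minimal sketch below checks this on a small hypothetical DataFrame standing in for the case-study `df`; the values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data standing in for the case-study DataFrame `df`:
# the response is 1 if the account defaulted next month, else 0.
df = pd.DataFrame({'default payment next month': [0, 1, 0, 0, 1, 0, 0, 0]})

# For a 0/1 column, the mean equals (number of 1s) / (number of rows).
overall_default_rate = df['default payment next month'].mean()
manual_rate = df['default payment next month'].sum() / len(df)

print(overall_default_rate, manual_rate)  # 0.25 0.25
```

Two defaults out of eight rows gives 0.25 either way, which is why the mean serves as the baseline default rate.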
What would be a good way to visualize default rates for different values of the PAY_1 feature?

Recall our observation that this feature is something like a hybrid of a categorical and a numerical feature. Because it has relatively few unique values, we'll plot it the way categorical features are typically plotted. In the chapter “Data Exploration and Cleaning,” we used value_counts() on this feature during data exploration, and later we learned the groupby/mean approach when examining the EDUCATION feature. groupby/mean is a good way to visualize the default rate here as well, this time for each payment status.
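To make the difference between the two tools concrete, here is a small sketch on hypothetical toy data (not the case-study values): value_counts() reports how many accounts have each PAY_1 status, while groupby/mean reports the fraction of accounts that defaulted within each status.

```python
import pandas as pd

# Hypothetical toy data: PAY_1 payment status and the binary response.
df = pd.DataFrame({
    'PAY_1': [-1, -1, 0, 0, 0, 1, 2, 2],
    'default payment next month': [0, 0, 0, 1, 0, 1, 1, 1],
})

# value_counts: number of accounts in each PAY_1 status, sorted by status.
counts = df['PAY_1'].value_counts().sort_index()

# groupby/mean: fraction of accounts that defaulted within each status.
default_rate_by_pay = df.groupby('PAY_1')['default payment next month'].mean()

print(counts)
print(default_rate_by_pay)
```

Here status 0 has three accounts with one default (rate 1/3), while status 2 has two accounts that both defaulted (rate 1.0). The counts are worth keeping in mind when reading the rates, since a rate computed from very few accounts is less reliable.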

Use this code to create a groupby/mean aggregation:

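```
# Mean of the 0/1 response within each PAY_1 status = default rate per status.
# The string 'mean' avoids the FutureWarning that recent pandas versions emit for np.mean.
group_by_pay_mean_y = df.groupby('PAY_1').agg({'default payment next month': 'mean'})
group_by_pay_mean_y
```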

The output should look as follows: ...
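One way to turn this aggregation into a visualization is to plot the per-status default rates against a horizontal reference line at the overall default rate. The sketch below is an assumption about how such a plot could be built, not necessarily the exact steps of this exercise; it uses hypothetical toy data, and the axis label, legend text, marker style, and output filename are illustrative choices.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the case-study DataFrame `df`.
df = pd.DataFrame({
    'PAY_1': [-1, -1, 0, 0, 0, 1, 2, 2],
    'default payment next month': [0, 0, 0, 1, 0, 1, 1, 1],
})

overall_default_rate = df['default payment next month'].mean()
group_by_pay_mean_y = df.groupby('PAY_1').agg({'default payment next month': 'mean'})

fig, ax = plt.subplots()
# Dashed horizontal line at the dataset-wide default rate (the baseline).
ax.axhline(overall_default_rate, color='red', linestyle='--')
# Default rate within each PAY_1 status, drawn on the same axes.
group_by_pay_mean_y.plot(marker='x', legend=False, ax=ax)
ax.set_xlabel('PAY_1')
ax.set_ylabel('Proportion of credit defaults')
# Labels are matched to lines in the order they were added.
ax.legend(['Entire dataset', 'Groups of PAY_1'])
fig.savefig('default_rate_by_pay_1.png')  # hypothetical filename; use plt.show() interactively
```

Passing `ax=ax` to the pandas plotting call draws onto the same Matplotlib axes as the reference line, so statuses whose markers sit above the dashed line default more often than the dataset as a whole.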
