Exercise: Implementing OHE for a Categorical Feature

Learn to implement one-hot encoding for categorical features.

We'll cover the following...

Using pandas to create one-hot encoding
Try it yourself

Using pandas to create one-hot encoding

In this exercise, we will “reverse engineer” the EDUCATION feature in the dataset to obtain the text labels that represent the different education levels, then show how to use pandas to create an OHE. As a preliminary step, please set up the environment and load progress from previous exercises:

```
import pandas as pd
import matplotlib as mpl  # additional plotting functionality
mpl.rcParams['figure.dpi'] = 400  # high resolution figures
df_clean_2 = pd.read_csv('df_clean_2_01.csv')
```

First, let’s consider our EDUCATION feature before it was encoded as an ordinal feature. From the data dictionary, we know that 1 = graduate school, 2 = university, 3 = high school, 4 = others. We would like to recreate a column that contains these strings instead of the numbers. Perform the following steps in the Jupyter notebook at the end of the lesson to complete the exercise.

Create a new column for the categorical labels called EDUCATION_CAT. Using the following command, every row will initially contain the placeholder string 'none':
```
df_clean_2['EDUCATION_CAT'] = 'none'
```
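To make the goal of the exercise concrete, here is a minimal sketch of the overall idea on a small stand-in DataFrame. It is not necessarily the exact sequence of steps used in the notebook: the names `df_example`, `cat_mapping`, `edu_ohe`, and `df_with_ohe` are assumptions made for illustration, and the five sample rows are made up. Only the number-to-label mapping comes from the data dictionary quoted above.

```python
import pandas as pd

# Hypothetical stand-in for df_clean_2 (the real data has many more rows and columns)
df_example = pd.DataFrame({'EDUCATION': [2, 1, 3, 4, 2]})

# Number-to-label mapping taken from the data dictionary
cat_mapping = {
    1: 'graduate school',
    2: 'university',
    3: 'high school',
    4: 'others',
}

# Translate the numeric codes into text labels
df_example['EDUCATION_CAT'] = df_example['EDUCATION'].map(cat_mapping)

# One-hot encode the labels: one column per category, a single 1 in each row
edu_ohe = pd.get_dummies(df_example['EDUCATION_CAT'], dtype=int)

# Attach the one-hot columns to the original data for inspection
df_with_ohe = pd.concat([df_example, edu_ohe], axis=1)
print(df_with_ohe)
```

Each row of `edu_ohe` contains exactly one 1, in the column matching that row's education label, which is a quick way to check that the encoding worked as intended.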

...
