Exercise: Implementing OHE for a Categorical Feature
Learn to implement one-hot encoding for categorical features.
Using pandas to create one-hot encoding
In this exercise, we will “reverse engineer” the EDUCATION feature in the dataset to obtain the text labels that represent the different education levels, then show how to use pandas to create an OHE. As a preliminary step, please set up the environment and load progress from previous exercises:
import pandas as pd
import matplotlib as mpl  # additional plotting functionality
mpl.rcParams['figure.dpi'] = 400  # high-resolution figures
df_clean_2 = pd.read_csv('df_clean_2_01.csv')
First, let’s consider our EDUCATION feature before it was encoded as an ordinal feature. From the data dictionary, we know that 1 = graduate school, 2 = university, 3 = high school, 4 = others. We would like to recreate a column that holds these strings instead of the numbers. Perform the following steps in the Jupyter notebook at the end of the lesson to complete the exercise.
- Create an empty column for the categorical labels called EDUCATION_CAT. Using the following command, every row will contain the string 'none':
  df_clean_2['EDUCATION_CAT'] = 'none'
- Examine the first few rows of the DataFrame for the EDUCATION and EDUCATION_CAT columns:
  df_clean_2[['EDUCATION', 'EDUCATION_CAT']].head(10)
The output should show the numeric EDUCATION codes next to an EDUCATION_CAT column in which every row contains 'none'.
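To try these two steps without the course data file, here is a minimal sketch on a small stand-in DataFrame. The values in `toy` are made up for illustration, and the names `toy`, `cat_mapping`, and `edu_ohe` are hypothetical. The last two steps (mapping the codes to the data dictionary labels and calling `pd.get_dummies`) are an assumption about one reasonable way to finish the encoding with pandas, not necessarily the exact steps the exercise uses.

```python
import pandas as pd

# Toy stand-in for df_clean_2; these EDUCATION codes are invented for illustration.
toy = pd.DataFrame({'EDUCATION': [2, 1, 3, 2, 4]})

# Step 1: create the placeholder column; every row holds the string 'none'.
toy['EDUCATION_CAT'] = 'none'
all_none = (toy['EDUCATION_CAT'] == 'none').all()

# Step 2: look at the two columns side by side.
print(toy[['EDUCATION', 'EDUCATION_CAT']].head(10))

# Assumed continuation: replace the placeholder with the text labels
# from the data dictionary quoted in the lesson.
cat_mapping = {
    1: 'graduate school',
    2: 'university',
    3: 'high school',
    4: 'others',
}
toy['EDUCATION_CAT'] = toy['EDUCATION'].map(cat_mapping)

# Assumed continuation: one indicator column per label.
edu_ohe = pd.get_dummies(toy['EDUCATION_CAT'])
print(edu_ohe)
```

Each row of `edu_ohe` has exactly one true entry, in the column named after that row's education label, which is the defining property of a one-hot encoding.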