Deep Dive: Categorical Features
Learn about the transformation of categorical features and their significance.
Understanding categorical features
Machine learning algorithms work only with numbers. If your data contains text features, for example, they must be transformed into numbers in some way. We learned in the previous lesson that the data for our case study is, in fact, entirely numerical. However, it’s worth thinking about how it got to be that way. In particular, consider the EDUCATION feature.
This is an example of what is called a categorical feature: you can imagine that as raw data, this column consisted of the text labels “graduate school,” “university,” “high school,” and “others.” These are called the levels of the categorical feature; here, there are four levels. It is only through a mapping, which has already been chosen for us, that this data exists as the numbers 1, 2, 3, and 4 in our dataset. This particular assignment of categories to numbers creates what is known as an ordinal feature, because the levels are mapped to numbers in order. As a data scientist, at a minimum, you need ...
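To make the idea of a mapping concrete, here is a minimal sketch of how text levels could be turned into the numbers 1 through 4. The sample values in raw_education are hypothetical, and the assignment of each label to a number is an assumption: it simply follows the order in which the labels are listed above. This is not necessarily how the case study data was actually prepared.

```python
# Hypothetical raw text values for an EDUCATION column (illustrative only,
# not taken from the case study data).
raw_education = [
    "university",
    "graduate school",
    "high school",
    "others",
    "university",
]

# Assumed mapping: each level gets a number in the order the labels are listed.
education_mapping = {
    "graduate school": 1,
    "university": 2,
    "high school": 3,
    "others": 4,
}

# Apply the mapping to turn text labels into integers.
encoded_education = [education_mapping[label] for label in raw_education]

# Build the inverse mapping to recover the original labels from the numbers.
inverse_mapping = {code: label for label, code in education_mapping.items()}
decoded_education = [inverse_mapping[code] for code in encoded_education]

print(encoded_education)
print(decoded_education == raw_education)
```

Keeping the mapping in a dictionary makes the choice explicit and reversible, so the numbers in the encoded column can always be traced back to the levels they stand for.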