Many Categories SSE

Build knowledge of how CART regression trees find optimal splits for categorical features.

Splitting binary categorical features

The simplest case for categorical features is one with only two possible values (i.e., the feature is binary). A binary categorical feature allows exactly one split, so the CART regression tree algorithm calculates the SSE for the feature by choosing one of the categories, sending the rows with that category to the left-hand node and the remaining rows to the right-hand node, calculating the SSE of each side around its own mean, and then adding the two SSEs.
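A minimal sketch of the binary case, using a hypothetical toy dataset and hypothetical helper names (`sse`, `binary_split_sse`). Because a binary feature has only one possible split, choosing either category as the left-hand side produces the same partition and therefore the same total SSE.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors of the values around their own mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def binary_split_sse(categories, targets, left_category):
    """Total SSE after splitting rows into `left_category` vs. everything else."""
    left = [t for c, t in zip(categories, targets) if c == left_category]
    right = [t for c, t in zip(categories, targets) if c != left_category]
    return sse(left) + sse(right)


# Hypothetical toy data: a binary feature and a numeric target.
smoker = ["yes", "no", "yes", "no", "no"]
cost = [30.0, 10.0, 34.0, 12.0, 8.0]
print(binary_split_sse(smoker, cost, "yes"))  # prints 16.0
```

Here the "yes" rows (30, 34) have mean 32 and SSE 8, the "no" rows (10, 12, 8) have mean 10 and SSE 8, so the split's total SSE is 16.0.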

Splitting features with many categories

When a categorical feature has three or more categories (i.e., levels), the CART regression tree algorithm uses an optimization that keeps the number of potential splits it must evaluate to a minimum. ...
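A minimal sketch, assuming the well-known shortcut for squared-error regression trees: sort the levels by their mean target value and evaluate only the k - 1 splits that cut that ordering. For SSE this ordering is known to contain the best two-way partition, so it checks k - 1 candidates instead of all 2^(k-1) - 1 possible partitions. The helper names (`sse`, `best_categorical_split`) and the toy data are hypothetical, for illustration only.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors of the values around their own mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_categorical_split(categories, targets):
    """Return (left_levels, total_sse) for the best two-way split.

    Assumes the mean-ordering shortcut: levels are sorted by mean target,
    and only the k - 1 cut points along that order are evaluated.
    """
    groups = {}
    for c, t in zip(categories, targets):
        groups.setdefault(c, []).append(t)
    # Order the levels by their mean target value.
    ordered = sorted(groups, key=lambda c: mean(groups[c]))

    best = None
    for i in range(1, len(ordered)):
        left_levels = set(ordered[:i])
        left = [t for c, t in zip(categories, targets) if c in left_levels]
        right = [t for c, t in zip(categories, targets) if c not in left_levels]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (ordered[:i], total)
    return best


# Hypothetical toy data: a four-level feature and a numeric target.
region = ["north", "south", "east", "west", "north", "south", "east", "west"]
price = [10.0, 20.0, 12.0, 30.0, 14.0, 22.0, 10.0, 28.0]
print(best_categorical_split(region, price))  # prints (['east', 'north'], 79.0)
```

With four levels, this evaluates 3 candidate splits rather than the 7 a brute-force search would try, and still finds the partition {east, north} vs. {south, west} with total SSE 79.0.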
