Merging Data

Merge the main training dataset with its corresponding feature data.

Chapter Goals:

  • Create the final dataset by merging the training and features DataFrames

A. The final dataset

For the same organizational reasons that led us to merge the features and stores DataFrames, we’ll now merge the training DataFrame with the combined features DataFrame.

Remember that the features DataFrame contains potentially useful features listed weekly by store, and the stores DataFrame contains the type and size of each store.

import pandas as pd

# Load the main training data
train_df = pd.read_csv('weekly_sales.csv')
print(train_df.columns.tolist())
# merged_features: the merged and imputed stores + features DataFrame built earlier
print(merged_features.columns.tolist())

The code above shows that the two DataFrames share the features 'Store', 'Date', and 'IsHoliday'. Therefore, we merge the DataFrames on these three features.
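
The shared columns can also be found programmatically instead of by comparing the two printed lists by eye. The following is a minimal sketch on small made-up DataFrames. The names `toy_train` and `toy_features`, the values, and the non-shared columns ('Dept', 'Weekly_Sales', 'Temperature') are illustrative assumptions, not the course data.

```python
import pandas as pd

# Toy stand-ins for train_df and merged_features (all values are made up)
toy_train = pd.DataFrame({
    'Store': [1, 1],
    'Dept': [1, 2],
    'Date': ['2010-02-05', '2010-02-05'],
    'Weekly_Sales': [100.0, 200.0],
    'IsHoliday': [False, False],
})
toy_features = pd.DataFrame({
    'Store': [1],
    'Date': ['2010-02-05'],
    'Temperature': [42.31],
    'IsHoliday': [False],
    'Type': ['A'],
    'Size': [151315],
})

# Keep the columns of toy_train that also appear in toy_features,
# preserving the order they have in toy_train
shared = [col for col in toy_train.columns if col in toy_features.columns]
print(shared)  # ['Store', 'Date', 'IsHoliday']
```

A list comprehension is used here rather than a set intersection so that the column order stays predictable.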

While the 'Date' feature is useful in the sense that it allows us to identify important values for a given week, like unemployment rate or CPI, it’s not used directly in training a machine learning model. Therefore, we drop it from the final dataset.

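# Columns shared by both DataFrames, used as the merge keys
features = ['Store', 'Date', 'IsHoliday']
final_dataset = train_df.merge(merged_features, on=features)
# 'Date' was only needed to align the rows, so drop it after merging
final_dataset = final_dataset.drop(columns=['Date'])
print(final_dataset.columns.tolist())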
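
Because `merge` performs an inner join by default, any training row without a matching key in the features DataFrame is silently dropped. It can therefore be worth checking the result. The following is a hedged sketch on made-up data (the `toy_` names, the values, and the 'Dept' and 'Weekly_Sales' columns are assumptions for illustration). It uses the `validate` argument of `merge` to confirm that each key combination appears at most once on the features side, then compares row counts.

```python
import pandas as pd

# Toy stand-ins for train_df and merged_features (all values are made up)
toy_train = pd.DataFrame({
    'Store': [1, 1, 2],
    'Dept': [1, 2, 1],
    'Date': ['2010-02-05', '2010-02-05', '2010-02-05'],
    'Weekly_Sales': [100.0, 200.0, 300.0],
    'IsHoliday': [False, False, False],
})
toy_features = pd.DataFrame({
    'Store': [1, 2],
    'Date': ['2010-02-05', '2010-02-05'],
    'IsHoliday': [False, False],
    'CPI': [211.1, 210.8],
    'Size': [151315, 202307],
})

keys = ['Store', 'Date', 'IsHoliday']

# validate='many_to_one' raises a MergeError if a key combination
# is duplicated in toy_features (the right-hand DataFrame)
toy_final = toy_train.merge(toy_features, on=keys, how='inner',
                            validate='many_to_one')
toy_final = toy_final.drop(columns=['Date'])

# With an inner join, equal row counts mean no training row was dropped
print(len(toy_train), len(toy_final))
print(toy_final.columns.tolist())
```

If the two row counts differ, some training rows had no match, and passing `how='left'` would keep them with missing feature values instead of dropping them.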
