Merging—Scenarios
Understand and apply various pandas techniques for different data scenarios.
Handling different JOIN scenarios
Beyond the JOIN operations explained in the previous chapter, there are numerous variations to how the merge() function can be used, depending on the user requirements and the structure of pandas objects.
Key column on index
Occasionally, we want to join on the index instead of a column. Let’s say our df_agents DataFrame now has the agent_id column inside its index instead of existing as a separate column:
Here is the code that demonstrates how the index of df_agents is modified:
# Set indexdf_agents.set_index('agent_id', inplace=True)# Show rows in agents dataset (after setting index)print(df_agents)
At this point, instead of joining the DataFrames based on the common column agent_id, we need to join the key column agent_id of df_transactions with the index of df_agents. This can be done by leveraging the right_index parameter (or left_index if df_agents is set as the left table) of the merge() function.
# Perform left join of df_transactions with df_agentsdf_output = pd.merge(df_transactions, df_agents,left_on='agent_id', # Merge key is in agent_id column of left tableright_index=True, # Merge key is in index of right tablehow='left')print(df_output)
The output shows that the LEFT JOIN was performed successfully based on the specified merge keys. ...