Data Cleaning

Learn how to perform data cleaning in pandas.

Dropping duplicates

Many datasets have duplicate entries. The drop_duplicates method removes rows that repeat an earlier row and returns the result. The keep parameter determines which occurrence survives. The default, 'first', keeps the first occurrence of each duplicated row; 'last' keeps the last occurrence instead. If we set it to False, every row that has a duplicate is removed, including the initial occurrence. Notice that the result keeps the original index labels, so the index can have gaps where rows were dropped. Let’s see if there are any duplicates in our dataset.
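The lesson's own dataset is not shown here, so the following is only a minimal sketch on a made-up DataFrame. The column names and values are assumptions chosen for illustration. It counts duplicate rows with duplicated and then compares the three keep settings by looking at which index labels survive.

```python
import pandas as pd

# Hypothetical toy data (not the lesson's dataset): rows 2 and 4 repeat rows 0 and 1.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid", "Bob"],
    "team": ["A", "B", "A", "C", "B"],
})

# duplicated() marks each row that repeats an earlier row; summing counts them.
n_dupes = int(df.duplicated().sum())

first = df.drop_duplicates()             # keep='first' is the default
last = df.drop_duplicates(keep="last")   # keep the last occurrence instead
none = df.drop_duplicates(keep=False)    # drop every row that has a duplicate

print(n_dupes)               # 2
print(first.index.tolist())  # [0, 1, 3]
print(last.index.tolist())   # [2, 3, 4]
print(none.index.tolist())   # [3]
```

The surviving labels show that the original index is preserved. If a fresh 0-based index is wanted, passing ignore_index=True to drop_duplicates (or calling reset_index(drop=True) on the result) is one option.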
