Read CSV Data into Pandas DataFrames

Let's find out how to read CSV data using pandas.

Try it yourself

Try executing the code below to see the result.

from io import StringIO
import pandas as pd
csv_data = '''\
day,hits
2020-01-01,400
2020-02-02,800
2020-02-03,600
'''
df = pd.read_csv(StringIO(csv_data))
print(df['day'].dt.month.unique())

Explanation

The comma-separated values (CSV) format does not have a schema. Everything we read from it is a string. Pandas does a great job of guessing the data types inside the CSV, but sometimes it needs help.

We can use .dtypes to find out what types a DataFrame has, like this:

In [3]: df.dtypes
Out[3]:
day     object
hits     int64
dtype: object

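The object dtype usually means a str (Python string). Because the day column was read as strings, the .dt accessor in the code above fails with an AttributeError. The read_csv function has many parameters, including parse_dates, which takes a list of columns to parse as dates.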
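To make the failure concrete, here is a minimal sketch that reuses the lesson's data. The names first_day and dt_failed are illustrative only, and the exact error message may differ between pandas versions.

```python
from io import StringIO
import pandas as pd

csv_data = '''\
day,hits
2020-01-01,400
2020-02-02,800
2020-02-03,600
'''
df = pd.read_csv(StringIO(csv_data))

# Without parse_dates, each value in the day column is a plain Python string.
first_day = df['day'][0]
print(type(first_day))

# The .dt accessor only works on datetime-like columns, so this raises.
dt_failed = False
try:
    df['day'].dt.month
except AttributeError as err:
    dt_failed = True
    print('AttributeError:', err)
```

The hits column, by contrast, contains only digits, so pandas infers an integer type for it without any help.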

The parse_dates option uses the dateutil parser to handle various formats, but it also needs help sometimes. For example, is 1/5/2020 January 5 (US format) or May 1 (EU format)?

We can pass the dayfirst parameter to read_csv or, better, pick an unambiguous time format such as RFC 3339 (for example, 2020-01-05T10:20:30).
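The ambiguity can be seen with a one-row sketch. The sample row and the variable names us and eu are made up for illustration; the same text is read twice, once with the default setting and once with dayfirst=True.

```python
from io import StringIO
import pandas as pd

# Hypothetical data with an ambiguous date.
csv_data = '''\
day,hits
1/5/2020,400
'''

# Default: the first number is treated as the month (US style).
us = pd.read_csv(StringIO(csv_data), parse_dates=['day'])

# dayfirst=True: the first number is treated as the day (EU style).
eu = pd.read_csv(StringIO(csv_data), parse_dates=['day'], dayfirst=True)

print(us['day'].dt.month[0])  # 1 (January 5)
print(eu['day'].dt.month[0])  # 5 (May 1)
```

Neither reading is wrong from the parser's point of view, which is why an unambiguous format in the file itself is the safer choice.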

Solution

from io import StringIO
import pandas as pd
csv_data = '''\
day,hits
2020-01-01,400
2020-02-02,800
2020-02-03,600
'''
df = pd.read_csv(StringIO(csv_data), parse_dates=['day'])
print(df['day'].dt.month.unique())
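As a quick check, the sketch below reads the same data with parse_dates and confirms that the day column is now a datetime column. The is_datetime64_any_dtype helper is used here so the check does not depend on the exact datetime resolution, which can vary across pandas versions; the variable name months is illustrative.

```python
from io import StringIO
import pandas as pd

csv_data = '''\
day,hits
2020-01-01,400
2020-02-02,800
2020-02-03,600
'''
df = pd.read_csv(StringIO(csv_data), parse_dates=['day'])

# The day column is now datetime-like rather than a column of strings.
print(pd.api.types.is_datetime64_any_dtype(df['day']))

# With a datetime column, the .dt accessor works: January and February.
months = [int(m) for m in df['day'].dt.month.unique()]
print(months)
```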
