Read CSV Data into Pandas DataFrames
Let's find out how to read CSV data using pandas.
Try it yourself
Try executing the code below to see the result.
from io import StringIO
import pandas as pd

csv_data = '''\
day,hits
2020-01-01,400
2020-02-02,800
2020-02-03,600
'''

df = pd.read_csv(StringIO(csv_data))
print(df['day'].dt.month.unique())
Explanation
The comma-separated values (CSV) format does not have a schema. Everything we read from it is a string. Pandas does a great job of guessing the data types inside the CSV, but sometimes it needs help.
We can use .dtypes to find out what types a DataFrame has, like this:
In [3]: df.dtypes
Out[3]:
day     object
hits     int64
dtype: object
The object dtype usually means a str (Python string). The read_csv function has many parameters, including parse_dates, which tells pandas which columns to parse as dates.
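To make the point about the object dtype concrete, here is a minimal sketch that reuses the sample data from this lesson. It checks the type of a single value in the unparsed day column and shows that the .dt accessor is rejected on such a column. The names first_day and dt_failed are illustrative only, and the exact error message may differ between pandas versions.

```python
from io import StringIO
import pandas as pd

csv_data = '''\
day,hits
2020-01-01,400
2020-02-02,800
2020-02-03,600
'''

# Read the CSV without any date parsing.
df = pd.read_csv(StringIO(csv_data))

# Each value in the unparsed 'day' column is a plain Python string.
first_day = df['day'].iloc[0]
print(type(first_day))

# The .dt accessor only works on datetime-like columns, so it fails here.
dt_failed = False
try:
    df['day'].dt.month
except AttributeError as err:
    dt_failed = True
    print('AttributeError:', err)
```

This is why the code in "Try it yourself" raises an error instead of printing the months.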
The parse_dates parameter uses the dateutil parser to handle various formats, but it also needs help sometimes. For example, is 1/5/2020 January 5 (US format) or May 1 (EU format)?
We can pass the dayfirst parameter to read_csv or, better, pick an unambiguous time format such as RFC 3339 (for example, 2020-01-05T10:20:30).
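As a rough illustration of the ambiguity, the sketch below reads the same two rows twice, once with the default settings and once with dayfirst=True. The sample data and the variable names us and eu are made up for this example and are not part of the lesson's dataset.

```python
from io import StringIO
import pandas as pd

# Hypothetical sample data with ambiguous day/month ordering.
csv_data = '''\
day,hits
1/5/2020,400
2/5/2020,800
'''

# Default: the first number is treated as the month (US style).
us = pd.read_csv(StringIO(csv_data), parse_dates=['day'])

# dayfirst=True: the first number is treated as the day (EU style).
eu = pd.read_csv(StringIO(csv_data), parse_dates=['day'], dayfirst=True)

print(us['day'].dt.month.tolist())  # [1, 2]
print(eu['day'].dt.month.tolist())  # [5, 5]
```

The same text yields different dates depending on one flag, which is the reason an unambiguous format is the safer choice when we control how the file is written.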
Solution
from io import StringIO
import pandas as pd

csv_data = '''\
day,hits
2020-01-01,400
2020-02-02,800
2020-02-03,600
'''

df = pd.read_csv(StringIO(csv_data), parse_dates=['day'])
print(df['day'].dt.month.unique())