Create a Pandas Timestamp and date_range Object
Let's find out how to use a pandas Timestamp and date_range.
Try it yourself
Try executing the code below to see the result.
import pandas as pd

start = pd.Timestamp.fromtimestamp(0).strftime('%Y-%m-%d')
times = pd.date_range(start=start, freq='M', periods=2)
print(times)
Explanation
There are a couple of puzzling things here:
- M is a month frequency.
- The first date is January 31st and not January 1st.
Let’s start with M being a month frequency. You’ve probably used the infamous strftime or its cousin strptime to convert datetime to or from strings. In those format strings, M (written %M) stands for minute:
In [1]: t = pd.Timestamp(2020, 5, 10, 14, 21, 30)
In [2]: t.strftime('%H:%M')
Out[2]: '14:21'
One of the things most programmers like about pandas is that it’s one of the best-documented open-source packages available. But pandas is a big library, and sometimes it’s hard to find what we’re looking for.
If we look at the pandas.date_range documentation, we’ll see the following:
"freq: str or DateOffset, default ‘D’
Frequency strings can have multiples, e.g., ‘5H’."
The M in date_range stands for month-end frequency, whereas the minute frequency is T or min.
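To make the two meanings concrete, here is a minimal sketch that puts them side by side. It assumes a reasonably recent pandas. It passes an explicit pd.offsets.MonthEnd() object instead of the 'M' string, and uses the 'min' alias for minutes, since string aliases have changed between pandas releases. The variable names are only illustrative.

```python
import pandas as pd

t = pd.Timestamp(2020, 5, 10, 14, 21, 30)

# In strftime, %M is the minute field of the timestamp.
minutes_text = t.strftime('%M')

# In date_range, month-end frequency is expressed here with an explicit
# MonthEnd offset object rather than a string alias.
month_ends = pd.date_range(start='1970-01-01', periods=2,
                           freq=pd.offsets.MonthEnd())

# The minute frequency uses the 'min' alias.
minute_steps = pd.date_range(start='1970-01-01', periods=2, freq='min')

print(minutes_text)
print(month_ends)
print(minute_steps)
```

The first range lands on month ends, while the second advances one minute at a time.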
This solves one puzzle and also gives us a hint as to why we see January 31st and not January 1st. In the puzzle Y3K, we saw that time 0, or epoch time, is counted from January 1, 1970.
In [3]: pd.Timestamp(0)
Out[3]: Timestamp('1970-01-01 00:00:00')
If we follow the code of pandas.date_range, we’ll see it converts freq from a str to a pandas.DateOffset. Then, date_range uses pandas.DateOffset.apply on start. From there, it keeps adding the offset until it has produced periods dates.
Here’s how this is done:
In [4]: from pandas.tseries.frequencies import to_offset
In [5]: start = pd.Timestamp(0)
In [6]: offset = to_offset('M')
In [7]: offset
Out[7]: <MonthEnd>
In [8]: t0 = offset.apply(start)
In [9]: t0
Out[9]: Timestamp('1970-01-31 00:00:00')
In [10]: t0 + offset
Out[10]: Timestamp('1970-02-28 00:00:00')
This is what we see in this teaser’s output. Note that frequencies don’t have to be single units; they can be multiples of a unit. The following will give us a date range in five-minute intervals.
In [11]: pd.date_range(start=pd.Timestamp(0), periods=3, freq='5T')
Out[11]:
DatetimeIndex(['1970-01-01 00:00:00', '1970-01-01 00:05:00',
               '1970-01-01 00:10:00'],
              dtype='datetime64[ns]', freq='5T')
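The month-end stepping shown earlier can also be sketched without offset.apply, which newer pandas releases may no longer provide. The helper manual_range below is hypothetical (it is not part of pandas) and only approximates what date_range does: it rolls start forward to the first date that lies on the offset, then adds the offset repeatedly.

```python
import pandas as pd


def manual_range(start, offset, periods):
    """Hypothetical helper: roll start forward, then step by the offset."""
    # Snap to the first date that lies on the offset (a month end here).
    current = offset.rollforward(start)
    dates = []
    for _ in range(periods):
        dates.append(current)
        # Adding the offset moves to the next month end.
        current = current + offset
    return dates


start = pd.Timestamp(0)          # epoch: 1970-01-01
offset = pd.offsets.MonthEnd()   # explicit month-end offset object
dates = manual_range(start, offset, periods=2)
print(dates)
```

For this start date, the helper yields the same two month-end dates that date_range returns with the same offset.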