The sum() Method with Pandas Series
Let's find out how to use the floordiv operator on pandas series elements.
Try it yourself
Try executing the code below to see the result.
import pandas as pd

v1 = pd.Series([0, 2, 4])
v2 = pd.Series([0, 1, 2])
out = v1 // v2
print(out.sum())
Explanation
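There are a few things going on in this teaser. The first has to do with the // operator in out = v1 // v2. This is Python's floor division (floordiv) operator. Unlike regular division with /, it rounds the result down to a whole number.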
In [1]: 7/2
Out[1]: 3.5
In [2]: 7//2
Out[2]: 3
The // operator is useful when we want to calculate indices (for example, in a binary search).
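As a minimal sketch of that use case, the binary search below relies on // to compute the middle index. The function name binary_search and the sample list are illustrative only and aren't part of the teaser.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        # Floor division keeps mid an int, so it can be used as a list index.
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


values = [1, 3, 5, 7, 9, 11]
print(binary_search(values, 7))  # 3
print(binary_search(values, 4))  # -1
```

With / instead of //, mid would be a float, and indexing the list with it would raise a TypeError.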
The next odd thing is that we managed to divide by 0. If we try to divide by 0 in the Python shell, it’ll fail.
In [3]: 1/0
...
ZeroDivisionError: division by zero
Pandas, and the underlying NumPy arrays, use different numbers than Python does. That’s because Python numbers are Python objects, which use much more space than machine numbers. Python integers can grow as large as we want, but pandas/NumPy numbers are limited to a fixed size in bits.
In [4]: 2<<100
Out[4]: 2535301200456458802993406410752
In [4]: np.int64(2)<<100
Out[4]: 0
<< is the left shift operator.
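To make the size limit concrete, here is a small sketch that inspects the range of int64 with np.iinfo and compares it with an ordinary Python integer. The variable names are illustrative.

```python
import numpy as np

# Limits of a 64-bit signed machine integer
info = np.iinfo(np.int64)
print(info.bits)  # 64
print(info.max)   # 9223372036854775807

# A Python int simply grows to hold as many bits as it needs
big = 2 ** 100
print(big.bit_length())  # 101
print(big > info.max)    # True

# Every int64 element takes exactly 8 bytes
arr = np.zeros(1000, dtype=np.int64)
print(arr.nbytes)  # 8000
```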
Below, we can see that the type of v1 and v2 is int64:
In [5]: v1.dtype
Out[5]: dtype('int64')
This gives us a clue as to why the division by 0 worked.
In [6]: np.int64(0)/np.int64(0)
<ipython-input-62-76db10acbf60>:1: RuntimeWarning: invalid value encountered
in long_scalars np.int64(0)/np.int64(0)
Out[6]: nan
There is a warning, but we get a nan. The output nan is a special float value meaning “not a number”. It’s usually used to indicate missing values. Since integers don’t have a special empty value, pandas changed the dtype of out to float64.
In [7]: out.dtype
Out[7]: dtype('float64')
⚠️ Hint: Look out for bugs! This dtype change can lead to some interesting bugs that we need to watch out for.
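One way such a bug can show up, sketched here with a hypothetical letters list: a value taken from the float64 result can no longer be used as a list index. Dropping the missing values and converting back to int64 is one possible fix, not the only one.

```python
import pandas as pd

v1 = pd.Series([0, 2, 4])
v2 = pd.Series([0, 1, 2])
out = v1 // v2  # float64, because position 0 holds nan

letters = ["a", "b", "c"]  # hypothetical data, for illustration only
idx = out.loc[1]           # 2.0, a float rather than an int

try:
    letters[idx]
    failed = False
except TypeError:
    # List indices must be integers, and idx is a float
    failed = True
print(failed)  # True

# One possible fix: drop the missing values, then convert back to int64
clean = out.dropna().astype("int64")
print(letters[clean.loc[1]])  # c
```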
In newer versions of pandas, there’s a new IntegerArray type that can have missing values. Pandas has several more markers for missing values. For example, there’s NaT for a missing time. We can use the pandas.isnull function to check for missing values.
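The short sketch below shows these pieces together. It assumes a pandas version that supports the nullable "Int64" dtype string (note the capital I), which is backed by IntegerArray.

```python
import numpy as np
import pandas as pd

# Nullable integer series: the missing value does not force a float dtype
nullable = pd.Series([1, None, 3], dtype="Int64")
print(nullable.dtype)                # Int64
print(pd.isnull(nullable).tolist())  # [False, True, False]

# pd.isnull recognizes the other missing markers too
print(pd.isnull(pd.NaT))  # True
print(pd.isnull(np.nan))  # True
```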
The last thing we should keep in mind is how summing up a series with nan values works. If we’re coming from NumPy, we would expect nan as a result, like this:
In [8]: out.values
Out[8]: array([nan, 2., 2.])
In [9]: out.values.sum()
Out[9]: nan
In NumPy, we need to use nansum to ignore nan values. Here’s how that looks:
In [10]: np.nansum(out.values)
Out[10]: 4.0
Pandas takes a different approach. It sees nan more as a missing value than its literal meaning of “not a number” and ignores it in most operations.
In [11]: out.sum()
Out[11]: 4.0
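If the NumPy behavior is what we want, the skipna parameter of Series.sum turns off this handling. The sketch below rebuilds the values of out directly so that it runs on its own.

```python
import numpy as np
import pandas as pd

out = pd.Series([np.nan, 2.0, 2.0])

print(out.sum())              # 4.0, nan is skipped by default
print(out.sum(skipna=False))  # nan, same as NumPy's sum
print(out.count())            # 2, count() also ignores missing values
```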