The sum() Method with Pandas Series

Let's find out how the floordiv operator and the sum() method behave on pandas Series elements.

Try it yourself

Try executing the code below to see the result.

import pandas as pd
v1 = pd.Series([0, 2, 4])
v2 = pd.Series([0, 1, 2])
out = v1 // v2
print(out.sum())

Explanation

There are a few things going on in this teaser. The first has to do with the // operator in out = v1 // v2. This is the floordiv operator in Python, which returns the integral part of the quotient. Unlike regular division, it returns an integer when both operands are integers.

In [1]: 7/2
Out[1]: 3.5
In [2]: 7//2
Out[2]: 3

The // operator is useful when we want to calculate indices (for example, in a binary search).
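As a minimal sketch of that use case, the snippet below first shows how // rounds, then uses it to pick the midpoint in a binary search. The function name binary_search and the sample list are illustrative only and are not part of the teaser.

```python
# Floor division rounds toward negative infinity, not toward zero.
pos = 7 // 2     # integer operands give an integer result
neg = -7 // 2    # rounds down, so the result is below -3.5
flt = 7.0 // 2   # a float operand gives a float result


def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # // keeps mid an integer, so it is a valid index
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


idx = binary_search([1, 3, 5, 7, 9], 7)
print(pos, neg, flt, idx)
```

With regular division, mid would be a float and items[mid] would raise a TypeError.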

The next odd thing is that we managed to divide by 0. If we try to divide by 0 in the Python shell, it’ll fail.

In [3]: 1/0
...
ZeroDivisionError: division by zero

Pandas, and the NumPy arrays underneath it, use different numbers than Python does. That’s because Python numbers are Python objects, which use much more space than machine numbers. Python integers can grow as large as we want, but pandas/NumPy numbers are limited to their size in bits.

In [4]: 2<<100
Out[4]: 2535301200456458802993406410752
In [4]: np.int64(2)<<100
Out[4]: 0

Here, << is the left shift operator. An int64 has only 64 bits, so the shifted bits fall off the end and the result is 0.
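The size limit can also be seen with ordinary addition. The sketch below assumes NumPy is imported as np; the variable names are illustrative. It adds 1 to the largest int64 value, once inside a NumPy array and once as a plain Python integer.

```python
import numpy as np

info = np.iinfo(np.int64)  # describes the limits of the int64 type

# Inside an int64 array, going past the maximum wraps around silently.
big = np.array([info.max], dtype=np.int64)
wrapped = big + 1

# A plain Python int has no fixed width, so it simply grows.
py_big = int(info.max) + 1

print(info.max)
print(wrapped[0])
print(py_big)
```

The array result lands on the smallest int64 value, while the Python integer is one larger than the int64 maximum.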

Below, we can see that the dtype of v1 (and likewise of v2) is int64:

In [5]: v1.dtype
Out[5]: dtype('int64')

This gives us a clue as to why the division by 0 worked.

In [6]: np.int64(0)/np.int64(0)
<ipython-input-62-76db10acbf60>:1: RuntimeWarning: invalid value encountered in long_scalars
  np.int64(0)/np.int64(0)
Out[6]: nan

NumPy issues a warning but returns nan instead of raising an error. The output nan is a special float value meaning “not a number”. It’s usually used to indicate missing values. Since integers don’t have a special empty value, pandas changed the dtype of out to float64.

In [7]: out.dtype
Out[7]: dtype('float64')
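The dtype change can be checked end to end with the teaser's own series. This is a small sketch; the explicit dtype="int64" arguments and the as_int name are additions for illustration, and dropping the missing entries is only one possible way to get back to integers.

```python
import pandas as pd

v1 = pd.Series([0, 2, 4], dtype="int64")
v2 = pd.Series([0, 1, 2], dtype="int64")
out = v1 // v2

# The inputs are integers, but the result is not.
print(v1.dtype, v2.dtype, out.dtype)

# isna() marks the positions that became nan (here, the 0 // 0 slot).
print(out.isna().tolist())

# One possible guard: drop the missing entries before converting back.
as_int = out.dropna().astype("int64")
print(as_int.tolist())
```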

⚠️ Hint: Look out for bugs! This silent dtype change from int64 to float64 can lead to some interesting bugs.

In newer versions of pandas, there’s a new IntegerArray type that can hold integers together with missing values. Pandas has several other markers for missing values. For example, there’s NaT for missing time. We can use the pandas.isnull function to check for missing values.
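A short sketch of these pieces follows. It assumes a pandas version that supports the nullable "Int64" dtype (note the capital I), which is how an IntegerArray is usually requested; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The nullable integer dtype keeps integers even when a value is missing.
nullable = pd.Series([1, None, 3], dtype="Int64")
print(nullable.dtype)

# pandas.isnull works element-wise on a series...
missing_mask = pd.isnull(nullable).tolist()
print(missing_mask)

# ...and on single values, whichever missing marker is used.
print(pd.isnull(np.nan), pd.isnull(pd.NaT), pd.isnull(None))

# Missing entries are skipped when summing, and the result stays an integer.
total = nullable.sum()
print(total)
```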

The last thing we should keep in mind is how summing up a series with nan values works. If we’re coming from NumPy, we would expect nan as a result, like this.

In [8]: out.values
Out[8]: array([nan, 2., 2.])
In [9]: out.values.sum()
Out[9]: nan

In NumPy, we need to use nansum to ignore nan values. Here’s how that looks:

In [10]: np.nansum(out.values)
Out[10]: 4.0

Pandas takes a different approach. It sees nan more as a missing value than its literal meaning of “not a number” and ignores it in most operations.

In [11]: out.sum()
Out[11]: 4.0
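If the NumPy behavior is what we want, Series.sum accepts a skipna parameter, and min_count sets how many valid values are required before a number is returned. The sketch below rebuilds out directly from its values so that it runs on its own; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

out = pd.Series([np.nan, 2.0, 2.0])

# Default: nan is treated as missing and skipped.
default_sum = out.sum()

# skipna=False propagates nan, matching out.values.sum() in NumPy.
strict_sum = out.sum(skipna=False)

# min_count=3 asks for at least three valid values; only two exist here.
needs_three = out.sum(min_count=3)

print(default_sum, strict_sum, needs_three)
```

Passing skipna=False is one way to make a missing value show up in the total instead of being skipped.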
