Multiplying Values of Pandas Series
Let's find out how a pandas series works with the all method and equality operator.
Try it yourself
Try executing the code below to see the result.
import pandas as pd

v = pd.Series([.1, 1., 1.1])
out = v * v
expected = pd.Series([.01, 1., 1.21])
if (out == expected).all():
    print('Math rocks!')
else:
    print('Please reinstall universe & reboot.')
Explanation
The out == expected expression compares the two series element by element and returns a Boolean pandas.Series. The all method returns True only if all elements are True.
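As a small illustration of these two steps, the sketch below compares two short series. The names a, b, and mask and the sample values are made up for this example.

```python
import pandas as pd

# Two small series that differ only in the last element (illustrative values)
a = pd.Series([1, 2, 3])
b = pd.Series([1, 2, 4])

mask = a == b          # element-wise comparison gives a Boolean Series
print(mask.tolist())   # [True, True, False]
print(mask.all())      # False, because one element differs
print((a == a).all())  # True, because every element matches
```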
When we look at out and expected, they seem the same.
In [1]: out
Out[1]:
0 0.01
1 1.00
2 1.21
dtype: float64
In [2]: expected
Out[2]:
0 0.01
1 1.00
2 1.21
dtype: float64
But when we compare them, we see something strange.
In [2]: out == expected
Out[2]:
0 False
1 True
2 False
dtype: bool
Between out and expected, only the middle value, 1.00, is equal.
Looking deeper, we can see the problem.
In [3]: print(out[2])
1.2100000000000002
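The difference lies in how pandas displays the value and how print does. By default, pandas rounds floats to six decimal places for display, while print shows enough digits to identify the stored value.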
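To see the stored values rather than the rounded display, we can ask pandas for more digits. The following is a minimal sketch: it raises the display.precision option temporarily with pd.option_context, and the name diff is introduced only for this example.

```python
import pandas as pd

v = pd.Series([.1, 1., 1.1])
out = v * v
expected = pd.Series([.01, 1., 1.21])

# Temporarily show up to 17 digits after the decimal point
with pd.option_context('display.precision', 17):
    print(out)

# The element-wise difference is tiny, but not zero
diff = (out - expected).abs()
print(diff.max() > 0)      # True
print(diff.max() < 1e-15)  # True
```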
💡 String representation
Always remember that the string representation of an object is not the object itself. The Treachery of Images painting illustrates this concept beautifully.
Upon seeing such issues, some new developers come to the message boards and say, “We found a bug in pandas!” The usual answer given by programming veterans is, “Read the manual.”
What to do about floating-point issues?
As Grant Edwards once said, “Floating point is sort of like quantum physics: the closer you look, the messier it gets.”
The basic idea behind this issue is that floating-point numbers sacrifice accuracy for speed. It’s a trade-off that we often make in computer science.
The result we see conforms to the floating-point specification (IEEE 754). The equivalent code in Go, Rust, C, Java, and so on produces the same output.
The main point to remember is that floating-point numbers are not exact, and as the magnitude of a number increases, the gap between adjacent representable values grows, so absolute accuracy gets even worse.
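The loss of precision at larger magnitudes can be checked directly. This sketch uses numpy.spacing, which returns the distance from a value to the next representable float; the sample magnitudes are arbitrary.

```python
import numpy as np

# Gap between each value and the next representable float64
for x in (1.0, 1e8, 1e16):
    print(x, np.spacing(x))

# At 1e16 the gap is 2.0, so adding 1.0 is lost to rounding
print(1e16 + 1.0 == 1e16)  # True
```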
Floating-point issues arise quite often, so we’ll probably need to compare a pandas.Series or pandas.DataFrame at some point. Keep in mind that computed values will not always be exactly equal to the values we expect. Instead, we can decide on an acceptable threshold and use the numpy.allclose function.
In [4]: import numpy as np
In [5]: np.allclose(out, expected)
Out[5]: True
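The numpy.allclose function has options we can tweak: rtol sets the relative tolerance, atol sets the absolute tolerance, and equal_nan controls whether NaN values compare as equal.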
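For reference, numpy.allclose treats two values a and b as close when |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The sketch below shows how these parameters change the outcome; the sample values are arbitrary.

```python
import numpy as np

a, b = 1000.0, 1000.1

# Default tolerance here is about 0.01, smaller than the 0.1 difference
print(np.allclose(a, b))                    # False
# A looser relative tolerance accepts the difference
print(np.allclose(a, b, rtol=1e-3))         # True
# A purely absolute tolerance ignores the magnitude of the values
print(np.allclose(a, b, rtol=0, atol=0.2))  # True

# NaN is not close to NaN unless equal_nan=True
print(np.allclose([np.nan], [np.nan]))                  # False
print(np.allclose([np.nan], [np.nan], equal_nan=True))  # True
```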
Solution
import numpy as np
import pandas as pd

v = pd.Series([.1, 1., 1.1])
out = v * v
expected = pd.Series([.01, 1., 1.21])
if np.allclose(out, expected):
    print('Math rocks!')
else:
    print('Please reinstall universe & reboot.')
If we need better accuracy, we can look into the decimal module, which provides correctly rounded decimal floating-point arithmetic.
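As a rough sketch of that approach, the same multiplication can be done with decimal.Decimal on plain Python lists. It assumes the values are built from strings, since constructing a Decimal from a float copies the binary rounding error.

```python
from decimal import Decimal

# Build Decimals from strings so the values are exact decimal literals
v = [Decimal('0.1'), Decimal('1'), Decimal('1.1')]
out = [x * x for x in v]
expected = [Decimal('0.01'), Decimal('1'), Decimal('1.21')]

print(out == expected)  # True

# Building from a float carries the binary rounding error along
print(Decimal(1.1) == Decimal('1.1'))  # False
```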