What's New in pandas 2.0

Discover the key new features and capabilities in pandas 2.0.

New features in `pandas` 2.0

Version 2.0 of the pandas library was released in April 2023, to plenty of fanfare, after three years of development. Given the library's popularity, the upgrade from pandas 1.x to 2.0 includes several key changes that affect many users. Let’s take a look at some of the key new features introduced in pandas 2.0, which is the version we use in this course.

Improved performance and memory efficiency

The pandas 2.0 update introduced PyArrow (the Python library for Apache Arrow) as an optional backend for DataFrames, alongside the NumPy data structures that remain the default. With Arrow extension arrays as the backend, many operations can run faster and use less memory, particularly on string data, because they leverage the C++ implementation of Arrow.

Previously, inefficient memory usage with the NumPy backend was a common problem that led many users to explore alternative tools, such as Spark and Ray. By opting in to PyArrow as the backend, users can work with pandas more efficiently and benefit from Arrow's columnar in-memory data representation.
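
As a minimal sketch of the difference between the two backends, the snippet below builds the same small Series twice. The sample values and variable names are illustrative assumptions, and the Arrow-backed part only runs if the optional pyarrow package is installed.

```python
import pandas as pd

try:
    import pyarrow  # noqa: F401  (optional dependency)
    has_pyarrow = True
except ImportError:
    has_pyarrow = False

# Illustrative data: two integers and one missing value.
values = [1, 2, None]

# Default NumPy-backed Series: the missing value forces a float64 column.
numpy_series = pd.Series(values)
print(numpy_series.dtype)

# Arrow-backed Series: opt in by appending "[pyarrow]" to the dtype name.
arrow_series = pd.Series(values, dtype="int64[pyarrow]") if has_pyarrow else None
if arrow_series is not None:
    print(arrow_series.dtype)
```

The Arrow-backed column keeps its integer type even though it contains a missing value, whereas the NumPy-backed column falls back to float.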

Support for non-nanosecond resolution in timestamps

A persistent limitation of pandas was its exclusive use of nanosecond resolution for timestamps. As a result, dates before September 21st, 1677, or after April 11th, 2262, couldn't be represented, which created difficulties for researchers examining time series data across multiple millennia.

Version 2.0 adds support for other resolutions: second, millisecond, and microsecond. A coarser unit gives up some precision in exchange for a much wider range of representable dates.
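
The following sketch, assuming pandas 2.0 or later, first prints the bounds of the nanosecond resolution and then stores two dates outside those bounds at second resolution. The dates and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Bounds of the default nanosecond resolution.
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# Dates outside those bounds, held in a NumPy array at second resolution.
raw = np.array(["1500-01-01", "2500-12-31"], dtype="datetime64[s]")

# pandas 2.0 keeps the second resolution instead of converting to nanoseconds.
wide_range = pd.Series(raw)

print(wide_range.dtype)
print(wide_range.dt.year.tolist())
```

The usual `.dt` accessors remain available on the resulting Series, so existing time series code can often be reused with the coarser unit.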

Enhanced support for nullable dtypes

Previously, handling null values was challenging due to pandas' reliance on NumPy, which doesn’t support null values for certain data types, such as integers. As a result, an integer column was automatically converted to float dtype when a null value was introduced, potentially leading to a loss of precision.

The pandas 2.0 update significantly improved the handling of nullable data types. These types represent a missing entry with a dedicated null marker (pd.NA) instead of repurposing an ordinary value such as NaN, so a column can keep its original dtype.

This enhancement is facilitated by a new parameter, dtype_backend, which is available in most I/O functions. When it is set to "numpy_nullable", the function returns a DataFrame with nullable data types.
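
For CSV files, the behavior can be sketched as follows. The in-memory CSV text and the column names are made-up sample data, used here so the snippet runs without an external file.

```python
import io

import pandas as pd

# Hypothetical CSV content with one missing score.
csv_text = "id,score\n1,10\n2,\n3,30\n"

# Default behavior: the missing value turns the score column into float64.
default_df = pd.read_csv(io.StringIO(csv_text))

# Nullable dtypes: the score column stays an integer column (Int64).
nullable_df = pd.read_csv(io.StringIO(csv_text), dtype_backend="numpy_nullable")

print(default_df.dtypes)
print(nullable_df.dtypes)
print(nullable_df)
```

In the second DataFrame, the missing score is shown as `<NA>` rather than `NaN`, and the remaining values stay integers.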
