Introducing Altair
Get introduced to the basic concepts of marks and encodings in Altair.
Vega-Altair, or Altair for short, is a declarative Python library for statistical visualization. It relies on the Vega and Vega-Lite visualization grammars, which describe the visual appearance and interactivity of visualizations in JSON.
Declarative vs. imperative libraries
There are two types of visualization libraries:
Imperative libraries focus on how to build a visualization: we manually specify each step, such as the axes, size, legend, and labels. Matplotlib is an example of an imperative library.
Declarative libraries focus on what we want to see. We specify the data and type of visualization we want to see. The library will do the manipulations to create the visualization for us automatically. Altair is an example of a declarative library.
The following table shows the differences between imperative and declarative libraries.
Imperative vs. Declarative Libraries
Imperative Library | Declarative Library |
Specifies explicit instructions to build the visualization | Describes the output |
Provides only the tools; we perform the steps manually | Performs the steps for us automatically |
Main elements of an Altair chart
Every Altair chart comprises three main elements: the data, the mark, and the encoding.
The Chart object
A Chart object is the entry point in Altair. Every Altair chart takes the dataset as its input argument.
To start creating a chart, import the Altair library and then create the chart.
import altair as alt

alt.Chart(dataset)
The dataset can be in one of the following formats:
pandas DataFrame
Data or related object (i.e., UrlData, InlineData, NamedData)
URL pointing to a JSON or CSV file
Object supporting the __geo_interface__ (e.g., GeoPandas GeoDataFrame, and so on)
The mark property
A mark is a Chart property that defines how to represent data. Examples of marks include bars, lines, areas, and many more. To specify a mark, call the corresponding mark method on the Altair Chart.
import altair as alt

alt.Chart(dataset).mark_bar()
The example tells Altair to draw a bar chart. In general, the name of each mark method is mark_<type_of_graph>(). The following table shows the most common marks in Altair:
Common Marks in Altair
Name | Description |
mark_bar() | A bar chart |
mark_line() | A line chart |
mark_point() | A scatter plot with configurable point shapes |
mark_circle() | A scatter plot with filled circles |
Encodings
Encodings specify how data columns map to visual properties of the mark, including its position, size, color, and more. To define an encoding, we call the encode() method on the Chart.
import altair as alt

alt.Chart(dataset).mark_bar().encode()
Example
Consider the Christmas trees dataset.
Christmas Tree Dataset
Year | RealTree | FakeTree |
2004 | 27100000 | 9000000 |
... | ... | ... |
2016 | 27400000 | 18600000 |
We load it as a pandas DataFrame and then draw a simple line chart in Altair of the RealTree column versus the Year column.
import altair as alt
import pandas as pd
import os

df = pd.read_csv('/data/christmas_trees.csv')

chart = alt.Chart(df).mark_line().encode(
    x='Year:O',     # O for ordinal data
    y='RealTree:Q'  # Q for quantitative data
)

chart.save('chart.html')
os.system('cat chart.html')
The example uses the mark_line() method to draw a line and specifies the x and y channels in the encode() method. For each column, we also specify the data type (O for Year and Q for RealTree). The last two statements save the chart as an HTML file and print its contents so that the chart can be rendered.
Let’s practice!
Now, we’ll play with the previous snippet of code:
We’ll change the mark property to mark_bar() or mark_point().
We’ll represent the FakeTree column in the y-axis instead of the RealTree one.
Drawing multiple lines
The previous chart draws only a single line, representing a single dataset column. There are two strategies to show multiple lines.
The first strategy uses Altair’s concept of layers, which overlay different charts. We’ll build a base chart with the shared encoding for the x-axis, and then use the alt.layer() function to draw two separate lines.
base = alt.Chart(df).encode(x='Year:O')

chart = alt.layer(
    base.mark_line(color='blue').encode(y='FakeTree:Q'),
    base.mark_line(color='red').encode(y='RealTree:Q')
)
The second strategy transforms the original pandas DataFrame from wide to long format through the melt() method and then plots the melted DataFrame in Altair.
The following figure shows how melt('Year') works.
The melt() method receives the column to keep as an identifier (Year) as input and stacks the remaining columns into two new columns: variable, which holds the original column name, and value, which holds the corresponding value.
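The reshaping can be checked on a small example. The sketch below melts a DataFrame that contains only the two rows shown in the dataset table and prints the long-format result.

```python
import pandas as pd

# Wide format: one row per year, one column per tree type.
df = pd.DataFrame({
    'Year': [2004, 2016],
    'RealTree': [27100000, 27400000],
    'FakeTree': [9000000, 18600000],
})

# Long format: one row per (year, tree type) pair.
long_df = df.melt('Year')
print(long_df)
```

Each original row produces one output row per melted column, so two years and two tree columns give four rows.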
import altair as alt
import pandas as pd
import os

df = pd.read_csv('/data/christmas_trees.csv')
data = df.melt('Year')

chart = alt.Chart(data).mark_line().encode(
    x='Year',
    y='value',
    color='variable'
)

chart.save('chart.html')
os.system('cat chart.html')
This example adds a new channel to the encoding, called color, which maps a data column to the color of the mark. Here, each line gets a different color depending on the value of the variable column.