Primary Key Clustering Columns

Explore the clustering columns of Cassandra's primary key: how they affect on-disk data storage, how they define sort order, and how they enable range queries.

As covered earlier, a table’s primary key in Cassandra comprises one or more partition key columns and zero or more clustering columns. The partition key always appears first in the primary key, followed by any clustering columns.

Together with the partition key, clustering columns establish the uniqueness of a row, and they define the sort order of rows inside a partition. In other words, clustering columns are the other half of the primary key, listed after the partition key in the PRIMARY KEY clause.

In the absence of clustering column(s), each partition has only one row, as demonstrated with the courses table defined earlier, in the section Simple partition key.
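The behavior described above can be sketched with a small in-memory model. This is only an illustration in Python, not how Cassandra stores data internally. The class name PartitionedTable, its methods, and the sample course data are hypothetical. Each partition keeps its rows sorted by clustering key, a write with an existing primary key overwrites the row, and a table without clustering columns is modeled with an empty clustering key.

```python
from bisect import bisect_left, bisect_right, insort


class PartitionedTable:
    """Toy model: partition key -> rows kept sorted by clustering key (a tuple)."""

    def __init__(self):
        # partition key -> {"keys": sorted clustering keys, "rows": key -> value}
        self.partitions = {}

    def upsert(self, partition_key, clustering_key, value):
        part = self.partitions.setdefault(partition_key, {"keys": [], "rows": {}})
        if clustering_key not in part["rows"]:
            # New primary key: insert while preserving the sort order.
            insort(part["keys"], clustering_key)
        # Existing primary key: the row is overwritten, not duplicated.
        part["rows"][clustering_key] = value

    def rows(self, partition_key):
        part = self.partitions.get(partition_key, {"keys": [], "rows": {}})
        return [(k, part["rows"][k]) for k in part["keys"]]

    def range_query(self, partition_key, low, high):
        """Rows of one partition whose clustering key lies in [low, high]."""
        part = self.partitions.get(partition_key)
        if part is None:
            return []
        keys = part["keys"]
        start, end = bisect_left(keys, low), bisect_right(keys, high)
        return [(k, part["rows"][k]) for k in keys[start:end]]


# Hypothetical table with one clustering column (a section number).
sections = PartitionedTable()
sections.upsert("cs101", (3,), "Indexes")
sections.upsert("cs101", (1,), "Introduction")
sections.upsert("cs101", (2,), "Data model")
sections.upsert("cs101", (2,), "Data modeling")  # same primary key: overwrite

# Hypothetical table with no clustering columns: the clustering key is empty,
# so every write to a partition targets the same single row.
courses = PartitionedTable()
courses.upsert("cs101", (), "Databases")
courses.upsert("cs101", (), "Distributed databases")
```

In this sketch, sections.rows("cs101") returns the three rows in section order regardless of insertion order, and sections.range_query("cs101", (1,), (2,)) returns only the first two. The courses partition never holds more than one row.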

If the primary key consists of multiple columns and the partition key is enclosed in parentheses, all columns listed after the closing parenthesis are considered clustering columns. If no parentheses enclose the partition key, only the first column is considered the partition key, and all columns listed after it are treated as clustering columns. The following table illustrates the partition key and clustering column(s) for various primary key statements.
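The parenthesization rule can also be expressed as a short Python sketch. The helper split_primary_key and the column names a, b, c, and d are hypothetical. The function handles only plain, comma-separated column names and is not a CQL parser.

```python
import re


def split_primary_key(clause):
    """Return (partition_key_columns, clustering_columns) for a PRIMARY KEY clause."""
    match = re.fullmatch(
        r"\s*PRIMARY\s+KEY\s*\((.*)\)\s*", clause, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError(f"not a PRIMARY KEY clause: {clause!r}")
    body = match.group(1).strip()

    if body.startswith("("):
        # Inner parentheses: everything inside them is the partition key.
        close = body.index(")")
        partition_key = [col.strip() for col in body[1:close].split(",")]
        rest = body[close + 1:]
    else:
        # No inner parentheses: only the first column is the partition key.
        first, _, rest = body.partition(",")
        partition_key = [first.strip()]

    # Whatever remains after the partition key is a clustering column.
    clustering = [col.strip() for col in rest.split(",") if col.strip()]
    return partition_key, clustering


simple = split_primary_key("PRIMARY KEY (a)")
unparenthesized = split_primary_key("PRIMARY KEY (a, b, c)")
composite = split_primary_key("PRIMARY KEY ((a, b), c, d)")
```

Here simple has partition key a and no clustering columns, unparenthesized has partition key a with clustering columns b and c, and composite has the composite partition key (a, b) with clustering columns c and d.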
