Cassandra is a distributed datastore that combines ideas from the Dynamo (G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store,” in Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles (SOSP), 2007) and Bigtable (F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data,” in Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006) papers.

Note: Besides Dynamo, there is also a separate distributed system called DynamoDB. It is commercially available, but details about its internal architecture have not been shared publicly yet. However, this system has a lot of similarities with Cassandra, such as the data model and tunable consistency.

Cassandra (A. Lakshman and P. Malik, “Cassandra — A Decentralized Structured Storage System,” ACM SIGOPS Operating Systems Review, 2010) was originally developed by Facebook, but it was then open-sourced and became an Apache project. During this period, it has evolved significantly from its original implementation.

Note: The information in this chapter refers to the state of this project at the time of writing this course.

Design goals of Cassandra

The main design goals of Cassandra are:

  • Extremely high availability
  • Performance (high throughput and low latency, with an emphasis on write-heavy workloads) with unbounded, incremental scalability

Note: In order to achieve these goals Cassandra trades off some other properties, such as strong consistency.

Data model

The data model is relatively simple: it consists of keyspaces at the highest level, each of which can contain multiple tables.

Table

Each table stores data in sets of rows and is characterised by a schema.

Schema

The schema defines the structure of each row, which consists of the various columns and their types. It also determines the primary key.

Primary key

The primary key is a column or a set of columns that have unique values for each row. The primary key can have two components:

  • The first component is the partition key, and it’s mandatory
  • The second component contains the clustering columns and is optional

If both of these components are present, then the primary key is called a compound primary key.

Note: Furthermore, if the partition key is composed of multiple columns, it’s called a composite partition key.

The following illustration contains two tables: one has a simple primary key, and the other has a compound primary key.

Tables having different keys

Simple Primary Key:

CREATE TABLE Employees (
    employee_id uuid,
    first_name text,
    last_name text,
    PRIMARY KEY (employee_id)
);

Compound Primary Key:

CREATE TABLE ProductCatalog (
    product_id uuid,
    size int,
    price decimal,
    PRIMARY KEY (product_id, size)
);

The primary key of a table is one of the most important parts of the schema because it determines how data is distributed across the system and also how it is stored in every node.

Partition key component

The first component of the primary key, the partition key, determines the distribution of data. The rows of a table are conceptually split into different partitions, where each partition contains only rows with the same value for the defined partition key. All the rows belonging to a single partition are guaranteed to be stored together on the same nodes, while rows belonging to different partitions can be distributed across different nodes.

Clustering columns component

The second component of the primary key, the clustering columns, determines how rows of the same partition are stored on disk. Specifically, rows of the same partition are stored in ascending order of the defined clustering columns, unless specified otherwise. The following illustration elaborates on the previous example, showing how data from the two tables would be split into partitions and stored in practice:

Partitioning is performed via consistent hashing, so the partition key determines the partition in which a row is placed.
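To make this concrete, here is a minimal sketch in plain Python (not Cassandra internals) that groups rows of the ProductCatalog table from the earlier example into partitions by their partition key and sorts each partition by its clustering column. The row values are made up for illustration:

from collections import defaultdict

# Rows of ProductCatalog: product_id is the partition key,
# size is the clustering column.
rows = [
    {"product_id": "p2", "size": 38, "price": 30.0},
    {"product_id": "p1", "size": 42, "price": 50.0},
    {"product_id": "p1", "size": 40, "price": 45.0},
]

# Group rows into partitions by partition key.
partitions = defaultdict(list)
for row in rows:
    partitions[row["product_id"]].append(row)

# Within each partition, rows are kept in ascending clustering order.
for key, partition in partitions.items():
    partition.sort(key=lambda r: r["size"])
    print(key, [r["size"] for r in partition])
# Output:
# p2 [38]
# p1 [40, 42]

All rows with the same product_id end up in the same partition (and therefore on the same nodes), sorted by size within the partition.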

Distributing the partitions of the table over the available nodes

Cassandra distributes the partitions of a table across the available nodes using consistent hashing. It also uses virtual nodes to provide balanced, fine-grained partitioning. All the virtual nodes of a Cassandra cluster form a ring.

Each virtual node corresponds to a specific value in the ring, called the token, which determines which partitions will belong to this virtual node.

Each virtual node contains all the partitions whose partition key (when hashed) falls in the range between its token and the token of the previous virtual node in the ring, as shown in the following illustration:

Cassandra also supports some form of range partitioning via the ByteOrderedPartitioner. However, this is available mostly for backward compatibility reasons, and it’s not recommended since it can cause issues with hot spots and imbalanced data distribution.

Every Cassandra node can be assigned multiple virtual nodes.
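Here is a minimal Python sketch of this token-ring lookup. The ring contents, token space, and hash function below are illustrative assumptions; Cassandra itself defaults to Murmur3-based tokens:

import bisect
import hashlib

# Illustrative token ring: each (token, node) pair is a virtual node.
# Tokens and node names are made up for this example.
ring = sorted([
    (10, "node-A"), (35, "node-B"), (60, "node-A"),
    (85, "node-C"), (95, "node-B"),
])
tokens = [t for t, _ in ring]

def token_for(partition_key: str, ring_size: int = 100) -> int:
    """Hash a partition key onto the ring's token space."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % ring_size

def owner(partition_key: str) -> str:
    """A virtual node owns the keys whose hash falls in the range
    (previous vnode's token, its own token], wrapping around the ring."""
    t = token_for(partition_key)
    idx = bisect.bisect_left(tokens, t) % len(ring)
    return ring[idx][1]

print(owner("p1"), owner("p2"))  # each key maps to exactly one vnode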

Replicating partitions across the nodes

Each partition is replicated across N nodes, where N is a number that is configurable per keyspace, and it’s called the replication factor. There are multiple available replication strategies that determine how the additional N-1 nodes are selected.

The most straightforward strategy selects the subsequent nodes clockwise in the ring. More complicated strategies also take into account the network topology of the nodes for the selection.
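Continuing the previous sketch (reusing its ring, tokens, and token_for), the straightforward strategy can be sketched as walking clockwise from the owning virtual node and collecting distinct physical nodes. This mirrors the spirit of Cassandra’s SimpleStrategy, not its actual code:

def replicas(partition_key: str, replication_factor: int) -> list[str]:
    """Walk clockwise from the owning vnode, collecting distinct
    physical nodes until replication_factor replicas are found."""
    t = token_for(partition_key)
    start = bisect.bisect_left(tokens, t) % len(ring)
    result = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in result:
            result.append(node)
        if len(result) == replication_factor:
            break
    return result

print(replicas("p1", 2))  # the owning node plus the next distinct node clockwise

Topology-aware strategies, such as Cassandra’s NetworkTopologyStrategy, additionally take racks and datacenters into account when picking these nodes.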

Storage engines for nodes

The storage engine for each node is inspired by Bigtable. It is based on a commit log containing all the mutations and a memtable that is periodically flushed to SSTables, which are also periodically merged via compactions.
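A heavily simplified Python sketch of this write path follows; the class and method names are invented for illustration and omit details such as crash recovery, reads, and tombstones:

# Illustrative LSM-style write path: mutations are appended to a commit log
# for durability, applied to an in-memory memtable, and periodically
# flushed to immutable, sorted SSTables that are later compacted.
class ToyStorageEngine:
    def __init__(self, flush_threshold: int = 4):
        self.commit_log = []      # durable, append-only log of mutations
        self.memtable = {}        # in-memory view of recent writes
        self.sstables = []        # immutable, sorted "files"
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Flush the memtable as a new sorted, immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def compact(self):
        # Merge all SSTables into one, keeping the newest value per key
        # (later SSTables hold newer writes and overwrite older ones).
        merged = {}
        for table in self.sstables:
            merged.update(dict(table))
        self.sstables = [sorted(merged.items())]

Because writes only append to the log and update memory, this design favors write-heavy workloads, which matches Cassandra’s stated performance goals.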
