Amazon DocumentDB

Learn about Amazon DocumentDB (with MongoDB compatibility), the uses of its global clusters, high availability, and backup and restore.

Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed non-relational document database. It handles database management tasks such as configuring high availability, durability, software patching, scaling, and backups.

It’s designed for high performance with document workloads. Because it is compatible with MongoDB, we can use the same commands and tools to insert and query documents.
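
To make the MongoDB compatibility concrete, here is a minimal sketch that builds a MongoDB-style connection URI for a cluster endpoint. The host name, credentials, CA file name, and the `shop`/`orders` names are placeholders, and the query options are commonly used settings that should be checked against your own cluster. The commented lines assume the `pymongo` driver is installed and that the cluster is reachable.

```python
from urllib.parse import quote_plus


def build_docdb_uri(host, username, password, port=27017,
                    ca_file="global-bundle.pem"):
    """Build a MongoDB-style connection URI for a DocumentDB cluster endpoint."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    user = quote_plus(username)
    secret = quote_plus(password)
    options = "&".join([
        "tls=true",
        f"tlsCAFile={ca_file}",
        "replicaSet=rs0",
        "readPreference=secondaryPreferred",
        "retryWrites=false",
    ])
    return f"mongodb://{user}:{secret}@{host}:{port}/?{options}"


# Placeholder endpoint and credentials, for illustration only.
uri = build_docdb_uri("sample-cluster.example.com", "appuser", "p@ss:word")
print(uri)

# With pymongo installed and network access to the cluster (both assumed):
# from pymongo import MongoClient
# client = MongoClient(uri)
# client["shop"]["orders"].insert_one({"item": "book", "qty": 2})
# print(client["shop"]["orders"].find_one({"item": "book"}))
```

The insert and query calls in the comments are the same ones we would use against a MongoDB server, which is the point of the compatibility layer.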

Components of DocumentDB

Amazon DocumentDB mainly comprises clusters, instances, and a cluster volume.

DocumentDB cluster

A DB cluster is a collection of compute resources (instances) and a cluster volume that stores the data. DocumentDB has two types of clusters: instance-based and elastic. In an instance-based cluster, we are responsible for configuration and resource management, while in an elastic cluster, AWS manages the resources automatically.

A cluster can have a minimum of 0 and a maximum of 16 instances. Of these, one is the primary instance, and the remaining (up to 15) are secondary instances. The primary instance serves both read and write requests, while the secondary instances are read replicas that serve read requests only.
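
The instance limit above can be sketched as a small planning helper. It only builds keyword-argument dictionaries shaped for boto3's `docdb` `create_db_instance` call and does not contact AWS. The function name, cluster identifier, instance class, and zone names are illustrative assumptions.

```python
MAX_INSTANCES = 16  # cluster limit stated in the text above


def plan_cluster_instances(cluster_id, instance_class, zones, count):
    """Return keyword-argument dicts for boto3's docdb create_db_instance call."""
    if not 0 <= count <= MAX_INSTANCES:
        raise ValueError(f"count must be between 0 and {MAX_INSTANCES}")
    if count and not zones:
        raise ValueError("at least one Availability Zone is required")
    plans = []
    for i in range(count):
        plans.append({
            "DBInstanceIdentifier": f"{cluster_id}-instance-{i + 1}",
            "DBInstanceClass": instance_class,
            "Engine": "docdb",
            "DBClusterIdentifier": cluster_id,
            # Rotate through the zones so replicas land in different AZs.
            "AvailabilityZone": zones[i % len(zones)],
        })
    return plans


# Hypothetical cluster name, instance class, and zones.
plans = plan_cluster_instances(
    "demo-cluster", "db.r5.large",
    ["us-east-1a", "us-east-1b", "us-east-1c"], count=3)
for p in plans:
    print(p["DBInstanceIdentifier"], p["AvailabilityZone"])

# Applying the plan requires boto3 and AWS credentials (assumed, not run here):
# import boto3
# docdb = boto3.client("docdb")
# for p in plans:
#     docdb.create_db_instance(**p)
```

Validating the count before any API call keeps a typo such as `count=17` from failing halfway through provisioning.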

Instance

An instance is a compute resource whose processing power and memory are determined by its instance class. The instance class and type should be selected according to the application’s requirements, because provisioning a DB instance with more resources than it needs adds unnecessary cost. We can change the instance type and class of a DB instance at any time.

Note: A DocumentDB cluster is deployed inside a VPC and can be accessed from EC2 instances or other AWS services in the same VPC.

Volume

The cluster volume is an essential part of a cluster, and DocumentDB automatically scales it as the application’s data grows. A cluster volume can grow to a maximum of 128 TiB. All instances in a cluster share the same volume, which helps reduce compute scaling time. Because of the shared volume, a newly added read replica doesn’t need to copy existing data and is ready to use within approximately 8–10 minutes. DocumentDB keeps six copies of the cluster’s volume across three Availability Zones.

High-level diagram of a DocumentDB cluster

Global cluster

Amazon DocumentDB also supports global clusters. A global cluster consists of one primary cluster in a primary region and up to five read-only secondary clusters in different regions. The primary cluster serves all write requests, and new data is replicated to the secondary clusters automatically, typically with a lag of under a second.

Uses of global cluster 

The global cluster comes with the following benefits:

  • Disaster recovery: In the unfortunate event of a region outage, any secondary cluster can be promoted to the primary cluster within minutes. The RTO is typically under 1 minute, and the RPO is typically measured in seconds, though it varies with the network lag at the time of the failure.

  • Data locality: Because the data is replicated to multiple AWS Regions, read requests can be served by the replica in the nearest region with the minimum possible latency.

  • Scalable secondary cluster: We can scale a secondary cluster by adding up to 16 read replicas to meet read-intensive requirements.

Note: A global cluster needs one primary region and at least one secondary region. In case of a regional failover, we need to manually promote a secondary cluster to the role of primary.
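
The rules above (one primary region, at most five secondary regions, manual promotion) can be captured in a small in-memory model. This is a sketch for reasoning about the topology only. The `GlobalCluster` class and region names are hypothetical, and nothing here calls AWS.

```python
from dataclasses import dataclass, field

MAX_SECONDARY_REGIONS = 5  # limit stated in the text above


@dataclass
class GlobalCluster:
    primary: str
    secondaries: list = field(default_factory=list)

    def add_secondary(self, region):
        if region == self.primary or region in self.secondaries:
            raise ValueError(f"{region} is already part of the global cluster")
        if len(self.secondaries) >= MAX_SECONDARY_REGIONS:
            raise ValueError("a global cluster supports at most 5 secondary regions")
        self.secondaries.append(region)

    def promote(self, region):
        """Manually promote a secondary region after the primary region fails."""
        if region not in self.secondaries:
            raise ValueError(f"{region} is not a secondary region")
        self.secondaries.remove(region)
        # The failed primary is dropped from the model; writes now go to `region`.
        self.primary = region


gc = GlobalCluster(primary="us-east-1")
gc.add_secondary("eu-west-1")
gc.add_secondary("ap-southeast-1")
gc.promote("eu-west-1")
print(gc.primary, gc.secondaries)
```

In this model the promotion is an explicit method call, mirroring the fact that a regional failover is something we trigger ourselves rather than something that happens automatically.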

High-level diagram of a global cluster

High availability and fault tolerance

Amazon DocumentDB is highly available by design. We can configure high availability by adding replica instances in different AZs. Because the whole cluster shares the same storage volume, all the read replicas can be used as soon as they become available. As mentioned earlier, the storage volume is replicated six ways across three Availability Zones, so whenever the primary instance performs a write operation, the data is typically available to the secondary instances for read operations within 100 milliseconds across multiple AZs.

These replicas can be configured as failover targets. When the primary instance fails, one replica is promoted to primary, and the failover process usually takes about 30 seconds to complete.
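
As a rough illustration of choosing a failover target, the sketch below picks the healthy replica with the lowest promotion tier. It assumes that each replica carries a numeric priority where a lower number means higher priority. The `Replica` class, the health flag, and the tie-break by name are simplifications for this example, not a description of the service's exact selection logic.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    zone: str
    promotion_tier: int  # lower number = higher failover priority (assumption)
    healthy: bool = True


def pick_failover_target(replicas):
    """Return the healthy replica with the lowest promotion tier, or None."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    # Ties are broken by name here only to keep the result deterministic.
    return min(candidates, key=lambda r: (r.promotion_tier, r.name))


replicas = [
    Replica("replica-a", "us-east-1a", promotion_tier=2),
    Replica("replica-b", "us-east-1b", promotion_tier=0, healthy=False),
    Replica("replica-c", "us-east-1c", promotion_tier=1),
]
target = pick_failover_target(replicas)
print(target.name)  # replica-c
```

Here `replica-b` has the best tier but is unhealthy, so the next-best healthy replica is chosen.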

High-level view of the failover process

Backup and restore

There are two types of backups in DocumentDB: continuous backups and snapshots.

  • Continuous backups are taken automatically by the service throughout the day and are available for the retention period. The default retention period for continuous backups is 1 day, but it can be extended to up to 35 days. We can perform point-in-time recovery from continuous backups with an RPO of about 5 minutes.

  • Snapshots are full backups that can be automated or manual. If we want to keep a backup beyond 35 days, we should take a manual snapshot of our database, which is kept until we delete it.

Note: We can’t turn off backups in DocumentDB. The backups are stored in Amazon S3.
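
The retention rules above translate into simple date arithmetic. The sketch below computes the window in which point-in-time recovery is possible for a given retention period and flags when a snapshot is needed instead. The function names and the fixed timestamp are illustrative, and the 5-minute figure is the RPO quoted above rather than a guaranteed value.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 1   # default stated in the text above
MAX_RETENTION_DAYS = 35  # maximum stated in the text above


def restore_window(now, retention_days, rpo_minutes=5):
    """Return (earliest, latest) restorable times for point-in-time recovery."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    latest = now - timedelta(minutes=rpo_minutes)
    return earliest, latest


def needs_snapshot(keep_days):
    """Backups kept longer than the retention maximum need a snapshot."""
    return keep_days > MAX_RETENTION_DAYS


# Fixed timestamp so the example is reproducible.
now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
earliest, latest = restore_window(now, retention_days=7)
print(earliest.isoformat(), latest.isoformat())
print(needs_snapshot(90))
```

With a 7-day retention period, any moment between `earliest` and `latest` is a candidate restore point, while a 90-day requirement falls outside what continuous backups can cover.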
