Amazon DocumentDB

Learn about Amazon DocumentDB (with MongoDB compatibility), the uses of its global clusters, and how it handles high availability, backup, and restore.

We'll cover the following

Components of DocumentDB
Global cluster
- Uses of global cluster
High availability and fault tolerance
Backup and restore

Components of DocumentDB

Amazon DocumentDB has three main components: clusters, instances, and the cluster volume.

DocumentDB cluster

A DB cluster is a collection of compute resources (instances) and a cluster volume that stores the data. DocumentDB has two types of clusters: instance-based clusters and elastic clusters. With an instance-based cluster, we are responsible for configuring and managing the resources, whereas with an elastic cluster, AWS manages the resources automatically.

A cluster can have a minimum of 0 and a maximum of 16 instances. At most one of them is the primary instance, and the rest are replicas. The primary instance serves both read and write requests, while the replica instances serve only reads.
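The instance limits above can be sketched as a small planning helper. This is a minimal illustration, not a definitive implementation: the function names, the identifier pattern, and the `db.r5.large` default are hypothetical, and the sketch assumes the first instance created in a cluster becomes the primary. The dictionary keys follow the parameter names of the boto3 `docdb` client's `create_db_instance` call.

```python
MAX_INSTANCES = 16  # per-cluster instance limit stated in the text above


def plan_instances(cluster_id, count, instance_class="db.r5.large"):
    """Build one create_db_instance request per instance, primary first."""
    if not 0 <= count <= MAX_INSTANCES:
        raise ValueError(f"count must be between 0 and {MAX_INSTANCES}, got {count}")
    requests = []
    for index in range(count):
        requests.append({
            # Hypothetical naming scheme, used only for illustration.
            "DBInstanceIdentifier": f"{cluster_id}-instance-{index + 1}",
            "DBInstanceClass": instance_class,
            "Engine": "docdb",
            "DBClusterIdentifier": cluster_id,
        })
    return requests


def describe_roles(requests):
    """Label the first instance as primary and the rest as read replicas.

    Assumption: the first instance created in a cluster takes the primary role.
    """
    return [
        (req["DBInstanceIdentifier"], "primary" if i == 0 else "replica")
        for i, req in enumerate(requests)
    ]
```

To apply such a plan, each dictionary could be passed to `boto3.client("docdb").create_db_instance(**request)`. That step needs AWS credentials and an existing cluster, so it is left out of the sketch.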

Instance

An instance is a compute resource whose processing power and memory are determined by its instance class. The instance class and type should be selected according to the application’s requirements, because over-provisioning a DB instance adds unnecessary cost. We can change the instance type and class of a DB instance at any time.

Note: A DocumentDB cluster is deployed inside a VPC and can be accessed from EC2 instances or other AWS services in the same VPC.
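As a rough sketch of what access from inside the VPC looks like, the helper below assembles a MongoDB-style connection URI for a cluster endpoint. The function name and the example host are hypothetical. The options shown (TLS with a CA bundle file, the `rs0` replica set name, `secondaryPreferred` reads, and disabled retryable writes) follow the connection string commonly shown for DocumentDB, but they are assumptions to verify against the actual cluster settings.

```python
from urllib.parse import quote_plus


def build_docdb_uri(host, username, password, port=27017,
                    ca_file="global-bundle.pem"):
    """Assemble a connection URI for a DocumentDB cluster endpoint.

    The username and password are percent-encoded so that characters such
    as '@' or '/' do not break the URI.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    options = "&".join([
        "tls=true",                           # assumes TLS is enabled on the cluster
        f"tlsCAFile={ca_file}",               # CA bundle path on the client machine
        "replicaSet=rs0",
        "readPreference=secondaryPreferred",  # send reads to replicas when possible
        "retryWrites=false",
    ])
    return f"mongodb://{user}:{pwd}@{host}:{port}/?{options}"
```

A client running on an EC2 instance in the same VPC could pass the resulting string to a MongoDB driver, for example `pymongo.MongoClient(uri)`.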

Volume

The volume is an essential part of a cluster, and DocumentDB scales it automatically as the application's data grows. A DocumentDB cluster volume can grow to a maximum of 128 TiB. All instances in a cluster share the same volume, which reduces the time needed to scale compute. Because of the shared volume, a newly added read replica doesn’t need to copy existing data and is ready to use within approximately 8–10 minutes. DocumentDB keeps six copies of the cluster’s volume across three Availability Zones.

Global cluster

Amazon DocumentDB also supports global clusters. A global cluster consists of one primary cluster in one region and up to five read-only secondary clusters in other regions. The primary cluster handles all write requests, and new data is replicated to the secondary clusters automatically, so it becomes readable in the other regions within seconds.

Uses of global cluster

The global cluster comes with the following benefits:

Disaster recovery: In the unfortunate event of a region outage, any secondary cluster can be promoted to the primary cluster within minutes. The RTO is typically under 1 minute, and the RPO is typically measured in seconds, although it varies with the network lag at the time of the failure.
Data locality: The data is available in every AWS Region that hosts a secondary cluster. Read requests are served by the replica in the nearest region, which keeps latency as low as possible.
Scalable secondary cluster: We can scale secondary clusters and add up to 16 read replicas to meet read-intensive requirements.

Note: A global cluster needs one primary region and at least one secondary region. In case of a regional failover, we need to manually promote a secondary cluster to the role of primary.

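The region constraints above can be sketched as a planner that checks the layout and lists the API calls involved. This is a sketch under stated assumptions: the function names, the identifier pattern, and the example ARN are hypothetical, and the code only builds request dictionaries without calling AWS. The operation and parameter names follow the boto3 `docdb` client (`create_global_cluster`, `create_db_cluster`, `remove_from_global_cluster`).

```python
MAX_SECONDARY_REGIONS = 5  # limit stated in the text above


def plan_global_cluster(global_id, primary_cluster_arn, primary_region,
                        secondary_regions):
    """Return (region, operation, params) steps for building a global cluster."""
    regions = list(secondary_regions)
    if not 1 <= len(regions) <= MAX_SECONDARY_REGIONS:
        raise ValueError("a global cluster needs 1 to 5 secondary regions")
    if primary_region in regions or len(set(regions)) != len(regions):
        raise ValueError("every cluster must live in a different region")

    # Step 1: wrap the existing primary cluster in a global cluster.
    steps = [(primary_region, "create_global_cluster", {
        "GlobalClusterIdentifier": global_id,
        "SourceDBClusterIdentifier": primary_cluster_arn,
    })]
    # Step 2: attach one read-only secondary cluster per secondary region.
    for region in regions:
        steps.append((region, "create_db_cluster", {
            "DBClusterIdentifier": f"{global_id}-{region}",  # hypothetical name
            "Engine": "docdb",
            "GlobalClusterIdentifier": global_id,
        }))
    return steps


def plan_manual_promotion(global_id, secondary_cluster_arn):
    """Detach a secondary cluster so it can act as a standalone primary."""
    return ("remove_from_global_cluster", {
        "GlobalClusterIdentifier": global_id,
        "DbClusterIdentifier": secondary_cluster_arn,
    })
```

Each step would be sent with a client created for the listed region, for example `getattr(boto3.client("docdb", region_name=region), operation)(**params)`. The promotion helper models the manual step mentioned in the note: after a secondary cluster is detached, the application has to be pointed at its endpoint.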
High availability and fault tolerance

Amazon DocumentDB is highly available by design. We can increase availability further by adding replica instances in different AZs. Because the whole cluster shares the same storage volume, every read replica can serve requests as soon as it becomes available. As mentioned earlier, the storage volume is replicated six ways across three Availability Zones, so when the primary instance performs a write, the data is typically available to the replica instances for reads within 100 milliseconds, across multiple AZs.

These replicas can be configured as failover targets. When the primary instance fails, one replica is promoted to primary, and the failover process usually takes about 30 seconds to complete.

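Because a failover takes a short while to complete, client code usually needs to tolerate a brief window of failed connections. The following is a minimal sketch of a retry helper with exponential backoff. The function name, the delays, and the 60-second budget are illustrative assumptions, and `ConnectionError` stands in for whatever connection exception the database driver raises.

```python
import time


def call_with_retry(operation, retry_on=(ConnectionError,),
                    max_wait_seconds=60.0, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call `operation` until it succeeds or the wait budget is used up.

    The delay doubles after each failed attempt, capped at `max_delay`.
    The last error is re-raised once another wait would exceed the budget.
    """
    waited = 0.0
    delay = base_delay
    while True:
        try:
            return operation()
        except retry_on:
            if waited + delay > max_wait_seconds:
                raise
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)
```

A write could then be wrapped as `call_with_retry(lambda: collection.insert_one(doc))`, where `collection` is a hypothetical driver handle. The budget is set above the usual failover time so that a normal failover is absorbed without surfacing an error to the caller.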
Backup and restore

There are two types of backups in DocumentDB: continuous backups and snapshots.

Continuous backups are taken automatically by the service throughout the day and are kept for the retention period. The default retention period for continuous backups is 1 day, but it can be extended to up to 35 days. We can perform point-in-time recovery from the continuous backup with an RPO of about 5 minutes.
Snapshots are full backups that can be automated or manual. Manual snapshots are kept until we delete them, so if we want to keep a backup beyond 35 days, we should take a manual snapshot of our database.

Note: Backups can’t be turned off in DocumentDB. They are stored in Amazon S3.
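The retention rules above can be turned into a short calculation. This is a simplified sketch: the function names are hypothetical, and the window is approximated as "now minus the retention period" up to "now minus about 5 minutes", which ignores the exact times at which the service records backups.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 1   # default retention stated in the text above
MAX_RETENTION_DAYS = 35  # maximum retention stated in the text above


def restore_window(now, retention_days, lag=timedelta(minutes=5)):
    """Approximate the (earliest, latest) point-in-time restore targets."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    latest = now - lag  # the most recent few minutes may not be restorable yet
    return earliest, latest


def needs_manual_snapshot(keep_for_days):
    """Backups that must outlive the retention limit need a manual snapshot."""
    return keep_for_days > MAX_RETENTION_DAYS


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    earliest, latest = restore_window(now, retention_days=7)
    print("restorable from", earliest.isoformat(), "to", latest.isoformat())
    print("90-day backup needs a snapshot:", needs_manual_snapshot(90))
```

With boto3, the corresponding operations on the `docdb` client are `modify_db_cluster` (with `BackupRetentionPeriod`), `restore_db_cluster_to_point_in_time`, and `create_db_cluster_snapshot`. They are not called here because they require a live cluster.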
