S3 Replication
Learn how to enable cross-region and same-region replication in S3 buckets.
In computing and data management, replication refers to creating and maintaining copies of data, objects, or systems in multiple locations. The primary goal of replication is to enhance data availability, fault tolerance, and performance by ensuring that identical copies of data are distributed across different locations or systems.
Replication in S3 buckets
Amazon S3 allows us to asynchronously replicate objects from a source bucket to a destination bucket. The two buckets can be in the same or different accounts and in the same or different regions. We can also set up multiple destination buckets for replication.
A commonly asked question is why we use S3 replication when we can conveniently upload objects manually to replicate them. Well, S3 replication is helpful in a bunch of cases:
Replicate objects with metadata: S3 replication copies objects along with their original metadata, such as creation time, modification date, and more.
Time-bound replication: To meet compliance requirements, we might need to replicate objects within a limited time. S3 replication offers Replication Time Control (RTC), which is designed to replicate 99.99% of objects within 15 minutes.
Replicate to different storage classes: We can replicate data directly into a different storage class to optimize costs.
Data redundancy: Replicate data across regions to reduce access latency and improve failure tolerance.
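To make the time-bound and storage-class points more concrete, here is a minimal Python sketch of the destination part of a replication rule. It assumes the dictionary shape that boto3's put_bucket_replication call accepts. The helper name build_destination and the bucket ARN are hypothetical and used only for illustration.

```python
def build_destination(bucket_arn, storage_class="STANDARD_IA", enable_rtc=True):
    """Return the Destination block of a replication rule (illustrative sketch)."""
    destination = {
        "Bucket": bucket_arn,           # ARN of the destination bucket
        "StorageClass": storage_class,  # store replicas in a different class
    }
    if enable_rtc:
        # Replication Time Control uses a 15-minute threshold and is
        # configured together with replication metrics.
        destination["ReplicationTime"] = {"Status": "Enabled", "Time": {"Minutes": 15}}
        destination["Metrics"] = {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
    return destination


# Hypothetical destination bucket ARN, for illustration only.
dest = build_destination("arn:aws:s3:::example-destination-bucket")
print(dest["StorageClass"], dest["ReplicationTime"]["Time"]["Minutes"])
```

The sketch only builds a dictionary. Nothing is sent to AWS until the block is placed inside a full replication configuration and applied to a bucket.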
Replication configuration
A replication configuration is the set of rules and settings that define how object replication is managed between source and destination S3 buckets. It is written in XML format and contains the following important elements:
Status (mandatory): Specifies if the rule is Enabled or Disabled. S3 does not consider a disabled rule while replicating.
IAMRole: S3 needs permissions to get the bucket configurations and to read and write the objects. The configuration therefore names an IAM role that gives S3 the permissions it needs to replicate objects from the source to the destination bucket. We can restrict the IAM role to allow replication of all the objects or a subset of objects based on filters.
Priority (mandatory): Determines which rule takes precedence when multiple rules with the same destination bucket conflict. A rule with a higher number has a higher priority.
Destination (mandatory): Specifies the destination bucket(s).
Filters: Filters allow us to replicate only a subset of objects. We can configure filters based on the key prefixes and tags of an object. For example, suppose we have two folders in an S3 bucket named /images and /documents, and we only want to replicate the /images folder. To handle this case, we can configure a filter with the prefix images. All the objects whose keys start with images will then be replicated to the destination bucket.
DeleteMarkerReplication (mandatory): S3 stores delete markers when we delete an object in a version-enabled S3 bucket. In the replication configuration, we can specify whether we want to replicate delete markers or not.
Note: To configure replication on an S3 bucket, we must enable versioning on both source and destination buckets.
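The elements above can be sketched as a single rule in Python. This is a minimal illustration, not a complete setup: it assumes the dictionary form of the configuration that boto3's put_bucket_replication accepts (a mirror of the XML format), and the role ARN, bucket names, and helper names are hypothetical.

```python
def build_replication_config(role_arn, destination_arn, prefix, priority=1,
                             replicate_delete_markers=False):
    """Build a one-rule replication configuration as a plain dictionary."""
    return {
        "Role": role_arn,  # IAM role that S3 uses to replicate objects
        "Rules": [
            {
                "ID": "replicate-" + prefix.strip("/"),
                "Status": "Enabled",           # disabled rules are ignored
                "Priority": priority,          # higher number wins a conflict
                "Filter": {"Prefix": prefix},  # replicate only matching keys
                "Destination": {"Bucket": destination_arn},
                "DeleteMarkerReplication": {
                    "Status": "Enabled" if replicate_delete_markers else "Disabled"
                },
            }
        ],
    }


def matches_rule(key, rule):
    """Local check only: does this key fall under an enabled rule's prefix?"""
    return rule["Status"] == "Enabled" and key.startswith(rule["Filter"]["Prefix"])


config = build_replication_config(
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",  # hypothetical
    destination_arn="arn:aws:s3:::example-destination-bucket",           # hypothetical
    prefix="images/",
)
rule = config["Rules"][0]
print(matches_rule("images/cat.png", rule))        # key under images/
print(matches_rule("documents/report.pdf", rule))  # key outside the prefix

# Applying the configuration might look roughly like this. It needs boto3,
# valid credentials, and versioning enabled on both buckets:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_replication(Bucket="example-source-bucket",
#                             ReplicationConfiguration=config)
```

The matches_rule helper is not part of any AWS API. It only mimics prefix filtering locally so the effect of the filter is easy to see.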
S3 buckets offer multiple options to replicate the objects. Let’s learn about each of these options.
Cross-Region Replication
Cross-Region Replication (CRR) is a live replication technique where we can replicate objects across regions as they are written into the bucket. It is commonly used to achieve data redundancy.
For example, consider a multinational company that relies heavily on cloud storage for its critical data and applications. The company wants to ensure that its data is resilient to regional outages, natural disasters, or any unforeseen events that could impact the availability of its primary data center.
To address this, the company employs Cross-Region Replication in Amazon S3. They set up replication rules to automatically copy objects from their primary S3 bucket in one AWS region to a secondary S3 bucket in a geographically distant region. This secondary region can act as a backup and minimize latency for the compute services in the region.
Another common use case of CRR is to decrease data access latency by bringing the data closer to the users who access the bucket from various regions.
Same-Region Replication
Same-Region Replication (SRR), as the name suggests, allows us to replicate objects between buckets in the same region. This type of replication is asynchronous, meaning that changes made to the source bucket are not immediately reflected in the destination bucket. SRR achieves eventual consistency but is not ideal for synchronous real-time replication scenarios.
Same-region replication is typically used to ensure high availability and protect against the potential loss of data. Also, we can use it to aggregate objects, such as logs, from multiple source buckets across a region into one destination bucket.
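The log aggregation case can be sketched as one replication configuration per source bucket, all pointing at the same destination. The following Python sketch assumes the same dictionary shape that boto3's put_bucket_replication accepts. The bucket names, the logs/ prefix, the role ARN, and the helper name are hypothetical.

```python
def build_aggregation_configs(source_buckets, destination_arn, role_arn, prefix="logs/"):
    """Return one replication configuration per source bucket, all of them
    targeting the same destination bucket."""
    configs = {}
    for name in source_buckets:
        configs[name] = {
            "Role": role_arn,
            "Rules": [
                {
                    "ID": "aggregate-" + name,
                    "Status": "Enabled",
                    "Priority": 1,
                    "Filter": {"Prefix": prefix},  # only replicate log objects
                    "Destination": {"Bucket": destination_arn},
                    "DeleteMarkerReplication": {"Status": "Disabled"},
                }
            ],
        }
    return configs


# Hypothetical names, for illustration only.
configs = build_aggregation_configs(
    source_buckets=["app-one-logs", "app-two-logs"],
    destination_arn="arn:aws:s3:::central-log-archive",
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",
)
for source, cfg in configs.items():
    print(source, "->", cfg["Rules"][0]["Destination"]["Bucket"])
```

Each configuration would then be applied to its own source bucket, while the destination bucket needs no replication rule of its own.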
Two-way or bidirectional replication
Bidirectional replication, also known as two-way replication, keeps two buckets in sync by replicating changes in both directions. In this setup, data modifications can occur independently at each end, and the changes are propagated to the other end to maintain consistency between the two.
To understand its use case, consider a scenario where a sales team operates in different regions, each with its database storing sales orders. Bidirectional replication allows sales representatives in each region to independently enter and update sales orders. The changes made in one region are then replicated bidirectionally to other regions, ensuring the sales order data is consistent across all locations.
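For S3 buckets, one way to picture two-way replication is as a pair of ordinary replication configurations, one per direction. The Python sketch below assumes the dictionary shape accepted by boto3's put_bucket_replication and turns on replica modification sync so that metadata changes made on replicas also flow back. The bucket names, the shared role ARN, and the helper names are hypothetical.

```python
def one_way_config(role_arn, destination_arn):
    """Replication configuration for a single direction."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "two-way-sync",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix: applies to every object
                "Destination": {"Bucket": destination_arn},
                "DeleteMarkerReplication": {"Status": "Enabled"},
                # Also replicate metadata changes that are made on replicas.
                "SourceSelectionCriteria": {
                    "ReplicaModifications": {"Status": "Enabled"}
                },
            }
        ],
    }


def build_two_way(role_arn, bucket_a, bucket_b):
    """Return {source bucket name: configuration} for both directions."""
    def arn(name):
        return "arn:aws:s3:::" + name
    return {
        bucket_a: one_way_config(role_arn, arn(bucket_b)),
        bucket_b: one_way_config(role_arn, arn(bucket_a)),
    }


# Hypothetical buckets and role; using a single role for both directions is
# an assumption made to keep the sketch short.
pair = build_two_way(
    "arn:aws:iam::111122223333:role/example-replication-role",
    "sales-orders-us",
    "sales-orders-eu",
)
for source, cfg in pair.items():
    print(source, "->", cfg["Rules"][0]["Destination"]["Bucket"])
```

Each configuration is applied to its own source bucket, so both buckets act as a source and a destination at the same time.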
S3 Batch Replication
S3 Batch Replication allows us to replicate the existing objects in a bucket. Unlike SRR and CRR, which replicate objects as they are added to the bucket, S3 Batch Replication is an on-demand process. We can use Batch Replication in the following scenarios:
Replicate objects that already existed in the bucket before SRR or CRR was configured.
Replicate objects that previously failed to replicate, that is, objects with the FAILED replication status.
Replicate the replicas of objects, as replicas can only be replicated using S3 Batch Replication.
Replicate objects that have already been replicated to a new destination, such as a different region.
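The scenarios above come down to selecting objects by their replication status. The following Python sketch imitates that selection locally. It does not create a Batch Replication job. The object keys and the helper name are hypothetical, and None is used here as a stand-in for an object that has no replication status yet.

```python
def select_for_batch_replication(statuses, include_failed=True,
                                 include_existing=True, include_replicas=False):
    """Return the sorted keys whose replication status matches the chosen scenarios.

    statuses maps an object key to its replication status: "PENDING",
    "COMPLETED", "FAILED", "REPLICA", or None when the object was never
    picked up by a replication rule.
    """
    wanted = set()
    if include_failed:
        wanted.add("FAILED")   # objects that failed to replicate earlier
    if include_existing:
        wanted.add(None)       # objects that predate the replication rule
    if include_replicas:
        wanted.add("REPLICA")  # replicas created by another rule
    return sorted(key for key, status in statuses.items() if status in wanted)


# Hypothetical inventory. In practice, the status could be read per object,
# for example from the ReplicationStatus field of boto3's head_object response.
inventory = {
    "a.log": "COMPLETED",
    "b.log": "FAILED",
    "c.log": None,
    "d.log": "REPLICA",
    "e.log": "PENDING",
}
print(select_for_batch_replication(inventory))
# prints ['b.log', 'c.log']
print(select_for_batch_replication(inventory, include_replicas=True))
# prints ['b.log', 'c.log', 'd.log']
```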