S3 Lifecycle Rules
Learn how to save storage costs using S3 Lifecycle policies that automatically move objects between different storage classes.
Amazon S3 (Simple Storage Service) is an economical object storage solution. However, since it is often used for backups, logs, and archives, data can quickly pile up and incur large costs. We have already discussed that S3 offers a suite of storage classes tailored to provide the required latency at minimal cost. By choosing the right storage class for each use case, enterprises and developers can considerably reduce their storage costs.
But what if our data access requirements vary over time? For example, consider a software application that stores logs in an S3 bucket and accesses them frequently for a week, after which they can be deleted. Similarly, some documents might require frequent access in the first few weeks or months and are rarely accessed afterward. In such situations, it can be a hassle to manually transition objects between multiple storage classes to optimize cost.
To automate the transition between storage classes or the deletion of objects based on a configuration, S3 provides S3 Lifecycle. An S3 Lifecycle configuration is a set of rules that defines when to transition or delete an object. There are two types of S3 Lifecycle configuration rules:
Transition: These rules define the transition of objects between different storage classes.
Expiration: These rules define the deletion of objects from a storage class.
We can configure S3 Lifecycle rules by determining the access frequency of the objects in the bucket. To help developers determine the access patterns of objects in a bucket, S3 provides two options:
S3 Storage Lens: Storage Lens works at the AWS organization level, the highest level in the hierarchy, all the way down to the account, bucket, and even prefix level. We can use it to get a snapshot of the number of buckets in an organization, the storage space they take up, and the cost. It also creates a bubble chart based on the retrieval rate of each bucket, which helps in interpreting the information and choosing the right configurations for the S3 Lifecycle.
S3 Storage class analysis: Storage class analysis provides access and usage pattern information at the bucket level.
Lifecycle configuration template
The S3 Lifecycle configuration document is an XML document, although the AWS CLI and SDKs also accept an equivalent JSON representation, which is what we use in the examples in this lesson. Each document can have multiple rules, and each rule has an ID, a status, filters, and lifecycle actions such as transitions and expirations.
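As a quick reference, a minimal single-rule configuration might look like the sketch below; the rule ID, the logs/ prefix, and the day counts are illustrative values only.

{
  "Rules": [
    {
      "ID": "ExampleRule",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}

Let's dive in to learn about each of these parts further.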
Status
The status determines whether the Lifecycle rule is enabled or disabled. A disabled rule does not transition or delete any objects.
Filters
Filters allow us to define a subset of objects to which we want to apply a rule. Below are some filters we can use in our S3 Lifecycle configurations. They can be combined using the And element, as shown in the sketch after this list.
Filters using key prefixes: Suppose we want to empty the images/ folder after 30 days. We'll configure a rule that filters objects using the key prefix images/.
Filters using object tags: We can filter objects using a tag's key and value.
Filters using object size: We can apply filters using the maximum object size and the minimum object size.
As a common practice, it is recommended to apply a minimum object size filter of 128 KB in the S3 Lifecycle configuration to avoid racking up transition costs when moving many small objects to S3 Glacier Deep Archive.
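As a sketch of combining filters, assuming a hypothetical documents/ prefix, an archive object tag, and a 131,072-byte (128 KB) minimum size, a rule using the And element might look like this:

{
  "Rules": [
    {
      "ID": "ArchiveTaggedDocuments",
      "Status": "Enabled",
      "Filter": {
        "And": {
          "Prefix": "documents/",
          "Tags": [
            { "Key": "archive", "Value": "true" }
          ],
          "ObjectSizeGreaterThan": 131072
        }
      },
      "Transitions": [
        { "Days": 180, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}

Only objects that match every predicate inside And are picked up by the rule; here, tagged objects under documents/ that are larger than 128 KB transition to S3 Glacier Deep Archive after 180 days.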
Transitions
We can define rules to transition objects from one storage class to another using S3 Lifecycle configurations. However, there are certain limitations when defining transitions. Amazon S3 Lifecycle configurations support a waterfall model for transitions: objects can move down the hierarchy of storage classes, from S3 Standard toward S3 Glacier Deep Archive, but not back up.
The important points to note are:
We can transition objects from S3 Standard to any other storage class, but we cannot transition back to S3 Standard from any storage class.
Any storage class can transition to S3 Glacier Deep Archive.
Apart from these, there are a few constraints while defining the transitions. Important ones to note are:
Before objects can transition to S3 Standard-IA or S3 One Zone-IA, they must be stored in S3 for at least 30 days. Earlier transitions are not supported by S3 Lifecycle because newer objects are often accessed frequently or deleted within 30 days, which is not economical for these storage classes.
S3 Standard-IA and S3 One Zone-IA have a minimum storage charge of 30 days. This means we cannot configure a subsequent transition to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive that occurs less than 30 days after the transition to S3 Standard-IA or S3 One Zone-IA.
Within these constraints, we can combine these actions to manage the entire lifecycle of an object. For example, we might have an object frequently accessed in the first 30 days and then infrequently accessed for 90 days. After that, we may archive it or delete it permanently.
To handle this scenario, we can configure a rule to first transition the object to S3 Intelligent-Tiering, S3 Standard-IA, or S3 One Zone-IA, and later to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive to archive it. If the data is no longer needed, we can configure the rule to delete the object and save on storage costs.
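A sketch of such a rule is shown below, using an illustrative docs/ prefix and day counts; note that the second transition occurs 90 days after the first, which satisfies the 30-day constraint described above.

{
  "Rules": [
    {
      "ID": "FullObjectLifecycle",
      "Status": "Enabled",
      "Filter": { "Prefix": "docs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 120, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 1825 }
    }
  ]
}

Here, objects move to S3 Standard-IA after 30 days, to S3 Glacier Flexible Retrieval after 120 days, and are permanently deleted after five years.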
Expiring objects
When expiring or deleting objects in a bucket, S3 Lifecycle considers the versioning state of the bucket.
Non-versioned bucket: S3 puts the objects in the queue for removal.
Versioning-enabled bucket: The expiration action applies only to the current version. If the current version is not a delete marker, S3 adds a delete marker, which becomes the new current version; no data is removed. If the current version is already a delete marker, S3 takes no action. If the delete marker is the only remaining version of the object, called an expired object delete marker, S3 removes it.
Versioning-suspended bucket: S3 creates a delete marker with a null version ID, which replaces any existing version that has a null version ID and becomes the current version, effectively deleting that version.
S3 Lifecycle allows us to configure separate rules for current and noncurrent object versions. This is particularly helpful in applications that actively create new versions, which keep accumulating and incurring costs. We can configure rules to delete older versions after a certain time period or once they exceed a certain count, as shown in the sketch below.
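As a sketch, with hypothetical day counts and a version limit, a rule targeting noncurrent versions might look like this:

{
  "Rules": [
    {
      "ID": "ManageNoncurrentVersions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90,
        "NewerNoncurrentVersions": 3
      }
    }
  ]
}

This rule moves noncurrent versions to S3 Glacier Flexible Retrieval 30 days after they become noncurrent and deletes them after 90 days, while always retaining the three most recent noncurrent versions.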
We can determine when an object is scheduled for deletion using the HEAD Object or GET Object API, which returns the scheduled expiration date and time in the x-amz-expiration response header.
When we add Lifecycle configuration rules to a bucket, they can take some time to propagate. Likewise, once a rule's conditions are met, the actual transition or expiration can be delayed. However, we are not charged extra for objects whose action was delayed; billing changes take effect as soon as the rule's conditions are satisfied. It's worth noting that S3 Intelligent-Tiering operates differently in this regard, as billing adjustments occur only when objects actually transition.
Use case: News agency catalog
Consider a news agency that uses S3 object storage to maintain the catalog of images, videos, and audio. They might need these files from time to time for their news headlines and blogs. Since they have acquired the copyrights for the media files, they must maintain a record of decades-old data. In addition to this, they also manage the blog drafts using S3 versioning.
Since hundreds of images, videos, and audio files are added to the S3 bucket daily, the data is piling up, and so are the bills. Keeping all the files in a single storage class, irrespective of access frequency and required access latency, is not economical. Therefore, to manage their storage, they configure S3 Lifecycle rules.
The S3 bucket consists of two main folders: media_files/ and blogs/. The media_files/ folder stores the images and videos, and blogs/ stores the blog drafts.
To optimize the costs, we’ll configure an S3 Lifecycle rule to carry out the following tasks:
Migrate objects to S3 Standard-IA after 2 months.
Migrate objects to S3 Glacier Flexible Retrieval after 6 months.
The folders contain large video files that run into GBs and small images of less than 1 MB. However, we already know that archiving small objects is not economical, as the transition costs can exceed the savings. Therefore, we will add the following filters:
Migrate objects with a content length between 128 and 1,024 bytes to S3 Glacier Flexible Retrieval after a year. It is generally good practice to use this kind of size filter when archiving data, as it considerably reduces costs. The code snippet given below shows our configuration rules.
Objects larger than 128 KB will be moved to S3 Glacier Flexible Retrieval after 6 months, as configured above.
{"Rules": [{"ID": "MoveToStandardIA","Filter": {"Prefix": "media_files/"},"Status": "Enabled","Transitions": [{"Days": 60,"StorageClass": "STANDARD_IA"}]},{"ID": "MoveToGlacier","Filter": {"Prefix": "media_files/","And": [{"Prefix": "","Tag": {"Key": "archive","Value": "true"}}]},"Status": "Enabled","Transitions": [{"Days": 180,"StorageClass": "GLACIER"}]},{"ID": "MoveSmallObjectsToGlacier","Filter": {"Prefix": "media_files/","And": [{"NumericLessThan": {"content-length": 128}}]},"Status": "Enabled","Transitions": [{"Days": 365,"StorageClass": "GLACIER"}]}]}
We can also effectively use S3 Lifecycle rules to delete expired object delete markers and to abort incomplete multipart uploads, and then observe the improvements using S3 Storage Lens.
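For example, a cleanup rule along the following lines (with an illustrative seven-day window for abandoned multipart uploads) removes expired object delete markers and aborts incomplete multipart uploads:

{
  "Rules": [
    {
      "ID": "CleanupDeleteMarkersAndUploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "ExpiredObjectDeleteMarker": true },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}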