S3 Lifecycle Rules
Learn how to save storage costs using S3 Lifecycle policies that automatically move objects between different storage classes.
Amazon S3 (Simple Storage Service) is an economical object storage solution. However, since it is often used for backups, logs, and archives, data can quickly pile up and incur large costs. We have already discussed that S3 offers a suite of storage classes tailored to provide the required latency at minimal cost. By choosing the right storage class for each use case, enterprises and developers can considerably reduce their storage costs.
But what if our data access requirements vary over time? For example, consider a software application that stores logs in an S3 bucket and accesses them frequently for a week, after which they can be deleted. Similarly, some documents might require frequent access in the first few weeks or months and are rarely accessed afterward. In such situations, it can be a hassle to manually transition objects between multiple storage classes to optimize cost.
To automate the transition between storage classes or the deletion of objects based on a configuration, S3 provides S3 Lifecycle. An S3 Lifecycle configuration is a set of rules that defines when to transition or delete an object. There are two types of S3 Lifecycle configuration rules:
Transition: These rules define the transition of objects between different storage classes.
Expiration: These rules define the deletion of objects from a storage class.
We can configure S3 Lifecycle rules by determining the access frequency of the objects in the bucket. To help developers determine the access patterns of objects in a bucket, S3 provides two options:
S3 Storage Lens: Storage Lens works at the AWS organization level, the highest level in the hierarchy, all the way down to the account, bucket, and even prefix level. We can use it to get a snapshot of the number of buckets in an organization, the storage space they take up, and the cost. It also creates a bubble chart based on the retrieval rate of each bucket, which helps in interpreting the information and choosing the right configurations for the S3 Lifecycle.
S3 Storage class analysis: Storage class analysis provides access and usage pattern information at the bucket level.
Lifecycle configuration template
The S3 Lifecycle configuration document is an XML document, although the AWS CLI and SDKs also accept an equivalent JSON representation, which is what we use in the examples in this lesson. Each document can have multiple rules, and each rule has an ID, a status, filters, and lifecycle actions such as transitions and expirations.
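As a quick reference, a minimal single-rule configuration might look like the sketch below; the rule ID, the logs/ prefix, and the day counts are illustrative values only.

{
  "Rules": [
    {
      "ID": "ExampleRule",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}

Let's dive in to learn about each of these parts further.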
Status
The status determines whether the Lifecycle rule is enabled or disabled. A disabled rule does not transition or delete any objects.
Filters
Filters allow us to define a subset of objects to which we want to apply a rule. Below are some filters we can use in our S3 Lifecycle configurations. They can be combined using the And element, as shown in the sketch after this list.
Filters using key prefixes: Suppose we want to empty the images/ folder after 30 days. We'll configure a rule that filters objects using the key prefix images/.
Filters using object tags: We can filter objects using a tag's key and value.
Filters using object size: We can apply filters using the maximum object size and the minimum object size.
As a common practice, it is recommended to apply a minimum object size filter of 128 KB in the S3 Lifecycle configuration to avoid racking up transition costs when moving many small objects to S3 Glacier Deep Archive.
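As a sketch of combining filters, assuming a hypothetical documents/ prefix, an archive object tag, and a 131,072-byte (128 KB) minimum size, a rule using the And element might look like this:

{
  "Rules": [
    {
      "ID": "ArchiveTaggedDocuments",
      "Status": "Enabled",
      "Filter": {
        "And": {
          "Prefix": "documents/",
          "Tags": [
            { "Key": "archive", "Value": "true" }
          ],
          "ObjectSizeGreaterThan": 131072
        }
      },
      "Transitions": [
        { "Days": 180, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}

Only objects that match every predicate inside And are picked up by the rule; here, tagged objects under documents/ that are larger than 128 KB transition to S3 Glacier Deep Archive after 180 days.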
Transitions
We can define rules to transition objects from one storage class to another using S3 Lifecycle configurations. However, there are certain limitations when defining transitions. Amazon S3 Lifecycle configurations support a waterfall model for transitions: objects can move down the hierarchy of storage classes, from S3 Standard toward S3 Glacier Deep Archive, but not back up.
The important points to note are:
We can transition objects from S3 Standard to any other storage class, but we cannot transition back to S3 Standard from any storage class.
Any storage class can transition to S3 Glacier Deep Archive.
Apart from these, there are a few constraints while defining the transitions. Important ones to note are:
Before objects can transition to S3 Standard-IA or S3 One Zone-IA, they must be stored in S3 for at least 30 days. Earlier transitions are not supported by S3 Lifecycle because newer objects are often accessed frequently or deleted within 30 days, which is not economical for these storage classes.
S3 Standard-IA and S3 One Zone-IA have a minimum storage charge of 30 days. This means we cannot configure a subsequent transition to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive that occurs less than 30 days after the transition to S3 Standard-IA or S3 One Zone-IA.
Within these constraints, we can combine these actions to manage the entire lifecycle of an object. For example, we might have an object frequently accessed in the first 30 days and then infrequently accessed for 90 days. After that, we may archive it or delete it permanently.
To handle this scenario, we can configure a rule to first transition the object to S3 Intelligent-Tiering, S3 Standard-IA, or S3 One Zone-IA, and later to S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive to archive it. If the data is no longer needed, we can configure the rule to delete the object and save on storage costs.
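A sketch of such a rule is shown below, using an illustrative docs/ prefix and day counts; note that the second transition occurs 90 days after the first, which satisfies the 30-day constraint described above.

{
  "Rules": [
    {
      "ID": "FullObjectLifecycle",
      "Status": "Enabled",
      "Filter": { "Prefix": "docs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 120, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 1825 }
    }
  ]
}

Here, objects move to S3 Standard-IA after 30 days, to S3 Glacier Flexible Retrieval after 120 days, and are permanently deleted after five years.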
Expiring objects
When expiring or deleting objects in a bucket, S3 Lifecycle considers the versioning state of the bucket.
Non-versioned bucket: S3 puts the objects in the queue for removal.
Versioning-enabled bucket: The expiration action applies only to the current version. If the current version is not a delete marker, S3 adds a delete marker, which becomes the new current version; no data is removed. If the current version is already a delete marker, S3 takes no action. If the delete marker is the only remaining version of the object, called an expired object delete marker, S3 removes it.
Versioning-suspended bucket: S3 creates a delete marker with a null version ID, which replaces any existing version that has a null version ID and becomes the current version, effectively deleting that version.
S3 Lifecycle allows us to configure separate rules for current and noncurrent object versions. This is particularly helpful in applications that actively create new versions, which keep accumulating and incurring costs. We can configure rules to delete older versions after a certain time period or once they exceed a certain count, as shown in the sketch below.
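As a sketch, with hypothetical day counts and a version limit, a rule targeting noncurrent versions might look like this:

{
  "Rules": [
    {
      "ID": "ManageNoncurrentVersions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 30, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90,
        "NewerNoncurrentVersions": 3
      }
    }
  ]
}

This rule moves noncurrent versions to S3 Glacier Flexible Retrieval 30 days after they become noncurrent and deletes them after 90 days, while always retaining the three most recent noncurrent versions.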
We can determine when an object is scheduled for deletion using the HEAD Object or GET Object API, which returns the scheduled expiration date and time in the x-amz-expiration response header.
When we add Lifecycle configuration rules to a bucket, they can take some time to propagate. Likewise, once a rule's conditions are met, the actual transition or expiration can be delayed. However, we are not charged extra for objects whose action was delayed; billing changes take effect as soon as the rule's conditions are satisfied. It's worth noting that S3 Intelligent-Tiering operates differently in this regard, as billing adjustments occur only when objects actually transition.
Use case: News agency catalog
Consider a news agency that uses S3 object storage to maintain the catalog of images, videos, and audio. They might need these files from time to time for their news headlines and blogs. Since they have acquired the copyrights for the media files, they must maintain a record of decades-old data. In addition to this, they also manage the blog drafts using S3 versioning.
Since hundreds of images, videos, and audio files are added to the S3 bucket daily, the data is piling up, and so are the bills. Keeping all the files in a single storage class, irrespective of access frequency and required access latency, is not economical. Therefore, to manage their storage, they configure S3 Lifecycle rules.
The S3 bucket consists of two main folders: media_files/ and blogs/. The media_files/ folder stores the images and videos, and blogs/ stores the blog drafts.
To optimize the costs, we’ll configure an S3 Lifecycle rule to carry out the following tasks:
Migrate objects to S3 Standard-IA after 2 months.
Migrate objects to S3 Glacier Flexible Retrieval after 6 months.
The folders contain large video files that run into GBs and small images of less than 1 MB. However, we already know that archiving small objects is not economical, as the transition costs can exceed the savings. Therefore, we will add the following filters:
Migrate objects with a content length between 128 and 1,024 bytes to S3 Glacier Flexible Retrieval after a year. It is generally good practice to use this kind of size filter when archiving data, as it considerably reduces costs. The code snippet given below shows our configuration rules.
Objects larger than 128 KB will be moved to S3 Glacier Flexible Retrieval after 6 months, as configured above.
{"Rules": [{"ID": "MoveToStandardIA","Filter": {"Prefix": "media_files/"},"Status": "Enabled","Transitions": [{"Days": 60,"StorageClass": "STANDARD_IA"}]},{"ID": "MoveToGlacier","Filter": {"Prefix": "media_files/","And": [{"Prefix": "","Tag": {"Key": "archive","Value": "true"}}]},"Status": "Enabled","Transitions": [{"Days": 180,"StorageClass": "GLACIER"}]},{"ID": "MoveSmallObjectsToGlacier","Filter": {"Prefix": "media_files/","And": [{"NumericLessThan": {"content-length": 128}}]},"Status": "Enabled","Transitions": [{"Days": 365,"StorageClass": "GLACIER"}]}]}
We can also effectively use S3 Lifecycle rules to delete expired object delete markers and to abort incomplete multipart uploads, and then observe the improvements using S3 Storage Lens.
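For example, a cleanup rule along the following lines (with an illustrative seven-day window for abandoned multipart uploads) removes expired object delete markers and aborts incomplete multipart uploads:

{
  "Rules": [
    {
      "ID": "CleanupDeleteMarkersAndUploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "ExpiredObjectDeleteMarker": true },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}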