S3 Events and Batch Operations
Learn how to get a notification when an object in S3 changes, such as a change to the metadata of an existing object, and how to perform the same operation on multiple objects in an S3 bucket using S3 Batch Operations.
S3 events
An event in S3 is a change to an object in a bucket. Examples include adding an object to a bucket, changing the metadata of an existing object, and deleting an object.
S3 event notifications
S3 can publish notifications for some of these events. To publish a notification, we define the events that should trigger it and the destination that should receive it. Supported destinations include Amazon SNS topics, Amazon SQS queues, AWS Lambda functions, and Amazon EventBridge.
Because S3 events integrate with these services, we can build fully event-driven architectures.
Example: Image resizing application
To understand it further, let’s consider the classic image processing example. When we add images to a bucket, we can invoke a Lambda function to resize the image and place it into another S3 bucket.
But how does the Lambda function know when an image is added? The S3 bucket passes an event message to the destination. This message contains basic information such as the event time and source, along with details of the bucket and object that triggered the event. The Lambda function can therefore extract the newly added image’s key from the message and resize that image.
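As a rough illustration of how a handler might read that message, the sketch below pulls the bucket name and object key out of each record. The bucket name, object key, and the trimmed sample event are hypothetical, and the resize step itself is left as a placeholder.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event message."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes "+"), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Entry point that Lambda would call with the S3 event message."""
    objects = extract_objects(event)
    for bucket, key in objects:
        # Placeholder: download, resize, and upload to another bucket here.
        print(f"Would resize s3://{bucket}/{key}")
    return {"processed": len(objects)}


# Trimmed, hypothetical event message used only for illustration.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventTime": "2024-01-01T12:00:00.000Z",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "uploads-bucket"},
                "object": {"key": "photos/my+cat.png", "size": 1024},
            },
        }
    ]
}
```

Keeping the parsing in its own function makes it easy to check locally with a sample message before deploying the handler.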
To publish event notifications to the destination resource, S3 needs permission to access the destination service. For a Lambda destination, this permission is granted on the function itself: we add a statement to the function's resource-based policy that allows the S3 service to invoke it.
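One possible way to grant that permission is boto3's `add_permission` call on the Lambda client. The sketch below only builds the arguments; the function name, bucket name, account ID, and statement ID are hypothetical placeholders, and the actual call is wrapped in a helper that is not run here.

```python
def s3_invoke_permission(function_name, bucket_name, account_id,
                         statement_id="AllowS3Invoke"):
    """Build the arguments for Lambda's add_permission call."""
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        # The S3 service principal is the caller being authorized.
        "Principal": "s3.amazonaws.com",
        # Restrict the grant to one bucket owned by one account.
        "SourceArn": f"arn:aws:s3:::{bucket_name}",
        "SourceAccount": account_id,
    }


def grant_s3_invoke(function_name, bucket_name, account_id):
    """Apply the permission; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    kwargs = s3_invoke_permission(function_name, bucket_name, account_id)
    return boto3.client("lambda").add_permission(**kwargs)


# Hypothetical names used only for illustration.
permission = s3_invoke_permission("resize-image", "uploads-bucket", "123456789012")
```

Limiting the grant with `SourceArn` and `SourceAccount` keeps other buckets from invoking the function.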
We can configure event notifications using the AWS Management Console, AWS CLI, AWS SDKs, and REST API. The configuration is stored on the bucket as a subresource called the notification subresource. In addition to the events and destinations, we can add filters. Filters are helpful in many use cases; for example, we might want to publish notifications only when an object with the .png extension is added to a bucket.
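A minimal sketch of such a configuration, assuming boto3 and a Lambda destination, might look like the following. The function ARN and bucket name are hypothetical, and the helper that sends the configuration to S3 is defined but not called.

```python
def png_notification_config(function_arn):
    """Notification configuration that fires only for new .png objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                # Action: any kind of object creation (put, copy, upload).
                "Events": ["s3:ObjectCreated:*"],
                # Filter: only keys that end in .png.
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".png"}]
                    }
                },
            }
        ]
    }


def apply_notification_config(bucket_name, function_arn):
    """Store the configuration on the bucket; needs boto3 and credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    return boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=png_notification_config(function_arn),
    )


# Hypothetical ARN used only for illustration.
config = png_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:resize-image"
)
```

Note that this call replaces the bucket's existing notification configuration, so any other notifications the bucket needs should be included in the same request.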
S3 Batch Operations
We might want to perform a similar operation on multiple objects in an S3 bucket. Operating individually on each object can be time-consuming, especially if the bucket scales to hundreds of thousands of objects. Writing an application to automate the process can also be a hassle.
To address this, S3 provides S3 Batch Operations, which lets us run a single job across many objects in a bucket. All we need to do is specify the list of objects and the operation, and then kick off the job. S3 handles progress tracking, retries, notifications, and generation of a completion report. Thus, it provides a fully managed, automated, and auditable experience.
S3 Batch Operations has four main components:
Job: It is the basic unit in S3 Batch Operations, which contains the necessary information to carry out the operation on the specified list of objects.
Manifest: It specifies the objects on which the Batch Operations job will be performed. It can be a CSV-formatted file or an S3 Inventory report.
Operation: It specifies the API action we want to perform on the objects, such as copying objects. The operation is performed on each object individually.
Task: It is the basic execution unit of a job. S3 Batch Operations create a task for each object specified in the manifest file.
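To make the manifest component more concrete, the sketch below writes a simple CSV manifest with one bucket,key row per object and no header row. It assumes that object keys in the manifest should be URL-encoded; the bucket name and keys are hypothetical.

```python
import csv
import io
from urllib.parse import quote


def build_manifest(bucket_name, keys):
    """Return CSV manifest text with one 'bucket,key' row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Assumption: keys are URL-encoded in the manifest; "/" is kept as-is.
        writer.writerow([bucket_name, quote(key, safe="/")])
    return buffer.getvalue()


# Hypothetical bucket and keys used only for illustration.
manifest_text = build_manifest("data-bucket", ["a.png", "reports/q1 summary.csv"])
print(manifest_text)
```

The resulting file would then be uploaded to S3 so that a job can reference it. Each row in the manifest corresponds to one task.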
While the job executes, we can monitor its progress manually on the S3 console or programmatically. We can also configure S3 Batch Operations to write a completion report, which describes the result of each task and can provide helpful insights if a task fails.
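For programmatic monitoring, one option is boto3's `describe_job` call on the S3 Control client, whose response includes a job status and a progress summary. The sketch below condenses such a response; the sample response is a trimmed, hypothetical example, and the fetch helper is defined but not called.

```python
def summarize_job(response):
    """Condense a describe_job response into a small progress summary."""
    job = response["Job"]
    progress = job.get("ProgressSummary", {})
    total = progress.get("TotalNumberOfTasks", 0)
    succeeded = progress.get("NumberOfTasksSucceeded", 0)
    failed = progress.get("NumberOfTasksFailed", 0)
    # Guard against division by zero before any tasks are counted.
    percent = round(100 * (succeeded + failed) / total, 1) if total else 0.0
    return {
        "status": job["Status"],
        "total": total,
        "succeeded": succeeded,
        "failed": failed,
        "percent_complete": percent,
    }


def fetch_job_summary(account_id, job_id):
    """Query S3 for the job's current state; needs boto3 and credentials."""
    import boto3  # imported lazily so summarize_job has no dependencies

    response = boto3.client("s3control").describe_job(
        AccountId=account_id, JobId=job_id
    )
    return summarize_job(response)


# Trimmed, hypothetical response used only for illustration.
sample_response = {
    "Job": {
        "JobId": "example-job-id",
        "Status": "Active",
        "ProgressSummary": {
            "TotalNumberOfTasks": 200,
            "NumberOfTasksSucceeded": 150,
            "NumberOfTasksFailed": 10,
        },
    }
}
summary = summarize_job(sample_response)
```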
Use case: Encrypting files in an S3 bucket
S3 Batch Operations are used for various tasks, such as replicating buckets, modifying object tags, modifying object ACLs, image and video processing, and more.
To understand it further, let’s consider a use case. Suppose a company has stored a large amount of unencrypted data in S3 buckets. To improve its security posture, it wants to encrypt each object in the bucket. Doing it manually can be cumbersome, and designing an application to automate the process could be time-consuming.
In such scenarios, S3 Batch Operations can save a lot of effort. We create a manifest file listing the objects to be encrypted, choose an operation that rewrites each object with the desired encryption settings (for example, a copy operation with server-side encryption enabled), and start the job. S3 Batch Operations then manages each task, tracks progress, and provides a complete execution report.
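As a rough sketch of what such a job request could look like with boto3's S3 Control `create_job` call, the example below copies each object back into the same bucket with SSE-S3 (AES256) encryption. The account ID, bucket, role ARN, manifest location, and ETag are all hypothetical placeholders, and copying in place is an assumption about how the company would choose to encrypt its data.

```python
def encryption_job_request(account_id, bucket_name, role_arn,
                           manifest_key, manifest_etag):
    """Build create_job arguments for a copy-with-encryption job."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,
        "Priority": 10,
        "RoleArn": role_arn,
        # Operation: copy each object and encrypt the copy with SSE-S3.
        "Operation": {
            "S3PutObjectCopy": {
                "TargetResource": bucket_arn,
                "NewObjectMetadata": {"SSEAlgorithm": "AES256"},
            }
        },
        # Manifest: a CSV file of bucket,key rows already stored in S3.
        "Manifest": {
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {
                "ObjectArn": f"{bucket_arn}/{manifest_key}",
                "ETag": manifest_etag,
            },
        },
        # Report: record the result of every task for auditing.
        "Report": {
            "Bucket": bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "batch-reports",
            "ReportScope": "AllTasks",
        },
    }


def start_encryption_job(**kwargs):
    """Submit the job; needs boto3, credentials, and a suitable IAM role."""
    import boto3  # imported lazily so the builder above has no dependencies

    request = encryption_job_request(**kwargs)
    return boto3.client("s3control").create_job(**request)["JobId"]


# Hypothetical values used only for illustration.
request = encryption_job_request(
    account_id="123456789012",
    bucket_name="data-bucket",
    role_arn="arn:aws:iam::123456789012:role/batch-operations-role",
    manifest_key="manifests/objects.csv",
    manifest_etag="example-etag",
)
```

The IAM role passed in `RoleArn` would need permission to read the manifest, read and write the objects, and write the report.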