S3 Bucket
Learn how S3 stores data and explore the core features of S3 buckets in AWS.
An S3 bucket is the fundamental storage container in Amazon Simple Storage Service (Amazon S3), a scalable and highly durable object storage service provided by Amazon Web Services (AWS). A bucket stores and organizes objects (files or other data) within S3.
Objects in a bucket
Bucket names share a single global namespace, so each bucket name must be unique across all AWS accounts and Regions in an AWS partition, even though every bucket is created in one specific AWS Region.
We can upload any type of object to an S3 bucket, including text files, spreadsheets, code, images, and videos. A bucket can store an unlimited amount of data, but a single object cannot exceed 5 TB. The structure of a bucket is flat: there are no real directories. However, we can still organize data hierarchically by giving objects a shared key prefix (for example, images/), which the S3 console displays as a folder. Each object is addressable by a URL.
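Because the namespace is flat, a "folder" is only a key prefix that several objects share. The sketch below models, in plain Python, how a listing that groups keys by a prefix and a "/" delimiter makes a flat set of keys look like a directory tree. The function list_level and the sample keys are hypothetical; this is a simplified local model, not the S3 API itself (the real ListObjectsV2 operation expresses the same idea through its Prefix and Delimiter parameters and the CommonPrefixes field of its response).

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat object keys the way a prefix/delimiter listing does.

    Returns (objects, folders): keys that sit directly under `prefix`, and
    the distinct "folder" prefixes one level below it.
    """
    objects = []
    folders = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # Key is outside the "folder" we are listing.
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # No further delimiter: a direct child object.
        else:
            # Everything up to and including the next delimiter is a "folder".
            folders.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(folders)


keys = [
    "index.html",
    "images/my-image.png",
    "images/logo.png",
    "images/icons/cart.svg",
    "logs/2024-01-01.log",
]

print(list_level(keys))                    # top level of the bucket
print(list_level(keys, prefix="images/"))  # inside the images/ "folder"
```

At the top level, only index.html appears as an object, while images/ and logs/ appear as folders, even though no folder object was ever created.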
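We can access objects through URLs of the following (virtual-hosted-style) form:
https://<bucket-name>.s3.<region-code>.amazonaws.com/<object-key>
Consider a bucket my-bucket deployed in the us-east-1 Region, with a folder images/ that contains the file my-image.png (so the object key is images/my-image.png). We can access the image through the following URL, provided the request is authorized or the object has been made public:
https://my-bucket.s3.us-east-1.amazonaws.com/images/my-image.png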
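S3 virtual-hosted-style object URLs follow the pattern https://<bucket-name>.s3.<region-code>.amazonaws.com/<object-key>. As a minimal sketch, the helper below (object_url is a hypothetical name, not part of any AWS SDK) builds such a URL and percent-encodes characters in the key that are not URL-safe, while keeping "/" so folder prefixes stay readable.

```python
from urllib.parse import quote


def object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style S3 object URL.

    Characters in the key that are not URL-safe (for example, spaces) are
    percent-encoded; "/" is kept so "folders" stay readable in the path.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(object_url("my-bucket", "us-east-1", "images/my-image.png"))
print(object_url("my-bucket", "eu-west-1", "images/summer photo.png"))
```

The first call produces https://my-bucket.s3.us-east-1.amazonaws.com/images/my-image.png, and the space in the second key becomes %20. Building the URL does not grant access: an unauthenticated request for a private object (the default) is denied, which is why private objects are typically shared through presigned URLs instead.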
Core features
Let's take a quick look at the core features of S3 buckets:
Versioning: When we enable versioning on a bucket, S3 keeps every version of an object, so we can restore a previous version. This is useful for tracking changes over time and for recovering from accidental deletions or overwrites.
Access control: By default, objects in an S3 bucket are not publicly accessible. Access to S3 buckets and their objects is controlled through a combination of bucket policies, access control lists (ACLs), and AWS Identity and Access Management (IAM) roles and policies. These let us define who can access the data and which actions they can perform.
Static website hosting: Because each object is addressable by a URL, we can enable static website hosting on a bucket and serve HTML, CSS, JavaScript, and image files directly from the bucket's website endpoint, provided that content is publicly readable.
Monitoring and logging: S3 server access logging is disabled by default, but we can enable it to record the requests made to a bucket. We can also capture API activity with AWS CloudTrail, track bucket metrics in Amazon CloudWatch, and set up CloudWatch alarms. Analyzing these logs and metrics helps us detect potential threats.
Storage classes: S3 provides several storage classes, such as S3 Standard, S3 Intelligent-Tiering, and the S3 Glacier classes. Each class is optimized for different use cases based on factors such as access frequency, retrieval time, and cost.
Replication: To improve data availability, we can configure replication, which automatically and asynchronously copies objects from a source bucket to one or more destination buckets, either in other Regions (Cross-Region Replication) or in the same Region (Same-Region Replication). S3 handles the copying, which reduces the work on the developer's side. Replication requires versioning to be enabled on both the source and destination buckets.
Transfer Acceleration: S3 Transfer Acceleration enables fast, secure transfers between clients and a bucket over long distances by routing traffic through the globally distributed edge locations of Amazon CloudFront.
Encryption: To protect data at rest, S3 encrypts all new objects by default using Amazon S3 managed keys (SSE-S3). For more control over the encryption keys, we can use keys from AWS Key Management Service (SSE-KMS) instead.
S3 Events: Bucket events, such as an object being uploaded or deleted, can trigger other AWS services, such as AWS Lambda functions, Amazon SQS queues, and Amazon SNS topics.
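To make the S3 Events feature concrete, here is a minimal sketch of a Python Lambda handler that reads an S3 event notification. It relies on the documented shape of the notification (a Records list whose entries carry eventName, s3.bucket.name, and s3.object.key); the helper name created_objects and the trimmed-down sample event are assumptions for illustration, and a real handler would usually fetch each object with the AWS SDK rather than just printing its location.

```python
from urllib.parse import unquote_plus


def created_objects(event):
    """Return (bucket, key) pairs for object-creation records in an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        # eventName looks like "ObjectCreated:Put" or "ObjectRemoved:Delete".
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes "+"), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    """Lambda entry point; a real function would fetch and process each object."""
    for bucket, key in created_objects(event):
        print(f"New object: s3://{bucket}/{key}")


# Trimmed-down sample event containing only the fields used above.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "my-bucket"},
                   "object": {"key": "images/summer+photo.png"}},
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {"bucket": {"name": "my-bucket"},
                   "object": {"key": "images/old.png"}},
        },
    ]
}

lambda_handler(sample_event, None)
```

Only the upload record is reported, with its key decoded to images/summer photo.png; the deletion record is skipped by the eventName check.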
We will learn more about these features in detail in the sections ahead.
Use case: Data ingestion pipeline
S3 buckets can do much more than store data. For example, we can use them to build data ingestion pipelines, data analytics applications, and complete event-driven architectures.
Consider one such example. Suppose we want to design an e-commerce clickstream analysis pipeline. In this application, we'll gather clickstream data from the web server logs, capturing page views, product clicks, and other user interactions.
The architecture diagram below shows the infrastructure of our application. The logs are stored directly in an S3 bucket, which generates an event every time an object is added and invokes a Lambda function. The Lambda function extracts the relevant information from the logs, performs some analysis on the data, and stores the results back in S3. The results should go to a separate bucket, or to a prefix that the event notification does not watch, so that writing the output doesn't invoke the function again in an endless loop.
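The core of the Lambda function in this pipeline is ordinary data processing. The sketch below assumes a hypothetical log format (one JSON object per line with an "event" field and, for product clicks, a "product_id" field) and hypothetical raw/ and processed/ prefixes. It shows the analysis step and how an output key could be derived so that results written back to the bucket land outside the prefix that triggers the function; otherwise, every write would invoke the function again. Reading the log from S3 and writing the summary back (for example, with the AWS SDK) is left out so the example runs locally.

```python
import json
from collections import Counter

RAW_PREFIX = "raw/"           # hypothetical prefix that triggers the function
OUTPUT_PREFIX = "processed/"  # hypothetical prefix the trigger does not watch


def summarize_clicks(lines):
    """Count events by type and product clicks by product ID.

    Assumes one JSON object per log line; malformed lines are skipped.
    """
    events = Counter()
    products = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip lines that are not valid JSON.
        if not isinstance(entry, dict):
            continue  # Skip valid JSON that is not an object.
        kind = entry.get("event", "unknown")
        events[kind] += 1
        if kind == "product_click" and "product_id" in entry:
            products[entry["product_id"]] += 1
    return {"events": dict(events), "top_products": products.most_common(3)}


def output_key(input_key):
    """Map raw/<path>.<ext> to processed/<path>.json."""
    if not input_key.startswith(RAW_PREFIX):
        raise ValueError(f"unexpected key outside {RAW_PREFIX!r}: {input_key}")
    stem = input_key[len(RAW_PREFIX):].rsplit(".", 1)[0]
    return f"{OUTPUT_PREFIX}{stem}.json"


log_lines = [
    '{"user": "u1", "event": "page_view", "page": "/home"}',
    '{"user": "u1", "event": "product_click", "product_id": "p42"}',
    '{"user": "u2", "event": "product_click", "product_id": "p42"}',
    '{"user": "u2", "event": "product_click", "product_id": "p7"}',
    "not json",
]

summary = summarize_clicks(log_lines)
print(json.dumps(summary))
print(output_key("raw/2024/01/01/web-1.log"))
```

If the bucket's event notification uses a prefix filter of raw/, objects written under processed/ do not invoke the function again. Writing the summaries to a separate bucket achieves the same isolation.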
To perform ad hoc analysis on the data in the S3 bucket, we can use Amazon Athena, which runs standard SQL queries directly against data stored in S3. We can also generate visual reports of this data using Amazon QuickSight.