Elastic Block Store (EBS)
Explore how to store data using Elastic Block Store in AWS and its core concepts.
Elastic Block Store (EBS) is a block storage service provided by AWS. Its basic unit is the EBS volume, which offers persistent, low-latency block-level storage that can be attached to EC2 instances. Let’s look at EBS volumes in more detail.
EBS is a zonal service: each volume is created in a single availability zone (AZ) within a region. A volume is replicated within its AZ, so it is resilient to the failure of individual hardware components; however, if the whole AZ fails, the volumes in it can become unavailable.
EBS volumes
EBS volumes are like hard drives that can be attached to EC2 instances for additional storage. We can attach multiple volumes to a single instance. By default, the root volume attached to an EC2 instance is deleted when the instance is terminated, so we lose the data on it as soon as we terminate the instance. Any additional EBS volume attached to the instance retains its data independently of the instance’s state. Therefore, we can still access the data in such a volume by attaching it to another EC2 instance in the same AZ, as long as the volume has not been reformatted.
Types of EBS volumes
EBS offers multiple types of volumes which differ in performance and cost. We can choose among these according to our requirements to optimize performance and cost. Volumes are gauged based on these four factors:
General purpose SSD volumes
General purpose volumes are backed by solid-state drives (SSDs). SSD volumes are optimized for IOPS and are best for workloads involving frequent read and write operations. They are particularly helpful when the I/O payload size is small.
General-purpose volumes are typically used for testing and development. They can also store log files or small-scale applications where average latency is acceptable and data is not accessed frequently.
We have two types of general-purpose volumes: gp2 and gp3. The gp2 volumes implement a credit mechanism to determine the number of IOPS they can perform. Let’s understand the mechanism. Think of each volume as having a bucket of I/O credits that is replenished at the volume's baseline rate of 3 IOPS per GiB of volume size, with a minimum of 100 IOPS and a maximum of 16,000 IOPS. While credits remain in the bucket, the volume can burst above its baseline. Consider one such bucket for a volume of 100 GiB:
For this volume, credits are replenished at a rate of 300 per second (3 IOPS per GiB × 100 GiB). Suppose we start with a low input-output rate, such as 200 IOPS. The number of IOPS being consumed is less than the IOPS replenished, which means our IOPS credit accumulates. Suppose the input-output operations for the volume increase to 300 IOPS after some time. This is the baseline performance, where the IOPS being consumed equal the IOPS being replenished. At this point, the accumulated credit stays constant. As more time passes, we experience a burst in input-output operations up to 400 IOPS. The volume now starts using up the accumulated credit.
The ability of the bucket to handle bursts depends on the accumulated credit. This means that if your application experiences occasional bursts in traffic, gp2 volumes allow you to benefit from the burst balance accumulated over time.
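The credit mechanism described above can be sketched as a small simulation. This is a simplified model, not AWS's implementation: it advances in one-second ticks, the function names are hypothetical, and the bucket capacity (5.4 million credits) and burst ceiling (3,000 IOPS) are assumptions taken from AWS's published gp2 figures rather than from the text above.

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to 100..16,000."""
    return max(100, min(16_000, 3 * size_gib))


def simulate_credits(size_gib, demand_per_second, start_credits=0,
                     max_credits=5_400_000, burst_iops=3_000):
    """Return the credit balance after each one-second tick.

    max_credits and burst_iops are assumed values (see lead-in).
    """
    baseline = gp2_baseline_iops(size_gib)
    ceiling = max(baseline, burst_iops)  # highest rate the volume can serve
    credits = start_credits
    history = []
    for demand in demand_per_second:
        wanted = min(demand, ceiling)
        # Each second the volume can spend its refill plus any saved credits.
        served = min(wanted, baseline + credits)
        # Unused refill accumulates; demand above baseline drains the bucket.
        credits = min(max_credits, credits + baseline - served)
        history.append(credits)
    return history


# 100 GiB volume: 200 IOPS (accumulate), 300 IOPS (steady), 400 IOPS (drain)
print(simulate_credits(100, [200, 200, 300, 400]))  # [100, 200, 200, 100]
```

The balance grows while demand is below the 300 IOPS baseline, holds steady at exactly 300, and shrinks once demand rises to 400, which mirrors the three phases in the walkthrough.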
On the other hand, gp3 volumes offer a consistent baseline of 3,000 IOPS irrespective of the size of the volume. They do not rely on burst credits, and additional IOPS can be provisioned separately from storage, so they are ideal where I/O requirements are more predictable and consistent.
Provisioned IOPS volumes
As the name suggests, provisioned IOPS volumes let us provision a specific number of IOPS for a volume. These volumes are ideal for use cases that require more than baseline performance and perform frequent read-write operations. Thus, they are commonly used to back large relational or non-relational databases such as MySQL.
Provisioned IOPS volumes allow us to adjust the IOPS independently of the size. However, they are limited by the per-instance performance, which is the maximum performance provided by the EC2 instance to which the volume is attached.
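The per-instance limit can be illustrated with a one-line calculation. This is a minimal sketch: the function name and the 40,000 IOPS instance limit used in the example are hypothetical, chosen only to show the capping behavior.

```python
from typing import Sequence


def effective_iops(volume_iops: Sequence[int], instance_max_iops: int) -> int:
    """IOPS an instance can actually drive across its attached volumes.

    The volumes' provisioned IOPS add up, but the total is capped by
    the instance's own EBS limit.
    """
    return min(sum(volume_iops), instance_max_iops)


# One 64,000 IOPS volume on an instance assumed to cap at 40,000 IOPS
print(effective_iops([64_000], 40_000))         # 40000
# Two smaller volumes that fit under the same assumed cap
print(effective_iops([10_000, 5_000], 40_000))  # 15000
```

In the first case, provisioning beyond the instance's limit buys nothing, so it is worth checking the instance type before raising a volume's IOPS.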
There are two types of provisioned IOPS volumes: io1 and io2 Block Express. The io2 Block Express type is a step ahead of io1 in performance. Provisioned IOPS volumes are helpful when we need sub-millisecond latency and high performance for smaller volumes.
Throughput-optimized HDD volumes
Hard disk drive (HDD) volumes are optimized for throughput. Thus, they are commonly used in scenarios that involve large, sequential I/O.
Throughput-optimized HDD volumes, or st1, offer a maximum of 500 IOPS; however, each I/O operation can be as large as 1 MiB, which means a volume can transfer up to 500 MiB of data per second. They are suitable for scenarios where data is read or written in large, sequential chunks, and the emphasis is on sustained data throughput rather than low-latency random access. For example, we might use them for big data, log processing, or data warehousing.
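The relationship between IOPS, I/O size, and throughput is simple arithmetic, sketched below. The function name is hypothetical, and the I/O sizes (1 MiB for the HDD case, 16 KiB for the SSD case) are assumptions used for illustration.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput in MiB/s = operations per second x size of each operation."""
    return iops * io_size_kib / 1024


# Few large operations: 500 IOPS at an assumed 1 MiB (1024 KiB) each
print(throughput_mib_per_s(500, 1024))   # 500.0
# Many small operations: 16,000 IOPS at an assumed 16 KiB each
print(throughput_mib_per_s(16_000, 16))  # 250.0
```

A low IOPS ceiling is therefore not a handicap for sequential workloads: a volume issuing few but large operations can move more data per second than one issuing many small ones.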
Cold HDD volumes
Cold HDDs are optimized for costs. Cold storage in the context of data storage typically refers to a type of storage that is optimized for long-term retention of data at a lower cost compared to more frequently accessed or hot storage solutions.
Cold HDD volumes, or sc1, are designed for infrequent or cold workloads where lower-cost storage is prioritized over high-performance, frequent access. Cold HDD volumes are suitable for specific use cases where the data access pattern is infrequent and the workload can tolerate higher latencies.
They are commonly used for backup storage, data snapshots, and archival. For example, a company with regulatory requirements for data retention can use Cold HDD volumes because they are more economical.
The table below summarizes the four types of storage.
Types of EBS Volumes
Type of volume | Volume size | Max IOPS | Use case |
General purpose SSD | 1 GiB to 16 TiB | 16,000 | Testing and development |
Provisioned IOPS | 4 GiB to 16 TiB | 64,000 | Large databases and applications with frequent read-write operations |
Throughput-optimized HDD | 125 GiB to 16 TiB | 500 | Large-scale data processing, data warehousing |
Cold HDD | 125 GiB to 16 TiB | 250 | Long-term data retention |
EBS snapshots
EBS snapshots are backups of EBS volumes at a particular instant. Snapshots are stored in Amazon S3, in storage managed by AWS rather than in a bucket we own, so they are accessed through the EC2 console, API, or AWS CLI and not through the S3 API. They contain all the information required to restore the data in an EBS volume.
EBS snapshots are incremental. This means each snapshot stores only the blocks that changed after the most recent snapshot, which saves time and storage space. The first snapshot is therefore a full snapshot of all the blocks in the volume. The next snapshot is an incremental snapshot of only the blocks added or modified after the previous one.
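The incremental idea can be sketched with a toy model in which a volume is a mapping from block index to block contents. This is an illustration only, not how EBS stores snapshots internally: the names `take_snapshot` and `restore` are hypothetical, and the model ignores deleted blocks and snapshot deletion.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block index -> block contents


def restore(chain: List[Blocks]) -> Blocks:
    """Rebuild volume contents by replaying snapshots oldest to newest."""
    state: Blocks = {}
    for snapshot in chain:
        state.update(snapshot)
    return state


def take_snapshot(volume: Blocks, chain: List[Blocks]) -> Blocks:
    """Capture only the blocks that differ from what earlier snapshots hold."""
    known = restore(chain)
    return {i: data for i, data in volume.items() if known.get(i) != data}


volume = {0: b"boot", 1: b"data"}
chain: List[Blocks] = []
chain.append(take_snapshot(volume, chain))  # first snapshot: every block

volume[1] = b"data-v2"                      # modify one block
volume[2] = b"logs"                         # add one block
chain.append(take_snapshot(volume, chain))  # second snapshot: changes only

print(sorted(chain[0]))          # [0, 1]
print(sorted(chain[1]))          # [1, 2]
print(restore(chain) == volume)  # True
```

The second snapshot holds two blocks instead of three, yet replaying both snapshots reproduces the full volume.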
EBS snapshots provide multiple features to lock a snapshot and protect it from malicious activities and deletion, monitor the snapshot locks, and copy a snapshot. In short, EBS ensures that our data is recoverable and safe.
EBS snapshots are regionally resilient. The duplicates of snapshots are stored across multiple AZs in a region, so if one AZ fails, another takes over. We can use snapshots to restore the data in other AZs and to migrate the data.
Fast Snapshot Restore (FSR)
Volumes created from a snapshot are loaded lazily from S3. If we try to access a block of data that has not been loaded yet, EBS fetches it from S3 on demand. However, this is not as quick as reading a block that is already on the EBS volume.
Fast Snapshot Restore (FSR) resolves this issue. A volume created from an FSR-enabled snapshot is fully initialized from the start, eliminating the I/O latency when a block is first accessed.
To leverage FSR, we must explicitly enable it for the snapshot and specify an availability zone. A volume restored from that snapshot in the specified availability zone is then fully initialized at creation.
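Enabling FSR might look like the sketch below, which assumes the boto3 EC2 client and its `enable_fast_snapshot_restores` call. The wrapper name `enable_fsr`, the snapshot ID, and the AZ name are hypothetical placeholders, and the client is passed in so the function can be exercised without AWS credentials.

```python
def enable_fsr(ec2_client, snapshot_id, availability_zones):
    """Enable Fast Snapshot Restore for one snapshot in the given AZs.

    ec2_client is expected to be a boto3 EC2 client, for example
    boto3.client("ec2", region_name="us-east-1").
    """
    return ec2_client.enable_fast_snapshot_restores(
        AvailabilityZones=list(availability_zones),
        SourceSnapshotIds=[snapshot_id],
    )


# Example call (placeholder IDs; requires boto3 and AWS credentials):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# enable_fsr(ec2, "snap-0123456789abcdef0", ["us-east-1a"])
```

FSR is enabled per snapshot and per AZ, so a volume created from the same snapshot in an AZ that was not listed would still be loaded lazily.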
EBS encryption
Amazon Elastic Block Store encryption is a feature that allows us to encrypt our EBS volumes and snapshots using industry-standard AES-256 data encryption. It uses AWS Key Management Service (KMS), a service that allows us to create and control encryption keys. Encryption occurs on the servers that host EC2 instances, which protects data both at rest and in transit between the instance and its volume. The following data is encrypted in EBS encryption:
- Data at rest inside the volume
- Data in transit between the volume and the instance
- Snapshots created from the volume
- Volumes created using encrypted snapshots
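Creating an encrypted volume might look like the following sketch, which assumes the boto3 EC2 client and its `create_volume` call. The wrapper name `create_encrypted_volume`, the default volume type, and the example values are hypothetical; when no KMS key is given, the request simply omits `KmsKeyId` and leaves the choice of key to the account's default for EBS encryption.

```python
def create_encrypted_volume(ec2_client, availability_zone, size_gib,
                            kms_key_id=None, volume_type="gp3"):
    """Create an encrypted EBS volume, optionally with a specific KMS key.

    ec2_client is expected to be a boto3 EC2 client.
    """
    params = {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "VolumeType": volume_type,
        "Encrypted": True,
    }
    if kms_key_id is not None:
        # Only send KmsKeyId when the caller picked a key explicitly.
        params["KmsKeyId"] = kms_key_id
    return ec2_client.create_volume(**params)


# Example call (placeholder values; requires boto3 and AWS credentials):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# create_encrypted_volume(ec2, "us-east-1a", 100)
```

Snapshots taken from a volume created this way, and volumes later restored from those snapshots, are encrypted as well, which matches the list above.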