API Throttling

Understand the concept of API throttling and how it can be leveraged to secure our API.

Why throttle?

Too much of anything causes problems, and that is especially true of traffic hitting AWS resources. We all want more users on our application, but when traffic bursts beyond what the system is sized for, it can overburden the backend. A surge isn’t always legitimate traffic either; it could be caused by an attacker.

The world is full of hackers. Unfortunately, the more popular our products become, the more likely they are to be targeted. When attackers flood the application with API calls, often from many sources at once, API Gateway experiences it as a denial-of-service (DoS) or distributed denial-of-service (DDoS) attack. AWS has several services that help defend against such attacks, and AWS Shield is the one built specifically for DDoS protection.

The simplest way to hold back such attacks is through throttling. With this configuration, we restrict how quickly API invocations are accepted. With an appropriate throttling design, requests beyond the limits are rejected at the gateway with an HTTP 429 Too Many Requests response, so a flood of calls doesn’t reach the systems behind it. Throttling caps the rate and burst size of API invocations to ensure that our system isn’t surprised by more requests than it can handle.

Request rate and burst

API Gateway lets us define throttling with two settings: the rate, in requests per second, and the burst. The rate is the steady-state number of requests per second that the API accepts over time, and the burst is the largest number of requests that can be accepted at once, above that steady rate. To understand them in detail, we must understand the underlying token bucket algorithm used to implement this throttling.

Token bucket algorithm

It’s a simple algorithm. Consider a bucket that holds at most b tokens. A process continuously supplies new tokens to this bucket at a rate of r tokens per second, so we get one new token every 1/r seconds.

If we don’t use (remove) those tokens, the bucket fills over time, and any further tokens overflow and are discarded, so the bucket never holds more than b tokens. When we remove tokens, the bucket refills at one token every 1/r seconds. We can therefore draw at most b tokens at once, and over a long duration we can’t average more than r tokens per second. Subject to these two constraints, we can withdraw tokens continuously.
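The bucket described above can be sketched in a few lines of Python. This is a minimal illustration of the general token bucket idea, not API Gateway's actual implementation. The names `TokenBucket`, `allow`, and `simulate` are hypothetical, and the sketch assumes the bucket starts full and that each request consumes exactly one token. Time is passed in explicitly so the behavior is easy to reproduce.

```python
class TokenBucket:
    """Token bucket with capacity b (burst) and refill rate r tokens per second."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)      # r: tokens added per second
        self.burst = float(burst)    # b: maximum tokens the bucket can hold
        self.tokens = float(burst)   # assumption: the bucket starts full
        self.last = float(now)       # time of the most recent refill

    def _refill(self, now):
        # Add the tokens earned since the last refill.
        elapsed = max(0.0, now - self.last)
        # Tokens beyond the capacity b overflow and are lost.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)

    def allow(self, now):
        """Consume one token and return True if a request at time `now` is admitted."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # bucket is empty: the request would be throttled


def simulate(rate, burst, arrival_times):
    """Return one admit/reject decision per request arrival time (in seconds)."""
    bucket = TokenBucket(rate, burst)
    return [bucket.allow(t) for t in arrival_times]


if __name__ == "__main__":
    # Four requests arrive at once, then one at 0.5 s and two at 1.0 s.
    print(simulate(rate=2, burst=3, arrival_times=[0, 0, 0, 0, 0.5, 1.0, 1.0]))
```

With r = 2 and b = 3, the first three simultaneous requests drain the bucket and the fourth is rejected. Each later half second earns one token, so the requests at 0.5 s and the first one at 1.0 s are admitted, and the second request at 1.0 s is rejected.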

API Gateway throttling works like this token bucket, where each request consumes one token. It restricts incoming traffic with these two measures: the throttling burst limit is the bucket size b, the maximum size of a burst of API requests, and the throttling rate limit is the refill rate r, the highest sustained rate of invocations allowed over time.
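The two limits can be combined into a single bound that makes their interaction easier to see. Writing N(T) for the number of requests accepted in any window of T seconds (a symbol introduced here for illustration), the bucket can contribute at most its b stored tokens plus the rT tokens that arrive during the window:

```latex
N(T) \le b + rT
\qquad\Longrightarrow\qquad
\frac{N(T)}{T} \le r + \frac{b}{T} \xrightarrow{\;T \to \infty\;} r
```

As an illustration with made-up numbers, a rate of r = 100 requests per second and a burst of b = 200 would accept at most 200 + 100 × 10 = 1,200 requests in any 10-second window. The burst term matters over short windows, while over long ones the average settles toward r.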
