Health Checks

Understand how Route 53 performs health checks on resources.

AWS Route 53 health checks are a powerful feature that is configured independently of resource records. While records define how traffic is routed, health checks are a separate mechanism for monitoring the health and availability of endpoints such as web servers and load balancers. Route 53 records then reference these health check results to make informed routing decisions.

How health checks work

Route 53 health checks are flexible, and we can configure them to suit our requirements. A health check continuously monitors the specified endpoint by sending HTTP, HTTPS, or TCP requests at regular intervals. These requests come from a fleet of health checkers distributed globally. Each endpoint is then classified as healthy or unhealthy based on the evaluation criteria. If an endpoint is determined to be unhealthy, Route 53 automatically redirects traffic to healthy endpoints.
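As a rough illustration of these settings, the following Python sketch assembles the kind of dictionary that boto3's `create_health_check` call accepts as its `HealthCheckConfig` argument. The helper name `build_endpoint_check`, the domain `app.example.com`, and the `/health` path are hypothetical, and the sketch only builds and validates the dictionary without calling AWS.

```python
# Protocol types mentioned in this lesson; Route 53 supports additional types.
ENDPOINT_TYPES = {"HTTP", "HTTPS", "TCP"}


def build_endpoint_check(domain, check_type="HTTPS", port=443,
                         path="/health", interval=30, failure_threshold=3):
    """Assemble a HealthCheckConfig-style dict (hypothetical helper)."""
    if check_type not in ENDPOINT_TYPES:
        raise ValueError(f"unsupported check type: {check_type}")
    if interval not in (10, 30):
        raise ValueError("Route 53 request intervals are 10 or 30 seconds")
    config = {
        "Type": check_type,
        "FullyQualifiedDomainName": domain,
        "Port": port,
        "RequestInterval": interval,            # seconds between requests
        "FailureThreshold": failure_threshold,  # consecutive results needed
    }
    # TCP checks only open a connection, so they have no request path.
    if check_type != "TCP":
        config["ResourcePath"] = path
    return config


config = build_endpoint_check("app.example.com")
print(config)
```

With AWS credentials configured, a dictionary like this could be passed as `HealthCheckConfig` to `boto3.client("route53").create_health_check(...)` together with a unique `CallerReference` string.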

Each health checker evaluates the endpoint based on its response time and on the failure threshold, which is the number of consecutive checks the endpoint must pass or fail before its status changes. To check the health of systems hosted on the public internet, we need to allow inbound requests from the Route 53 health checkers through any firewalls or security groups. Route 53 aggregates the data from the health checkers and determines whether the endpoint is healthy. If more than 18% of health checkers report that the endpoint is healthy, it is considered healthy. Otherwise, it is declared unhealthy.
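The aggregation rule can be sketched in a few lines of Python. This is a simplified model, not Route 53's implementation: the function name `aggregate` is hypothetical, and treating an empty list of reports as unhealthy is an assumption made only for this example.

```python
HEALTHY_FRACTION = 0.18  # threshold quoted in the text above


def aggregate(reports):
    """Return True if more than 18% of checker reports are healthy.

    `reports` is a list of booleans, one per health checker.
    """
    if not reports:
        return False  # assumption: no data is treated as unhealthy here
    healthy = sum(reports)
    return healthy / len(reports) > HEALTHY_FRACTION


# 3 of 16 checkers healthy is 18.75%, just above the threshold.
print(aggregate([True] * 3 + [False] * 13))
# 2 of 16 checkers healthy is 12.5%, below the threshold.
print(aggregate([True] * 2 + [False] * 14))
```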

Health check criteria

The health checkers determine the health of an endpoint based on how quickly it responds. The time limits differ by health check type:

  • HTTP/HTTPS: Route 53 needs to establish a TCP connection with the endpoint within four seconds, and the endpoint must send back an HTTP status code in the 2xx or 3xx range within two seconds after the connection is established.

  • TCP: Route 53 must be able to establish a TCP connection with the endpoint within 10 seconds.

  • HTTP/HTTPS with string matching: Route 53 needs to connect to the endpoint within four seconds, and the endpoint should send back a status code in the 2xx or 3xx range within two seconds after connecting. Route 53 then searches the response body for the specified string, which must appear entirely within the first 5,120 bytes. The health check fails if the string is not found in those first 5,120 bytes.
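The criteria in the list above can be expressed as a small Python function. This is a sketch under stated assumptions: `passes_check` and its parameters are hypothetical names, timings are passed in as already-measured numbers rather than taken from real network calls, and the four-second connection limit for plain HTTP/HTTPS checks follows the Route 53 documentation.

```python
CONNECT_LIMIT = {"HTTP": 4.0, "HTTPS": 4.0, "TCP": 10.0}  # seconds
RESPONSE_LIMIT = 2.0   # seconds after connecting (HTTP/HTTPS only)
SEARCH_WINDOW = 5120   # bytes of the response body that are searched


def passes_check(check_type, connect_s, response_s=None, status=None,
                 body=b"", search_string=None):
    """Apply the timing, status, and string-match criteria to one request."""
    if connect_s > CONNECT_LIMIT[check_type]:
        return False
    if check_type == "TCP":
        return True  # a timely connection is all a TCP check requires
    if response_s is None or response_s > RESPONSE_LIMIT:
        return False
    if status is None or not 200 <= status < 400:
        return False
    if search_string is not None:
        # The string must appear entirely within the first 5,120 bytes.
        return search_string in body[:SEARCH_WINDOW]
    return True


print(passes_check("TCP", connect_s=9.5))
print(passes_check("HTTPS", connect_s=1.0, response_s=0.5, status=503))
print(passes_check("HTTP", 1.0, 0.5, 200, b"status: ready", b"ready"))
```

Passing `search_string` as bytes keeps the 5,120-byte window exact, since the limit applies to bytes of the response body rather than to decoded characters.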

[Figure: Health check]

Calculated health checks

While standard Route 53 health checks offer a robust way to monitor endpoint health, calculated health checks take it a step further by aggregating the results of multiple health checks. The aggregated health checks can monitor different resources or a single resource. This yields a more nuanced health assessment than any individual check provides.

The health check that monitors other health checks is called a parent health check. The health checks it monitors are called child health checks. A parent health check can monitor up to 255 child health checks. Parent health checks serve as primary monitors for groups of related resources, aggregating the health status of multiple child health checks associated with different endpoints or components. This hierarchical structure provides a more organized and comprehensive approach to monitoring the health of the infrastructure.

[Figure: Calculated health check]

In the illustration above, the parent (calculated) health check applies AND logic to all of its children, so it is healthy only when every child is healthy. Depending on the use case, we can also configure it to use OR or NOT logic to evaluate the final result.

We can even set up multiple health checks for a single resource. In such a case, Route 53 adds up the number of child health checks that are considered healthy for that resource and then checks whether that number meets the threshold of healthy checks required for the resource to be marked as healthy.
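The threshold rule, and the way AND, OR, and NOT fall out of it, can be sketched as follows. The function name `calculated_status` is hypothetical; its `health_threshold` and `inverted` parameters are modeled on the `HealthThreshold` and `Inverted` settings of a Route 53 calculated health check, and the code is a simplified model rather than Route 53's implementation.

```python
MAX_CHILDREN = 255  # limit on child health checks per parent


def calculated_status(child_statuses, health_threshold, inverted=False):
    """Return the parent's status from a list of child booleans."""
    if len(child_statuses) > MAX_CHILDREN:
        raise ValueError("a parent can monitor at most 255 child checks")
    healthy_children = sum(child_statuses)
    status = healthy_children >= health_threshold
    return not status if inverted else status


children = [True, True, False]

# AND: every child must be healthy, so the threshold equals the child count.
and_result = calculated_status(children, len(children))
# OR: at least one healthy child is enough.
or_result = calculated_status(children, 1)
# NOT: invert the result of the AND evaluation.
not_result = calculated_status(children, len(children), inverted=True)
# "At least 2 of 3" sits between AND and OR.
two_of_three = calculated_status(children, 2)

print(and_result, or_result, not_result, two_of_three)
```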

Health checks using CloudWatch alarms

We can create CloudWatch alarms for AWS resources such as databases, load balancers, and instances, and then base a health check on one of those alarms. Route 53 monitors the data stream associated with the CloudWatch alarm instead of monitoring the alarm state itself. This data stream is the metric that the alarm watches (e.g., CPU utilization for a web server). If the data stream indicates that the alarm state is “OK,” the resource is considered healthy. On the other hand, if the data stream indicates an “ALARM” state, the resource is considered unhealthy.

[Figure: CloudWatch alarms]

When the data stream lacks sufficient information to deduce the alarm state, the health check status is determined by a setting that we configure on the health check. This setting can be one of the following:

  • Healthy

  • Unhealthy

  • Last known status
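The mapping from alarm state to health check status, including the three insufficient-data settings above, can be sketched as below. The function name `alarm_based_status` is hypothetical, and the setting strings mirror the values of the `InsufficientDataHealthStatus` field in the Route 53 API. Defaulting the last known status to healthy is an assumption for this example.

```python
VALID_SETTINGS = {"Healthy", "Unhealthy", "LastKnownStatus"}


def alarm_based_status(stream_state,
                       insufficient_data_setting="LastKnownStatus",
                       last_known_healthy=True):
    """Return True (healthy) or False (unhealthy) for an alarm-based check."""
    if insufficient_data_setting not in VALID_SETTINGS:
        raise ValueError(f"unknown setting: {insufficient_data_setting}")
    if stream_state == "OK":
        return True
    if stream_state == "ALARM":
        return False
    if stream_state != "INSUFFICIENT_DATA":
        raise ValueError(f"unknown state: {stream_state}")
    # Not enough data: fall back to the configured setting.
    if insufficient_data_setting == "Healthy":
        return True
    if insufficient_data_setting == "Unhealthy":
        return False
    return last_known_healthy  # "LastKnownStatus"


print(alarm_based_status("OK"))
print(alarm_based_status("INSUFFICIENT_DATA", "Unhealthy"))
print(alarm_based_status("INSUFFICIENT_DATA", "LastKnownStatus",
                         last_known_healthy=False))
```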

Benefits of health checks

  • Improved reliability: Health checks proactively identify potential issues before they impact users, minimizing downtime and ensuring a highly available infrastructure.

  • Automatic failover: Route 53 automatically redirects traffic away from unhealthy endpoints to healthy ones, maintaining a seamless user experience.

  • Configurable monitoring: We can set up health checks tailored to our specific needs, choosing from HTTP, HTTPS, or TCP requests and defining the criteria for determining a healthy endpoint.

  • Integration with DNS routing: Health checks seamlessly integrate with Route 53’s DNS routing. We can configure routing policies based on the health status of endpoints. For example, we can implement failover routing to automatically redirect traffic to healthy resources during outages.
