Amazon CloudWatch

Get a detailed introduction to the Amazon CloudWatch service and how it works.

While it’s essential to be quick and efficient when operating our cloud infrastructure, it’s equally important to govern it according to our organization’s guidelines and compliance standards.

Amazon CloudWatch is one service that helps us govern our cloud infrastructure. It provides observability into the services and resources we deploy on the AWS cloud, as well as into on-premises infrastructure and infrastructure on other clouds.

With this observability, we can respond to any drastic changes in the infrastructure or identify and optimize any resources acting as bottlenecks within the infrastructure.

Introduction to Amazon CloudWatch

Amazon CloudWatch is an AWS service used primarily for monitoring cloud infrastructure, including infrastructure that runs on other clouds or on-premises. CloudWatch collects measurements, called metrics, from the monitored resources in the form of time-ordered sets of data points.

CloudWatch also offers a centralized logging system, called CloudWatch Logs, that allows us to collect, store, and analyze log files from various AWS resources. With CloudWatch Logs, we can detect operational changes in cloud resources in near real time.

We can act on this data by setting up a CloudWatch alarm, which is triggered when a metric (including a metric derived from log data through a metric filter) crosses a predefined threshold and can, in turn, start an automated response. CloudWatch events describe changes in our environment and are matched by rules that route them to targets, which perform the appropriate automated response. These CloudWatch events can be both time-based and event-based.
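
To make the difference between time-based and event-based rules concrete, here is a minimal sketch in Python. It only builds the keyword arguments that could be passed to the `put_rule` call of the boto3 `events` client; the helper functions and rule names are hypothetical, and the event pattern assumes the EC2 instance state-change event.

```python
import json


def scheduled_rule(name: str, every_minutes: int) -> dict:
    """Time-based rule: fires on a fixed schedule."""
    # Rate expressions use the singular unit for a value of 1.
    unit = "minute" if every_minutes == 1 else "minutes"
    return {
        "Name": name,
        "ScheduleExpression": f"rate({every_minutes} {unit})",
        "State": "ENABLED",
    }


def ec2_state_rule(name: str, state: str) -> dict:
    """Event-based rule: fires when an EC2 instance enters `state`."""
    pattern = {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": [state]},
    }
    # The API expects the event pattern as a JSON string.
    return {"Name": name, "EventPattern": json.dumps(pattern), "State": "ENABLED"}


# Hypothetical rule names, used only for illustration.
time_based = scheduled_rule("every-five-minutes", 5)
event_based = ec2_state_rule("on-instance-stopped", "stopped")
print(time_based["ScheduleExpression"])
print(event_based["EventPattern"])
```

With AWS credentials configured, either dictionary could be passed on as `boto3.client("events").put_rule(**time_based)`; a target would still need to be attached to the rule for any action to happen.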

Key concepts

Here’s a list of key CloudWatch concepts that we need to know about:

  • Metrics: We can define a metric as a quantifiable measure for tracking and assessing a resource or service’s status, performance, and availability. As previously discussed, a metric represents a set of time-ordered data points about the specific aspects of a resource, such as CPU usage, memory consumption, or network traffic, providing valuable insights into the operational state of that resource.

    • Namespaces: A namespace is a container for CloudWatch metrics. Using namespaces, we can isolate metrics from different applications so that they aren’t mistakenly aggregated into the same statistic. We can create custom namespaces with names of our choice. However, namespaces that hold the metrics of a specific AWS service follow the AWS/<service-name> naming convention. For example, the namespace for the S3 service is AWS/S3.

    • Dimensions: To identify a metric and tell it apart from others, we use its dimensions. A dimension is simply a key-value pair that provides additional information about a metric, with which we can differentiate and filter metrics based on specific criteria. For example, for most EC2 instance metrics, the dimension can be an instance ID, with which we can retrieve only the metrics of the EC2 instances whose IDs we specify.

  • CloudWatch Logs: CloudWatch Logs stores near-real-time log data from different AWS resources and services, which we can use to understand and troubleshoot performance and operational issues.

  • CloudWatch statistic: A CloudWatch statistic is an aggregation of metric data over a defined time interval, called a period. Examples include Average, Sum, Minimum, Maximum, and SampleCount. Each statistic also has a unit of measure. These aggregations are computed for data points that share the same namespace, metric name, dimensions, and unit of measure.
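
The relationship between metrics, namespaces, dimensions, and statistics can be sketched in plain Python. The namespace `MyApp/Checkout`, the `Latency` metric, the `Environment` dimension, and both helper functions are hypothetical; the aggregation is a local illustration of the idea, not CloudWatch's own implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# A metric is identified by its namespace, name, and dimensions.
NAMESPACE = "MyApp/Checkout"  # hypothetical custom namespace
METRIC_NAME = "Latency"       # hypothetical metric name
DIMENSIONS = [{"Name": "Environment", "Value": "prod"}]


def make_datum(timestamp: datetime, value: float) -> dict:
    """One time-stamped data point of the metric."""
    return {
        "MetricName": METRIC_NAME,
        "Dimensions": DIMENSIONS,
        "Timestamp": timestamp,
        "Value": value,
        "Unit": "Milliseconds",
    }


def aggregate(data: list, period_seconds: int) -> dict:
    """Group data points into fixed periods and compute statistics for each."""
    buckets = defaultdict(list)
    for datum in data:
        epoch = int(datum["Timestamp"].timestamp())
        buckets[epoch - epoch % period_seconds].append(datum["Value"])
    return {
        start: {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Average": sum(values) / len(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
        for start, values in sorted(buckets.items())
    }


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
values = [100, 120, 140, 300, 310, 320]  # one sample every 20 seconds
data = [make_datum(start + timedelta(seconds=20 * i), v) for i, v in enumerate(values)]
stats = aggregate(data, period_seconds=60)
for period_start, statistic in stats.items():
    print(period_start, statistic)
```

The same list of data points is shaped so that it could be published with `boto3.client("cloudwatch").put_metric_data(Namespace=NAMESPACE, MetricData=data)`, in which case CloudWatch would compute the statistics on request.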

[Figure: Key resources in CloudWatch]

  • Alarms: A CloudWatch alarm watches a metric of an AWS resource, such as the CPU utilization, network traffic, or disk usage of an EC2 instance. An alarm can also respond with an appropriate action whenever it is triggered. A CloudWatch alarm is always in one of the following three states:

    • OK: This status indicates the metric is within the defined threshold, indicating normal behavior.

    • ALARM: This status indicates that the metric has crossed the threshold defined in the alarm, signifying an unusual or undesired state. This state often triggers preconfigured actions.

    • INSUFFICIENT_DATA: This status indicates that the alarm has just started or there isn’t enough data for the alarm to determine the metric state. This can happen if the metric is new or if data points are missing or not available for some reason.

  • CloudWatch events: Amazon CloudWatch Events, whose functionality is now provided by Amazon EventBridge, enables automated responses to changes in AWS resources and applications by detecting operational changes in near real time. An event-based rule triggers actions in response to events from AWS services and resources, while a time-based rule triggers actions on a schedule.

  • CloudWatch agent: The CloudWatch agent can be used to collect system-level metrics from inside EC2 instances, such as memory and disk usage, or to collect custom data. We can also install the CloudWatch agent on resources in other clouds and on-premises to enable CloudWatch monitoring on them as well.
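
The three alarm states described above can be illustrated with a deliberately simplified model. The function `alarm_state` is hypothetical and assumes a "greater than threshold" comparison in which every evaluated period must breach; the real service additionally supports other comparison operators, "M out of N" evaluation, and configurable handling of missing data.

```python
from typing import Optional, Sequence


def alarm_state(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
) -> str:
    """Return the alarm state for the most recent `evaluation_periods` periods.

    Each entry is the statistic for one period; None marks a period without data.
    Assumes evaluation_periods >= 1.
    """
    recent = list(datapoints)[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


# Average CPU utilization per period, threshold 80%, 3 evaluation periods.
print(alarm_state([40, 85, 90, 95], threshold=80, evaluation_periods=3))
print(alarm_state([40, 85, 60, 95], threshold=80, evaluation_periods=3))
print(alarm_state([90], threshold=80, evaluation_periods=3))
```

Requiring several consecutive breaching periods is a common way to avoid alarms on short spikes, at the cost of a slower reaction.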

How CloudWatch works

Amazon CloudWatch collects, tracks, and monitors various metrics and logs from AWS services, applications, and on-premises resources. The following diagram illustrates the workflow of the Amazon CloudWatch service:

[Figure: Amazon CloudWatch workflow]

Here’s a breakdown of the CloudWatch alarm creation workflow:

  • Collect metrics: CloudWatch collects metrics from AWS services or from custom data and stores them in a repository.

  • Create statistics: CloudWatch calculates statistics based on these metrics.

  • Retrieve statistics and logs: We can either use the AWS Management Console to access the CloudWatch statistics and logs or call the CloudWatch API to access them from a third-party tool.

  • Set up alarm: We configure alarms to perform specified actions when a metric crosses a certain threshold.

  • Send SNS notifications/Auto-scale EC2 instances: When the alarm is triggered, we can either send an SNS notification or scale EC2 instances using the EC2 Auto Scaling service.
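
As a rough illustration of the "retrieve statistics" step through the API, the sketch below builds the parameters for the `get_metric_statistics` call of the boto3 `cloudwatch` client. The helper `cpu_statistics_request` and the instance ID are hypothetical; the 5-minute period is an assumption chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def cpu_statistics_request(instance_id: str, end: datetime, hours: int = 1) -> dict:
    """Parameters for reading CPU statistics of one EC2 instance."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # The dimension narrows the metric down to a single instance.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # aggregate into 5-minute periods (assumed for this example)
        "Statistics": ["Average", "Maximum"],
    }


# Hypothetical instance ID and time window.
request = cpu_statistics_request(
    "i-0123456789abcdef0",
    end=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
)
print(request)
```

With credentials configured, `boto3.client("cloudwatch").get_metric_statistics(**request)` would return a response whose `Datapoints` list holds one entry per period.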

Example: CloudWatch alarm

Here’s an example of how we can set up a CloudWatch billing alarm to monitor charges on the AWS cloud. By setting up billing alarms, we can proactively monitor our spending against a budget threshold and avoid unexpected charges. These alarms notify us soon after our estimated charges cross the threshold, allowing us to act before costs get unexpectedly high.

  • Enable billing alerts: The first step is enabling billing alerts on our AWS account. To do so, we go to the AWS Billing console and turn on CloudWatch billing alerts under the alert preferences. Billing metric data is stored in the US East (N. Virginia) Region, so the alarm must be created there.

  • Choose the appropriate billing metric: When configuring the alarm in CloudWatch, we select the Billing/Total Estimated Charge metric as the metric we want to monitor. Through the API, this is the EstimatedCharges metric in the AWS/Billing namespace.

  • Choose alarm period: Next, set a suitable time interval for the CloudWatch alarm to check the metric.

  • Set billing limit: Choose a billing limit as the threshold and set the appropriate comparison operator, for example, greater than the threshold.

  • Set up SNS notification: Set up an SNS notification to send alerts when the billing alarm is triggered.
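
The steps above could be expressed in code roughly as follows. This sketch builds the parameters for the `put_metric_alarm` call of the boto3 `cloudwatch` client; the helper `billing_alarm_request`, the alarm name, the 50 USD limit, the 6-hour period, and the SNS topic ARN are all assumptions made for illustration.

```python
def billing_alarm_request(limit_usd: float, sns_topic_arn: str) -> dict:
    """Parameters for an alarm on the account's estimated charges."""
    return {
        "AlarmName": f"billing-over-{limit_usd:g}-usd",  # hypothetical naming scheme
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # check every 6 hours (assumed for this example)
        "EvaluationPeriods": 1,
        "Threshold": limit_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        # The SNS topic that receives the notification when the alarm fires.
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical limit and topic ARN.
request = billing_alarm_request(
    50, "arn:aws:sns:us-east-1:123456789012:billing-alerts"
)
print(request["AlarmName"])
```

Assuming billing alerts are enabled and the topic exists, the alarm could then be created with `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**request)`.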

Here’s an illustration of how such an alarm would work:

[Figure: CloudWatch billing alarm]

After setting up a billing alarm, we are notified when our estimated charges exceed the limit we set, so we can take the necessary steps to control and reduce any excess charges we may be incurring.

Use cases

Here are some use cases of CloudWatch:

  • Monitoring performance of resources: We can use CloudWatch to monitor the performance of any cloud resources and deployed applications, making sure they’re optimized and in working order.

  • Performing root cause analysis: We can use CloudWatch to perform a root cause analysis of any issues within our infrastructure, allowing us to quickly debug and resolve them.

  • Optimizing resource management: We can leverage CloudWatch alarms and events to manage costs by triggering actions when a defined threshold is crossed. This means we can scale resources up or down, or stop or terminate instances, when these alarms are triggered.

  • Testing website impacts: We can use CloudWatch to estimate the impact of changes on our website by collecting logs and web request data over a period of time.

Benefits

Here are some of the benefits of the CloudWatch service:

  • We can visualize and analyze service and resource data with complete observability.

  • We can operate efficiently and set up automation within our infrastructure.

  • We can instantly access the integrated dashboard view of our cloud resources.

  • We can regularly monitor and collect enhanced insights into the end user experiences.

  • We can set up CloudWatch to automatically perform required actions when alarms get triggered, such as stopping or terminating EC2 instances or scaling Auto Scaling groups, without human intervention.

Understanding costs

Here are the key points to understand the costs associated with CloudWatch:

  • Free tier: AWS offers a free tier for CloudWatch, which includes basic resource monitoring but with a limited number of metrics, alarms, storage, and log data ingestion.

  • Standard vs. detailed monitoring: EC2 instances, for example, come with free basic monitoring, which publishes metrics at 5-minute intervals, but detailed monitoring, which publishes them at 1-minute intervals, is a paid feature.

  • Custom and high-resolution metrics: These are also paid features. Custom metrics are those published by the user, and high-resolution metrics allow data to be recorded at intervals shorter than the standard 1-minute period, down to 1 second.

  • Dashboards and alarms: The creation of CloudWatch dashboards and alarms incurs costs, with prices depending on the number and type.

  • Logs: Ingesting, storing, and analyzing log data in CloudWatch Logs is subject to charges based on the volume of data.

  • Data transfer and API calls: Costs are associated with data transfer and the number of API calls made.

Note: Pricing of most resources on AWS follows the pay-as-you-go approach. This means that we pay only for the billable AWS resources we use and for how much we use them; there are no minimum fees and no required upfront commitments.

CloudWatch pricing is usage-based and varies by the type and amount of data monitored. Exact costs can vary, so for the most accurate pricing, check the official Amazon CloudWatch Pricing web page.


This lesson taught us about the Amazon CloudWatch service, its potential benefits, and how it helps monitor resource and service metrics and configure alarms based on metrics to trigger automatic events and actions within our cloud infrastructure.
