Amazon Redshift
Explore how Amazon Redshift helps create high-performance data warehouses.
Amazon Redshift is a fully managed data warehousing service that can scale to petabytes of data. It is based on PostgreSQL and can be queried using standard SQL. Amazon Redshift stores data in a columnar format and uses a query engine optimized for analytic queries.
How Amazon Redshift works
Redshift uses a distributed architecture with multiple nodes working in parallel to execute queries. This Massively Parallel Processing (MPP) architecture enables Redshift to run complex queries and scale horizontally to process large datasets quickly.
The core component of Redshift is a cluster. A cluster consists of multiple compute nodes and a leader node.
We connect to the leader node to interact with the cluster. The leader node generates a query execution plan and aggregates results. It manages the communication between the client and the compute nodes.
The compute nodes execute the query and return the results to the leader node. Each compute node has its own memory and processing power, determined by its node type. As the workload increases, we can change the node type, the number of nodes, or both.
Redshift creates a separate isolated network for the leader and compute nodes within which the nodes communicate over high bandwidth connections and custom communication protocols. The illustration below depicts an overview of the Redshift architecture.
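To make the data-distribution side of this architecture concrete, here is a minimal sketch of a table definition that controls how rows are spread across compute nodes. The sales table and its columns are hypothetical; DISTKEY and SORTKEY are standard Redshift DDL options.

```sql
-- Minimal sketch: a hypothetical sales table whose rows Redshift spreads
-- across compute nodes. DISTKEY controls which node a row lands on;
-- SORTKEY controls on-disk ordering within each node's slices.
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    product_id  BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (sale_date);
```

Choosing a distribution key that rows are frequently joined or grouped on keeps related rows on the same node, which reduces data movement between compute nodes during MPP query execution.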
Load data in Redshift
To optimize costs and performance, data should be loaded in batches into the Redshift cluster. There are three main methods to load data into a Redshift cluster.
Amazon Kinesis Firehose to Redshift: Load data into an Amazon S3 bucket. Firehose then issues a COPY command to copy the data from the S3 bucket to the Redshift cluster (see the sketch after this list).
Redshift auto copy: The Redshift auto copy feature allows us to specify the name of an S3 bucket in a copy job. Whenever a new object is added to that S3 bucket, Redshift automatically copies the data into the cluster.
Insert data using the JDBC/ODBC driver: If we have an application running on an EC2 instance producing large batches of data, we can use the JDBC/ODBC driver to insert data directly from the EC2 instance into the Redshift cluster.
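The sketch below shows the COPY statement used by the first two methods: a one-off batch load, and an auto copy job that keeps ingesting new objects. The bucket names, IAM role ARN, and job name are placeholders, and the exact auto copy clause is shown as commonly documented for the COPY ... JOB CREATE syntax.

```sql
-- One-off batch load: copy CSV files from S3 into the sales table.
-- The bucket name and IAM role ARN are placeholders.
COPY sales
FROM 's3://example-sales-bucket/2024/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1;

-- Auto copy: register a copy job so new objects under the prefix are
-- ingested automatically as they arrive in the bucket.
COPY sales
FROM 's3://example-sales-bucket/incoming/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
FORMAT AS CSV
JOB CREATE sales_auto_copy AUTO ON;
```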
Snapshots for fault tolerance
Redshift supports both Single-AZ (Availability Zone) and Multi-AZ clusters. Multi-AZ clusters are fault tolerant, while Redshift offers snapshots to recover a Single-AZ cluster from a disaster.
Amazon Redshift snapshots are point-in-time backups of data warehouse clusters. When a snapshot is taken, it captures the entire cluster state, including data, schema, and configuration settings. These snapshots are incremental in nature and are stored in an S3 bucket.
We can restore a snapshot to a cluster in the same or a different AWS Region. This feature can help back up data in a different Region or Availability Zone to increase fault tolerance. Redshift allows us to create snapshots manually or automatically, roughly every eight hours or after every 5 GB per node of data changes.
Amazon Redshift Serverless
Amazon Redshift Serverless automatically scales data warehouse capacity to meet growing workload requirements. Thus, developers, analysts, and data scientists can load their data into Redshift and start querying it without having to tune or manage a Redshift cluster. They pay only for the data warehouse while it is in use.
Amazon Redshift ML
Amazon Redshift ML allows developers to train models on their data using simple and familiar SQL commands. It works with SageMaker to train models on our data. SageMaker Autopilot obtains the data and chooses the best model for training and making predictions.
We train a model on our data using the SQL statement CREATE MODEL. To train the model, Redshift first uploads our data to an S3 bucket. SageMaker then preprocesses this data to prepare it for machine learning. SageMaker Autopilot selects the best algorithm and hyperparameters for the most accurate training. The resulting prediction function is registered in the cluster as a SQL function.
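Here is a minimal sketch of this workflow, assuming a hypothetical customer_activity table with a churned label; the model, function, bucket, and role names are placeholders.

```sql
-- Hypothetical example: train a churn classifier on a customer_activity
-- table. Redshift exports the data to the named S3 bucket, SageMaker
-- Autopilot trains the model, and the prediction function is registered
-- in the cluster under the name given in FUNCTION.
CREATE MODEL customer_churn
FROM (
    SELECT age, tenure_months, monthly_spend, churned
    FROM customer_activity
)
TARGET churned
FUNCTION predict_customer_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftMLRole'
SETTINGS (S3_BUCKET 'example-redshift-ml-bucket');

-- Inference: call the generated prediction function like any other
-- SQL function, here on a hypothetical new_customers table.
SELECT customer_id,
       predict_customer_churn(age, tenure_months, monthly_spend) AS will_churn
FROM new_customers;
```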
Redshift ML additionally charges for training the model and for storing data in the S3 bucket. However, it does not charge for inference through the trained models.
Redshift Spectrum
Redshift Spectrum allows us to query data in an external S3 bucket without loading it into the Redshift cluster. To use Redshift Spectrum, we need a Redshift cluster and a SQL client to run SQL queries. The query is distributed to thousands of Spectrum nodes, which execute it against the data in the S3 bucket.
Note that, with Redshift Spectrum, the cluster and the S3 bucket must be in the same AWS Region.
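A minimal sketch of the setup follows: register an external schema, define an external table over files in S3, and query it in place. The schema, table, bucket, and role names are hypothetical.

```sql
-- Register an external schema backed by the AWS Glue Data Catalog, then
-- define an external table over files in S3. Names and ARNs are placeholders.
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE spectrum_schema.clickstream (
    event_time TIMESTAMP,
    user_id    VARCHAR(64),
    page_url   VARCHAR(256)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-clickstream-bucket/events/';

-- This query scans the S3 data through Spectrum nodes; no data is
-- loaded into the cluster.
SELECT page_url, COUNT(*) AS visits
FROM spectrum_schema.clickstream
GROUP BY page_url
ORDER BY visits DESC
LIMIT 10;
```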
Use case: Business analytics
Consider a retail company that wants to analyze its sales data to gain insights into customer behavior, product performance, and market trends. They have large volumes of transactional data stored in various databases, including sales records, customer information, and inventory data.
The company uses Amazon Redshift as a centralized data warehouse to consolidate and store all its transactional data in a single location. To load the data from multiple data sources to a Redshift cluster, they use AWS Glue, a data integration service, to extract, transform, and load data. The data analysts and data scientists use SQL queries and Business Intelligence (BI) tools like Amazon QuickSight, Tableau, or Looker to analyze the data stored in Redshift. They create data models, dashboards, and reports to visualize key metrics such as sales revenue, customer demographics, and product trends.
To advance their analytics strategies, they can use Redshift ML and derive deeper insights from the data. As the volume of data grows or the complexity of analytics increases, the company can easily scale its Redshift cluster up or down to meet its evolving needs.
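As an illustration of the kind of analysis the team might run, the sketch below computes monthly revenue and distinct customers per product category, assuming hypothetical sales and products tables in the warehouse.

```sql
-- Hypothetical analytics query: monthly revenue and unique customers
-- per product category.
SELECT p.category,
       DATE_TRUNC('month', s.sale_date) AS sales_month,
       SUM(s.amount)                    AS revenue,
       COUNT(DISTINCT s.customer_id)    AS unique_customers
FROM sales s
JOIN products p ON p.product_id = s.product_id
GROUP BY p.category, DATE_TRUNC('month', s.sale_date)
ORDER BY sales_month, revenue DESC;
```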