Amazon SageMaker
Get a detailed introduction to the Amazon SageMaker service and how it works.
In this lesson, we will explore Amazon SageMaker. Unlike the other managed machine learning services we’ve learned about, which are tailored to specific functions such as text translation or audio transcription, SageMaker offers a higher level of customization, enabling users to build, train, and deploy their own machine learning models.
Introduction to SageMaker
Amazon SageMaker is a comprehensive platform designed to simplify the process of building, training, and deploying machine learning models. It offers a suite of tools and services that cater to different stages of the machine learning workflow, making it accessible to developers and data scientists alike.
Core components
Here are the core components of SageMaker:
Notebook instances: SageMaker offers managed Jupyter Notebook instances, allowing users to create and run notebooks for data exploration, model development, and experimentation.
Training jobs: Users can train machine learning models at scale using SageMaker’s managed infrastructure. This component supports training with built-in algorithms or custom algorithms provided by users (a code sketch follows this list).
Model hosting: SageMaker facilitates model deployment for real-time or batch inference. Deployed models are automatically scaled to handle varying traffic levels and have built-in monitoring and logging capabilities.
Endpoints: SageMaker endpoints enable interaction with deployed models for inference tasks. Users can send input data to an endpoint and receive real-time predictions from the deployed model.
Data processing: SageMaker provides capabilities for processing large volumes of data at scale. SageMaker Processing allows users to run data preprocessing tasks in a managed environment, simplifying data preparation for training.
Model monitoring: SageMaker includes tools for real-time monitoring of deployed models. Users can set up alerts and triggers to be notified of any issues or anomalies in model performance.
Automatic model tuning: SageMaker offers functionality for automatic hyperparameter tuning of machine learning models. This feature helps users find the best-performing model configurations without manual experimentation.
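To make the training-job and automatic-tuning components above more concrete, here is a minimal sketch using the SageMaker Python SDK. It assumes an execution role and training data already uploaded to S3; the bucket paths, role ARN, and hyperparameter values are placeholders for illustration, not prescribed settings.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

# Built-in XGBoost container image for the current region
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",           # training instance type
    output_path="s3://my-bucket/models/",   # placeholder output location
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=100)

# Launch a managed training job against data already staged in S3 (placeholder paths)
estimator.fit({"train": "s3://my-bucket/train/"})

# Automatic model tuning: search over a learning-rate range and keep the best model by AUC
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3)},
    max_jobs=4,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/val/"})
```

SageMaker provisions the training instances for each job, writes the model artifact to the output path, and tears the instances down when the job finishes; the tuner simply launches several such jobs with different hyperparameter values and keeps the best-performing one.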
SageMaker instances
SageMaker uses different instance types at each step of the machine learning process. They are detailed below:
Processing instances: Used for data preprocessing tasks, like cleaning and transforming data before training.
Training instances: Used to train machine learning models by identifying patterns in the historical data.
Inference instances: Used for making predictions based on trained models, helping you get answers from your models in real time.
Transform instances: Used for batch data transformations, applying trained models to large datasets efficiently (see the sketch after this list).
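As a rough illustration of how processing and transform instances are requested in practice, the sketch below uses the SageMaker Python SDK. The script name, S3 paths, model name, and instance types are assumptions made for the example.

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.transformer import Transformer

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

# Processing instances: run a preprocessing script on managed infrastructure
processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",   # processing instance type
    instance_count=1,
)
processor.run(
    code="preprocess.py",  # hypothetical cleaning/feature-engineering script
    inputs=[ProcessingInput(source="s3://my-bucket/raw/", destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output", destination="s3://my-bucket/clean/")],
)

# Transform instances: apply an already-trained model to a large dataset in batch
transformer = Transformer(
    model_name="my-trained-model",          # placeholder name of a model already created in SageMaker
    instance_count=1,
    instance_type="ml.m5.xlarge",           # transform instance type
    output_path="s3://my-bucket/batch-output/",
)
transformer.transform(data="s3://my-bucket/batch-input/", content_type="text/csv")
```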
Below is the step-by-step workflow for developing a machine-learning model utilizing SageMaker:
Data input: Users import data from diverse sources into SageMaker processing instances for initial preprocessing and cleaning.
Preprocessing: Processing instances clean and transform the raw data, then forward the prepared data to the training instances.
Model training: Training instances apply machine learning algorithms to the prepared data, identifying patterns in the historical data and refining the model.
Model storage: The resultant model can be stored within the Model Registry for future reference and deployment.
Real-time analysis: The trained model is deployed on inference instances, allowing for real-time predictions and analyses of new data, as shown in the sketch below.
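Once the model is deployed on an inference instance behind an endpoint, applications typically call it through the SageMaker runtime API. The snippet below is a minimal sketch; the endpoint name and the payload format are placeholders that depend on how the model was trained.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Send one CSV record to a deployed endpoint (placeholder name) and read the prediction
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",   # placeholder endpoint name
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2",       # example feature vector; format depends on the model
)
prediction = response["Body"].read().decode("utf-8")
print(prediction)
```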
To get more accurate estimates on pricing, visit the SageMaker pricing page.
SageMaker services
Within SageMaker, various tools are available to streamline different aspects of machine learning development.
SageMaker Studio Classic
SageMaker Studio Classic is an integrated development environment (IDE) that provides a single, unified interface for all stages of the machine learning life cycle. It offers features such as built-in collaboration tools, easy access to data, model debugging capabilities, and integrated monitoring.
In addition to facilitating model development, SageMaker Studio Classic supports MLOps practices. A typical pipeline works as follows:
We can source data from platforms such as GitHub, CodeCommit, S3, or any other relevant source, triggering the pipeline in SageMaker Studio Classic.
Once triggered, the pipeline automatically initiates the pre-processing step, which involves cleaning the data.
The cleaned data is then forwarded to the training phase, followed by the evaluation phase.
Following evaluation, the model is registered in the repository, and a staging endpoint is created. The lead engineer of the project can validate the model and promote it to production as well. The newly deployed model is then accessible through the endpoint, allowing users to invoke the model and obtain results.
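A workflow like the one above can be expressed programmatically with SageMaker Pipelines. The sketch below wires together a processing step, a training step, and a model-registration step; the scripts, S3 paths, role ARN, and model package group name are assumptions for illustration, and the evaluation and deployment stages are omitted for brevity.

```python
import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.pipeline import Pipeline

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

# Step 1: preprocessing on processing instances
processor = SKLearnProcessor(framework_version="1.2-1", role=role,
                             instance_type="ml.m5.xlarge", instance_count=1)
preprocess = ProcessingStep(
    name="Preprocess",
    processor=processor,
    code="preprocess.py",  # hypothetical cleaning script
    inputs=[ProcessingInput(source="s3://my-bucket/raw/", destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(output_name="train", source="/opt/ml/processing/train")],
)

# Step 2: training on training instances, consuming the processed data
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")
estimator = Estimator(image_uri=image_uri, role=role, instance_count=1,
                      instance_type="ml.m5.xlarge",
                      output_path="s3://my-bucket/models/")
train = TrainingStep(
    name="Train",
    estimator=estimator,
    inputs={"train": TrainingInput(
        s3_data=preprocess.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
        content_type="text/csv")},
)

# Step 3: register the trained model in the model registry
register = RegisterModel(
    name="Register",
    estimator=estimator,
    model_data=train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="my-model-group",  # placeholder group name
)

pipeline = Pipeline(name="demo-pipeline", steps=[preprocess, train, register],
                    sagemaker_session=session)
pipeline.upsert(role_arn=role)
pipeline.start()
```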
JumpStart
JumpStart is a collection of pre-built machine learning solutions and models that help users quickly get started with common use cases. It provides ready-to-use templates and workflows for tasks like image classification, text analysis, and time series forecasting, reducing the time and effort required to build custom models from scratch.
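JumpStart models can be deployed from the Studio UI or programmatically. Below is a minimal sketch using the JumpStartModel class from the SageMaker Python SDK; the model ID and instance type are placeholders chosen for illustration.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy a pre-built JumpStart model behind a real-time endpoint.
# The model_id and instance type are placeholders; browse JumpStart in SageMaker
# Studio for the catalog of available model IDs and their recommended instances.
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-base")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# The expected request format depends on the chosen model; clean up when finished.
# predictor.predict(payload)
predictor.delete_endpoint()
```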
Ground Truth
Ground Truth is a labeling service that helps users create high-quality labeled datasets for training machine learning models. It provides tools for labeling data manually or using automated techniques, as well as workflows for managing labeling jobs and quality control, ensuring accurate and reliable training data.
Data Wrangler
Data Wrangler simplifies the process of data preparation and cleaning, allowing users to easily explore, transform, and visualize their datasets. It provides a visual interface for tasks like data cleaning, feature engineering, and data formatting, enabling data scientists to efficiently prepare their data for training.