Disaster-Proofing Machine Learning Pipelines

The machine learning (ML) pipeline involves a complex relationship between the data, the model, and its implementation—each with its own risks that can adversely affect the utility and profitability of the solution. This course is a primer on what these risks are, where they come from, and how to mitigate them effectively.

In this course, you’ll start with a comprehensive look at the data side of the pipeline, including data privacy, data drift, and more, and you’ll learn how to mitigate these risks in both theory and practice. You’ll then examine problems related to ML models, such as bias, security vulnerabilities, and adversarial attacks. Finally, you’ll learn about alternative AI paradigms in use today, from causal AI to federated learning to generative AI.

A deep understanding of where problems can arise is a critical part of any data engineer’s or data scientist’s ML knowledge. From a career perspective, this course prepares you to address the real risks developers face when setting up ML pipelines.

Mitigating Disasters in ML Pipelines

Learn some mathematical approaches to bias mitigation in ML.

Introduction

Disasters in Data

Disasters in Models

Alternatives to Traditional ML

Conclusion

Assessment: Disasters in ML Pipelines

Theory of Data Bias Mitigation

Non-ML approaches

Oversampling and undersampling
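Both techniques rebalance a skewed training set by changing how many examples of each class it contains: random oversampling duplicates examples from the under-represented classes, and random undersampling discards examples from the over-represented ones. The following is a minimal sketch, assuming the data is a plain Python list of `(features, label)` pairs. The function names `random_oversample`, `random_undersample`, and `_group_by_label` are hypothetical illustrations, not part of this course or of any library.

```python
import random
from collections import Counter, defaultdict

# Hypothetical helper names for illustration; not from the course or any library.

def _group_by_label(samples):
    """Bucket (features, label) pairs by their label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def random_oversample(samples, seed=0):
    """Randomly duplicate examples from the smaller classes until every
    class matches the size of the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)  # keep every original example
        # Draw with replacement to fill this class's shortfall (k may be 0).
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def random_undersample(samples, seed=0):
    """Randomly drop examples from the larger classes until every class
    matches the size of the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Draw without replacement so no example is kept twice.
        balanced.extend(rng.sample(group, target))
    return balanced


# Toy imbalanced data set: 8 examples of class 0, 2 examples of class 1.
data = [([i], 0) for i in range(8)] + [([i], 1) for i in range(8, 10)]
print(sorted(Counter(y for _, y in random_oversample(data)).items()))   # [(0, 8), (1, 8)]
print(sorted(Counter(y for _, y in random_undersample(data)).items()))  # [(0, 2), (1, 2)]
```

The two approaches trade off differently: duplicating minority examples keeps all the data but can encourage a model to overfit to the repeated points, while undersampling avoids duplicates at the cost of discarding majority-class information. In either case, resample only the training split so that duplicated examples do not leak into evaluation data. For real projects, the imbalanced-learn package provides `RandomOverSampler` and `RandomUnderSampler` for the same purpose.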

Oversampling