scikit-learn

Learn about scikit-learn and how it simplifies training and prediction.

While pandas will save you a lot of time loading, examining, and cleaning data, the machine learning algorithms that will enable you to do predictive modeling are located in other packages.

Importance of scikit-learn in predictive modeling

Scikit-learn is a foundational machine learning package for Python that contains many useful algorithms and has also influenced the design and syntax of other machine learning libraries in Python. For this reason, we focus on scikit-learn to develop skills in the practice of predictive modeling. While it’s impossible for any one package to offer everything, scikit-learn comes pretty close in terms of accommodating a wide range of classic approaches for classification, regression, and unsupervised learning. However, it does not offer much functionality for some more recent advancements, such as deep learning.
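As a rough illustration of the consistent syntax mentioned above, the following sketch trains a classifier and makes predictions with scikit-learn. The synthetic data from `make_classification`, the choice of `LogisticRegression`, and all variable names are assumptions made for this example, not part of the case study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only, not the course dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out 25% of the samples to check predictions on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Most scikit-learn estimators follow the same pattern:
# instantiate, fit on training data, then predict
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# For classifiers, score() returns the mean accuracy on the given data
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in a different estimator, such as a random forest, would typically change only the import and the line that creates `model`.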

Here are a few other related packages you should be aware of:

SciPy:

Most of the packages we’ve used so far, such as NumPy and pandas, are actually part of the SciPy ecosystem.
SciPy offers lightweight functions for classic methods such as linear regression and linear programming.

StatsModels:

More oriented toward statistics, and likely more comfortable for users familiar with R.
Provides p-values and confidence intervals for regression coefficients.
Supports time series models such as ARIMA.

XGBoost and LightGBM:

Offer a suite of state-of-the-art ensemble models that often outperform random forests. We will learn about XGBoost in "Gradient Boosting, SHAP Values, and Dealing with Missing Data."

TensorFlow, Keras, and PyTorch:

Packages that offer deep learning capabilities.
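To make the SciPy entry in the list above more concrete, here is a minimal sketch of a lightweight linear regression with `scipy.stats.linregress`. The simulated data, with an assumed true slope of 2 and intercept of 1, and the variable names are made up for illustration.

```python
import numpy as np
from scipy import stats

# Simulated data: y = 2x + 1 plus Gaussian noise (assumed values for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

# linregress fits a simple least-squares line in a single call
result = stats.linregress(x, y)

print(f"slope:     {result.slope:.3f}")
print(f"intercept: {result.intercept:.3f}")
print(f"p-value:   {result.pvalue:.3g}")
```

The result also reports a p-value for the slope, which a scikit-learn linear regression does not provide directly.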
