Introduction
An overview of unsupervised learning and clustering.
In this chapter, continuing with scikit-learn, you will use unsupervised learning methods, which extract insights from unlabeled datasets. Specifically, you'll learn about different clustering algorithms and how they group similar data observations together.
A. Unsupervised learning
So far, we've used only supervised learning methods because we've exclusively been dealing with labeled datasets. However, in the real world, many datasets are completely unlabeled because labeling datasets involves additional work and foresight. Rather than ignoring all these unlabeled datasets, we can still extract meaningful insights using unsupervised learning.
Since we have only data observations to work with and no labels, unsupervised learning methods center on finding similarities and differences between data observations and making inferences based on those findings. The most commonly used form of unsupervised learning is clustering. As the name suggests, clustering algorithms gather data into distinct groups (clusters), where each cluster consists of similar data observations.
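To make the idea of clustering concrete, here is a minimal sketch using scikit-learn. It assumes K-means as the clustering algorithm, which is only one of several options, and it uses a small made-up dataset of six 2-D observations. The names `data`, `kmeans`, and `labels` are chosen for illustration and do not come from the lesson.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six unlabeled 2-D observations (made-up values for illustration):
# three lie near the origin and three lie near (10, 10).
data = np.array([
    [0.0, 0.2], [0.3, 0.1], [0.1, 0.4],
    [10.0, 10.2], [10.3, 9.9], [9.8, 10.1],
])

# Assumption: we ask for two clusters because the toy data has two visible groups.
# random_state fixes the initialization so repeated runs give the same grouping.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict learns the clusters and returns one cluster ID per observation.
labels = kmeans.fit_predict(data)

print(labels)
```

Note that no labels were passed in: the algorithm groups the observations using only their coordinates. The cluster IDs themselves (0 or 1) are arbitrary; what matters is which observations share an ID.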
Clustering is used in many different applications, from anomaly detection (for example, separating fraudulent records from legitimate ones) to market research (grouping customers based on their purchase history). In the upcoming chapters, you'll learn about a variety of clustering algorithms commonly used in data science, as well as other tools for finding similarities between data observations.