Fundamentals of Machine Learning

Get an overview of GANs and fundamentals of machine learning.

Generative adversarial networks (GANs) have stirred up a revolutionary storm in the machine learning (ML) community. To some extent, they have changed how people solve practical problems in computer vision (CV) and natural language processing (NLP). Before we dive into the storm, let's prepare ourselves with the fundamental ideas behind GANs.

Inter-vertical overlap in GANs

How GANs work—an analogy

To introduce how GANs work, let’s use an analogy:

A long, long time ago, there were two neighboring kingdoms on an island. One was called Netland, and the other was called Ganland. Both kingdoms produced fine wine, armor, and weapons. In Netland, the king demanded that the blacksmiths who specialized in making armor work in the east corner of the castle, while those who made swords work on the west side, so that the lords and knights could choose the best equipment the kingdom had to offer. The king of Ganland, on the other hand, put all of the blacksmiths in the same corner and demanded that the armor makers and sword makers test their work against each other every day. If a sword broke through the armor, the sword would sell at a good price, and the armor would be melted down and reforged. If it didn't, the sword would be remade, and men would strive to buy the armor. One day, the two kings were arguing over which kingdom made better wine, until the quarrel escalated into war. Though outnumbered, the soldiers of Ganland wore armor and carried swords that had been improved for years in the daily adversarial tests, and the Netland soldiers could neither break their strong armor nor withstand their sharp swords. In the end, the defeated king of Netland, however reluctantly, agreed that Ganland had better wine and better blacksmiths.

Machine learning—classification and generation

ML is the study of recognizing patterns in data without hardcoded rules given by humans. Pattern recognition (PR) is the automatic discovery of similarities and differences among raw data, and it is an essential step toward the kind of artificial intelligence (AI) that so far exists only in novels and movies. Although it is hard to tell when exactly such real AI will arrive, the development of ML in recent years has given us much confidence. ML is already widely used in many fields, such as CV, NLP, recommendation systems, intelligent transportation systems (ITS), medical diagnosis, robotics, and advertising.

An ML model is typically described as a system that takes in data and produces outputs based on the parameters it contains. Learning, for the model, means adjusting those parameters to get better outputs. As illustrated in the following diagram, we feed training data into the model and get an output. We then use one or more criteria to measure the output and tell how well the model performs. In this step, a set of desired outputs (the ground truth) corresponding to the training data is very helpful. If ground truth data is used in training, the process is usually called supervised learning; if not, it is usually called unsupervised learning.
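The model, criterion, and ground truth described above can be made concrete with a minimal sketch, assuming the simplest possible model: a line y = w * x + b whose only parameters are w and b, with mean squared error as the single criterion. The names `model` and `mse` and the toy data are hypothetical, chosen only for illustration.

```python
def model(x, w, b):
    """A toy model: its output depends on the input x and the parameters w, b."""
    return w * x + b


def mse(predictions, ground_truth):
    """Criterion: mean squared error between the outputs and the desired outputs."""
    return sum((p - t) ** 2 for p, t in zip(predictions, ground_truth)) / len(ground_truth)


# Hypothetical training data with ground truth (the supervised setting).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # produced by y = 2x + 1

# Two candidate parameter settings; the criterion tells us which one performs better.
loss_bad = mse([model(x, w=0.5, b=0.0) for x in xs], ys)
loss_good = mse([model(x, w=2.0, b=1.0) for x in xs], ys)
print(loss_bad, loss_good)  # → 13.375 0.0
```

Because `ys` holds the ground truth, this is the supervised case; without it, the criterion would have to be computed from the inputs alone.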

We constantly adjust the model's parameters based on its performance (in other words, whether it gives us the results we want) so that it yields better results in the future. This process is called model training. Training can go on for as long as we like; typically, we stop after a certain number of iterations or when the performance is good enough. When training has finished, we apply the trained model to new data (testing data). This process is called model testing. The training and testing data sets are usually kept separate so that we can see how well the model performs on samples it has never seen; this ability is called generalization capability. Sometimes, when the model's parameters are so complicated that we need yet another set of data (often called a validation set) to check whether the model or the training process has been designed well, an additional step called model evaluation is involved.
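The training, evaluation, and testing steps can be sketched in plain Python, assuming the same toy line model and plain gradient descent as the rule for adjusting parameters (a common choice, but only one of many). The noisy data, the 60/20/20 split, and the values of `lr`, `max_iters`, and `tol` are all hypothetical.

```python
import random


def model(x, w, b):
    """Toy model with two parameters, w and b."""
    return w * x + b


def mse(w, b, data):
    """Criterion: mean squared error over (x, y) pairs."""
    return sum((model(x, w, b) - y) ** 2 for x, y in data) / len(data)


def train(data, lr=0.1, max_iters=2000, tol=1e-6):
    """Model training: repeatedly adjust w and b to lower the training loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for step in range(max_iters):  # stop after a fixed number of iterations...
        grad_w = sum(2 * (model(x, w, b) - y) * x for x, y in data) / n
        grad_b = sum(2 * (model(x, w, b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
        if mse(w, b, data) < tol:  # ...or once performance is good enough
            break
    return w, b, step + 1


random.seed(0)
# Hypothetical noisy samples of y = 2x + 1 with x in [0, 1).
xs = [random.random() for _ in range(100)]
samples = [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in xs]
train_set, val_set, test_set = samples[:60], samples[60:80], samples[80:]

w, b, steps = train(train_set)
print(f"w={w:.2f}, b={b:.2f} after {steps} steps")
# Validation (model evaluation): checks the design of the model/training process.
print("validation loss:", round(mse(w, b, val_set), 4))
# Testing: performance on samples the model has never seen (generalization).
print("test loss:", round(mse(w, b, test_set), 4))
```

The two stopping conditions in `train` mirror the text: a fixed iteration budget and a "good enough" loss threshold. The validation and test sets are never used to update `w` and `b`, which is what makes their losses a fair check of generalization.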

Model training and model testing

The types of problems a model can solve are essentially determined by the types of input and output data we want. For example, a classification model takes an input of any number of dimensions (audio, text, image, or video) and gives a 1-dimensional output (single values indicating the predicted labels). A generative model typically takes a 1-dimensional input (a latent vector) and generates high-dimensional outputs (images, videos, or 3D models). It maps low-dimensional data to high-dimensional data while trying to make the output samples look as convincing as possible. However, it is worth pointing out that later in the course we'll meet generative models that don't obey this rule; it is just a simple rule of thumb to bear in mind.
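The input and output shapes of the two kinds of model can be illustrated with randomly initialized, untrained linear maps, assuming hypothetical sizes: a flattened 28 × 28 grayscale image (784 values), 10 possible labels, and a 100-dimensional latent vector. Only the dimensions matter here; the weights are random, so the predicted label is meaningless.

```python
import random

random.seed(42)

IMAGE_SIZE = 28 * 28  # hypothetical: a flattened 28x28 grayscale image
NUM_CLASSES = 10      # hypothetical: 10 possible labels
LATENT_DIM = 100      # hypothetical: length of the latent vector


def random_matrix(rows, cols):
    """Untrained weights; only the shape of the matrix matters here."""
    return [[random.gauss(0, 0.01) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


W_cls = random_matrix(NUM_CLASSES, IMAGE_SIZE)
W_gen = random_matrix(IMAGE_SIZE, LATENT_DIM)


def classify(image):
    """High-dimensional input (784 values) -> a single predicted label."""
    scores = matvec(W_cls, image)
    return max(range(NUM_CLASSES), key=lambda k: scores[k])


def generate(z):
    """Low-dimensional latent vector (100 values) -> a high-dimensional sample (784 values)."""
    return matvec(W_gen, z)


z = [random.gauss(0, 1) for _ in range(LATENT_DIM)]
fake_image = generate(z)
label = classify(fake_image)
print(len(z), "->", len(fake_image), "->", label)
```

Chaining `generate` into `classify`, as the last lines do, is exactly the pairing that the adversarial idea builds on.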

When it comes to AI, there are two groups of believers in the community. The symbolists acknowledge the necessity of human experience and knowledge: they believe that low-level patterns make up high-level decisions according to explicit rules given by humans. The connectionists believe that AI can be realized by a network analogous to the human nervous system, in which adjusting the connections between simple neurons is the key. The explosive growth of deep learning has clearly scored a point for the connectionists.

Adversarial generative concept

Traditionally, generative problems are solved by statistics-based methods such as Boltzmann machines, Markov chains, or variational autoencoders. As mathematically profound as these methods are, the samples they generate are still far from perfect. A classification model maps high-dimensional data to low-dimensional data, while a generative model often maps low-dimensional data to high-dimensional data. People in both fields have been working hard to improve their models. Let's look back at the little made-up story from the opening. Can we get the two different models to work against each other and improve themselves at the same time? If we take the output of a generative model as the input of the classification model, we can measure the performance of the generative model (the armor) with the classification model (the sword). At the same time, we can improve the classification model (the sword) by feeding it both generated samples (the armor) and real samples, since we agree that more data is usually better for training ML models.
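This arrangement can be sketched with two deliberately tiny models: a generator that maps 1-D noise z to a * z + b, and a classifier that outputs the probability sigmoid(w * x + c) that x is a real sample. The losses are the binary cross-entropy losses commonly used for GANs (with the so-called non-saturating generator loss); they have not been formally introduced at this point, so treat them as an assumption, as are the hypothetical real-data distribution, the initial parameters, and the learning rate. The sketch runs a single adversarial round with hand-derived gradients.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generator(z, a, b):
    """Toy generator (the armor maker): maps noise z to a sample."""
    return a * z + b


def discriminator(x, w, c):
    """Toy classifier (the sword): probability that x is a real sample."""
    return sigmoid(w * x + c)


def d_loss(real, fake, w, c):
    # The classifier is trained on BOTH real samples (label 1) and generated ones (label 0).
    real_term = -sum(math.log(discriminator(x, w, c)) for x in real) / len(real)
    fake_term = -sum(math.log(1 - discriminator(x, w, c)) for x in fake) / len(fake)
    return real_term + fake_term


def d_grads(real, fake, w, c):
    """Gradients of d_loss with respect to w and c."""
    gw = (-sum((1 - discriminator(x, w, c)) * x for x in real) / len(real)
          + sum(discriminator(x, w, c) * x for x in fake) / len(fake))
    gc = (-sum(1 - discriminator(x, w, c) for x in real) / len(real)
          + sum(discriminator(x, w, c) for x in fake) / len(fake))
    return gw, gc


def g_loss(zs, a, b, w, c):
    # The generator is scored by the classifier: it wants its samples judged "real".
    return -sum(math.log(discriminator(generator(z, a, b), w, c)) for z in zs) / len(zs)


def g_grads(zs, a, b, w, c):
    """Gradients of g_loss with respect to a and b, holding the classifier (w, c) fixed."""
    ga = -sum((1 - discriminator(generator(z, a, b), w, c)) * w * z for z in zs) / len(zs)
    gb = -sum((1 - discriminator(generator(z, a, b), w, c)) * w for z in zs) / len(zs)
    return ga, gb


random.seed(1)
real = [random.gauss(4.0, 0.5) for _ in range(64)]  # hypothetical real data
zs = [random.gauss(0.0, 1.0) for _ in range(64)]    # latent noise
a, b = 1.0, 0.0  # generator parameters
w, c = 0.1, 0.0  # classifier parameters
lr = 0.05

# Step 1: sharpen the sword using real AND generated samples.
fake = [generator(z, a, b) for z in zs]
before_d = d_loss(real, fake, w, c)
gw, gc = d_grads(real, fake, w, c)
w, c = w - lr * gw, c - lr * gc
after_d = d_loss(real, fake, w, c)

# Step 2: improve the armor using the classifier's judgment.
before_g = g_loss(zs, a, b, w, c)
ga, gb = g_grads(zs, a, b, w, c)
a, b = a - lr * ga, b - lr * gb
after_g = g_loss(zs, a, b, w, c)

print(f"classifier loss {before_d:.3f} -> {after_d:.3f}")
print(f"generator loss  {before_g:.3f} -> {after_g:.3f}")
```

A real GAN repeats such rounds many times with neural networks in place of these toy functions, and the tug-of-war is not guaranteed to settle; the point here is only the data flow: generated samples are scored by the classifier, and the classifier learns from both real and generated samples.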