Dropout
Learn about dropout and how it can reduce overfitting in large neural networks.
Chapter Goals:
- Understand why we use dropout in neural networks
- Apply dropout to a fully-connected layer
A. Co-adaptation
Co-adaptation refers to when multiple neurons in a layer extract the same, or very similar, hidden features from the input data. This can happen when the connection weights for two different neurons are nearly identical.
When a fully-connected layer has a large number of neurons, co-adaptation is more likely to occur. This can be a problem for two reasons. First, it is a waste of computation when we have redundant neurons computing the same output. Second, if many neurons are extracting the same features, it adds more significance to those features for our model. This leads to overfitting if the duplicate extracted features are specific to only the training set.
B. Dropout
The way we minimize co-adaptation for fully-connected layers with many neurons is by applying dropout during training. In dropout, we randomly shut down some fraction of a layer's neurons at each training step by zeroing out the neuron values. The fraction of neurons to be zeroed out is known as the dropout rate, r. The remaining neurons have their values multiplied by 1 / (1 - r), so that the overall sum of the neuron values stays roughly the same.
The two images represent dropout applied to a layer of 6 units, shown at multiple training steps. The dropout rate is 1/3, and the remaining 4 neurons at each training step have their values scaled by 1.5.
By randomly dropping a fraction of the neurons, we are essentially choosing a random sample of the neurons to use at each training step. Therefore, each individual neuron works with many different subsets of the other neurons rather than all of them at once. This helps each neuron avoid over-relying on other neurons to correct its mistakes (the underlying cause of co-adaptation), while still allowing the neurons to learn different things from one another.
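To make the scaling concrete, here is a small, purely illustrative example (separate from the chapter's model code) that applies a Keras Dropout layer to a row of six ones with a dropout rate of 1/3:

import tensorflow as tf

# Six neuron values, all equal to 1, as in the figure above
neurons = tf.ones([1, 6])

# With rate=1/3, each value is zeroed with probability 1/3 during training,
# and the surviving values are scaled by 1 / (1 - 1/3) = 1.5
dropped = tf.keras.layers.Dropout(rate=1/3)(neurons, training=True)
print(dropped.numpy())  # e.g. [[1.5 0.  1.5 1.5 0.  1.5]] (zeros chosen at random)

Because the surviving values are scaled up by 1.5, the expected sum of the layer's outputs matches the sum it would have without dropout.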
Time to Code!
In this chapter, we'll create the apply_dropout function, which applies dropout to the dense fully-connected layer with a dropout rate of 0.4. In TensorFlow, dropout is implemented via the tf.keras.layers.Dropout layer. Once constructed, the layer is applied to an input tensor, which in this case will be dense.
Some important keyword arguments are:
- rate: The dropout rate, set when the Dropout layer is constructed. A common choice is 0.5.
- training: Whether or not the model is in training mode, passed when the layer is applied to its input (see the example below). It defaults to False, and we only apply dropout during training.
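As a quick illustration of where each argument goes (with x as a made-up stand-in for a layer's output), the rate is supplied when constructing the layer, while training is supplied when the layer is called:

import tensorflow as tf

x = tf.random.normal([4, 1024])  # made-up stand-in for a dense layer's output

dropout_layer = tf.keras.layers.Dropout(rate=0.4)  # rate is set at construction
train_out = dropout_layer(x, training=True)    # dropout applied
infer_out = dropout_layer(x, training=False)   # no dropout; x passes through unchanged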
We use the parameter is_training to determine whether or not we need to apply dropout.

Set dropout equal to tf.keras.layers.Dropout applied to dense. We'll set the rate argument to 0.4 and the training argument to is_training.

Then return dropout.
import tensorflow as tf

class MNISTModel(object):
    # Model Initialization
    def __init__(self, input_dim, output_size):
        self.input_dim = input_dim
        self.output_size = output_size

    def model_layers(self, inputs, is_training):
        reshaped_inputs = tf.reshape(
            inputs, [-1, self.input_dim, self.input_dim, 1])
        # Convolutional Layer #1
        conv1 = tf.keras.layers.Conv2D(
            filters=32,
            kernel_size=[5, 5],
            padding='same',
            activation='relu',
            name='conv1')(reshaped_inputs)
        # Pooling Layer #1
        pool1 = tf.keras.layers.MaxPool2D(
            pool_size=[2, 2],
            strides=2,
            name='pool1')(conv1)
        # Convolutional Layer #2
        conv2 = tf.keras.layers.Conv2D(
            filters=64,
            kernel_size=[5, 5],
            padding='same',
            activation='relu',
            name='conv2')(pool1)
        # Pooling Layer #2
        pool2 = tf.keras.layers.MaxPool2D(
            pool_size=[2, 2],
            strides=2,
            name='pool2')(conv2)
        # Dense Layer
        hwc = pool2.shape.as_list()[1:]
        flattened_size = hwc[0] * hwc[1] * hwc[2]
        pool2_flat = tf.reshape(pool2, [-1, flattened_size])
        dense = tf.keras.layers.Dense(
            1024, activation='relu', name='dense')(pool2_flat)
        # CODE HERE
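For reference, the missing code might look something like the following sketch; try writing it yourself before peeking:

        # Apply dropout to the dense layer with rate 0.4, active only during training
        dropout = tf.keras.layers.Dropout(rate=0.4)(dense, training=is_training)
        return dropout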