Data Splits Using the Slicing API

DL algorithms require large datasets to train models. Once a model is trained, we must evaluate its performance on unseen examples to assess its generalization ability. To this end, we split our dataset into several partitions. This lesson presents common dataset partitions and uses TensorFlow Datasets (TFDS) to demonstrate dataset splits using the TFDS slicing API.

Common dataset splits

It’s common practice to split a dataset into three partitions for training, validating, and testing a DL model. The following figure presents the three partitions of a full dataset. The greater length of the training partition reflects that it contains more examples than the other two partitions.
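A three-way split like this can be requested in TFDS with slicing strings such as `train[:80%]`, `train[80%:90%]`, and `train[90%:]` passed to `tfds.load`. As a minimal, library-free sketch of the underlying index arithmetic (the floor-based boundaries and the 80/10/10 ratio here are illustrative assumptions; TFDS applies its own rounding rules):

```python
def percent_slice(examples, start_pct, end_pct):
    """Return the sub-list covering [start_pct%, end_pct%) of the examples.

    A simplified illustration of percent-based slicing; TFDS computes
    boundaries with its own rounding rules, so edge cases may differ.
    """
    n = len(examples)
    start = n * start_pct // 100
    end = n * end_pct // 100
    return examples[start:end]

# A toy "dataset" of 10 examples, split 80/10/10.
dataset = list(range(10))
train = percent_slice(dataset, 0, 80)    # analogous to split="train[:80%]"
val = percent_slice(dataset, 80, 90)     # analogous to split="train[80%:90%]"
test = percent_slice(dataset, 90, 100)   # analogous to split="train[90%:]"
print(train, val, test)  # → [0, 1, 2, 3, 4, 5, 6, 7] [8] [9]
```

With TFDS installed, the equivalent real splits would be requested as, for example, `tfds.load("mnist", split=["train[:80%]", "train[80%:90%]", "train[90%:]"])` (the dataset name is illustrative).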
