8  Modeling pipeline with Neural networks

In previous chapters, we covered the fundamentals of PyTorch and applied it to a linear regression example. Now, we will expand on that knowledge by constructing neural networks using PyTorch.

8.1 Modeling Pipeline overview

Fig 8.1: Modeling Pipeline in PyTorch

Let’s take a look at the different steps involved in creating a typical modeling pipeline in PyTorch:

  • Getting the data - PyTorch provides several tools for loading and preprocessing data, such as the torchvision library for image-related tasks or torchtext for natural language processing. You can also create custom data loaders to load data in your desired format.

  • Build Dataloaders - Once you have your data, you’ll need to create data loaders, which are responsible for batching and shuffling the data during training. Data loaders are essential for efficient training: they can load and preprocess batches in parallel worker processes on the CPU, so the GPU is not left waiting for data.

  • Define Model - Next, you’ll need to define your model architecture. PyTorch provides a wide range of pre-defined layers and modules that you can use to build your neural network. You can also create custom layers or models by subclassing PyTorch’s nn.Module class. Defining your model involves specifying the layers, their connectivity, and any other parameters or hyperparameters that you need for your specific task.

  • Build Optimizer and Scheduler - Once your model is defined, you’ll need to configure an optimizer and a scheduler. The optimizer is responsible for updating the model’s parameters during training to minimize the loss, while the scheduler adjusts the learning rate to optimize the model’s performance. PyTorch provides various optimization algorithms, such as SGD, Adam, or RMSprop, and scheduling techniques like learning rate decay or cyclical learning rates.

  • Run training and validation loops - With your data loaders, model, optimizer, and scheduler in place, you’re ready to start the training loop. The training loop typically involves iterating over the data loaders, forwarding the inputs through the model, computing the loss, and backpropagating the gradients to update the model’s parameters. You’ll also need to evaluate your model’s performance on a validation set to monitor its progress during training and avoid overfitting.

  • Deploy - Once your model has been trained, you can deploy it for inference on new data. PyTorch provides tools for saving and loading model checkpoints, which allows you to reuse your trained model in different applications. You can deploy your model in a variety of environments, such as edge devices, cloud servers, or web applications, depending on your specific requirements.

In summary, a typical modeling pipeline in PyTorch involves getting the data, building data loaders, defining the model architecture, configuring the optimizer and scheduler, implementing the training and validation loop, and finally deploying the trained model for inference in various environments.
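
The steps above can be compressed into a minimal sketch. It is not the pipeline built in this chapter: the data is synthetic, the model is a small two-layer network, StepLR is just one possible scheduler, and the checkpoint is written to an in-memory buffer instead of a file. The validation step is left out to keep it short.

```python
import io

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# 1. Get the data: synthetic stand-in (256 samples, 20 features, 3 classes)
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

# 2. Build a data loader that batches and shuffles the samples
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# 3. Define the model
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))

# 4. Build the optimizer and a scheduler that multiplies the
#    learning rate by 0.1 every 2 epochs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)
loss_func = nn.CrossEntropyLoss()

# 5. Training loop
lrs = []  # learning rate used in each epoch
for epoch in range(4):
    lrs.append(scheduler.get_last_lr()[0])
    for xb, yb in loader:
        loss = loss_func(model(xb), yb)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    scheduler.step()  # once per epoch, after the optimizer updates

# 6. Deploy: save the weights, then load them into a fresh model for inference
buffer = io.BytesIO()  # in-memory stand-in for a checkpoint file
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))
restored.load_state_dict(torch.load(buffer))
restored.eval()

with torch.no_grad():
    same_output = torch.equal(model(X), restored(X))
```

Saving only the state_dict keeps the checkpoint independent of the class definition's location, at the cost of having to rebuild the same architecture before loading.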

Let’s dive into the practical implementation of a modeling pipeline in PyTorch using the popular MNIST dataset as an example. We’ll follow the steps outlined above to build our first neural network from scratch.

8.2 Downloading Data from Kaggle

The dataset we will be utilizing is the MNIST png dataset from Kaggle, as opposed to the CSV version, for a more practical experience.

Here are a few steps you need to perform before we download the data:

  • If you don’t have a Kaggle account, you can create one for free at kaggle.com.

  • To download the dataset, you will need the kaggle package installed. You can run the following command in a notebook, or in a CLI without the leading "!".

    !pip install kaggle >> /dev/null

  • Have a kaggle.json stored in ~/.kaggle. You can get your token by going to Your Profile -> Account -> Create New API Token.

Once you have completed the three steps above, run the following API command:

!kaggle datasets download -d jidhumohan/mnist-png -p "../data/"
Downloading mnist-png.zip to ../data
 87%|█████████████████████████████████     | 51.0M/58.6M [00:01<00:00, 33.4MB/s]
100%|██████████████████████████████████████| 58.6M/58.6M [00:01<00:00, 33.9MB/s]

To examine the file system, we will use the Path class from fastcore. It extends Python’s pathlib.Path with convenience methods such as ls() and simplifies the process of inspecting directories and folders.

from fastcore.xtras import Path
zip_path = Path("../data/mnist-png.zip")
zip_path.exists() # Check if the file exists

The data has been saved as mnist-png.zip in the ../data directory. The next step is to extract the contents of the archive with the zipfile package.

The following code block can take a significant amount of time (6-10 minutes), as it extracts 70,000 PNG images.

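# Output directory
dPath = Path("../data/")

# Unzipping data file in output directory
import zipfile
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(dPath)

# Removing the original zip file
zip_path.unlink()

# Removing duplicate folder in the unzipped data
# (assumed here to be a nested copy at mnist_png/mnist_png; adjust if your archive differs)
import shutil
dPath = dPath/'mnist_png'
shutil.rmtree(dPath/'mnist_png', ignore_errors=True)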
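
Before extracting a large archive, it can help to look at its member list first, for example to spot a duplicated top-level folder. The sketch below uses only the standard library and a tiny, made-up archive (toy.zip with placeholder file contents) rather than the MNIST download.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "toy.zip"

    # Build a small archive that mimics the <split>/<digit>/<file>.png layout
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("toy_png/training/0/1.png", b"not a real image")
        zf.writestr("toy_png/training/1/2.png", b"not a real image")
        zf.writestr("toy_png/testing/0/3.png", b"not a real image")

    with zipfile.ZipFile(archive, "r") as zf:
        names = zf.namelist()        # list the members without extracting
        zf.extractall(tmp / "out")   # extract everything under out/

    # Collect the extracted files relative to the output directory
    extracted = sorted(
        p.relative_to(tmp / "out").as_posix()
        for p in (tmp / "out").rglob("*.png")
    )
```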

Next, we inspect the extracted folder.

dPath.ls()
(#2) [Path('../data/mnist_png/testing'),Path('../data/mnist_png/training')]

The data consists of two folders, training and testing. Next, we inspect the training folder.

(dPath/'training').ls()
(#10) [Path('../data/mnist_png/training/0'),Path('../data/mnist_png/training/1'),Path('../data/mnist_png/training/2'),Path('../data/mnist_png/training/3'),Path('../data/mnist_png/training/4'),Path('../data/mnist_png/training/5'),Path('../data/mnist_png/training/6'),Path('../data/mnist_png/training/7'),Path('../data/mnist_png/training/8'),Path('../data/mnist_png/training/9')]

The training folder contains one subfolder for each digit from 0 to 9. Let’s look inside the folder for the digit 0.

(dPath/'training/0').ls()
(#5923) [Path('../data/mnist_png/training/0/1.png'),Path('../data/mnist_png/training/0/1000.png'),Path('../data/mnist_png/training/0/10005.png'),Path('../data/mnist_png/training/0/10010.png'),Path('../data/mnist_png/training/0/10022.png'),Path('../data/mnist_png/training/0/10025.png'),Path('../data/mnist_png/training/0/10026.png'),Path('../data/mnist_png/training/0/10045.png'),Path('../data/mnist_png/training/0/10069.png'),Path('../data/mnist_png/training/0/10071.png')...]

Each of these digit subfolders contains images. We will proceed to load a few of these images.

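from PIL import Image
from IPython.display import display
# Show the first image from two of the digit folders
for img in [Image.open((dPath/'training/0').ls()[0]),
            Image.open((dPath/'training/1').ls()[0])]:
    display(img)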

8.3 Creating Dataset Object

As previously discussed, prior to training the model, it is necessary to establish a data pipeline in PyTorch. This includes defining a Dataset object and subsequently loading it via a PyTorch Dataloader.

8.3.1 Using Pure PyTorch

Initially, we will demonstrate the process of constructing a custom image Dataset object using pure PyTorch. To begin, we will import the necessary libraries.

import torch
from torch import nn
from torch.utils.data import Dataset
import glob
import numpy as np

The glob library can be utilized to obtain the filepaths of all images within a directory.

# sorted() makes the file order deterministic; glob itself returns paths in arbitrary order
train_paths = sorted(glob.glob(str(dPath/'training/**/*.png')))
test_paths = sorted(glob.glob(str(dPath/'testing/**/*.png')))
print(f'Training images count: {len(train_paths)} \nTesting images count: {len(test_paths)}')
print(train_paths[:5])
Training images count: 60000 
Testing images count: 10000
['../data/mnist_png/training/0/1.png', '../data/mnist_png/training/0/1000.png', '../data/mnist_png/training/0/10005.png', '../data/mnist_png/training/0/10010.png', '../data/mnist_png/training/0/10022.png']

By utilizing glob, we have successfully obtained the filepaths of all images within the training and testing folders. We can see there are 60,000 training images and 10,000 testing images. The next step is to extract the labels from the folder names.

# The label of each image is the name of the folder that contains it
train_targets = list(map(lambda x: int(x.split('/')[-2]), train_paths))
test_targets  = list(map(lambda x: int(x.split('/')[-2]), test_paths))
print(f'Training labels count: {len(train_targets)} \nTesting labels count: {len(test_targets)}')
print(np.random.choice(train_targets, 5))  # five randomly chosen labels
Training labels count: 60000 
Testing labels count: 10000
[2 3 8 5 7]
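
Splitting on '/' assumes POSIX-style paths. A variant that does not depend on the path separator can be sketched with pathlib; the miniature folder tree and file names below are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Hypothetical miniature of the training/<digit>/<file>.png layout
    for digit, name in [(0, "1.png"), (0, "21.png"), (3, "7.png"), (9, "4.png")]:
        folder = root / "training" / str(digit)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()

    # rglob walks the tree recursively; sorting makes the order deterministic
    paths = sorted((root / "training").rglob("*.png"))

    # The label is the name of the folder that contains the image
    labels = [int(p.parent.name) for p in paths]
```

Here p.parent.name plays the role of x.split('/')[-2] in the code above.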

Now let’s define our custom image Dataset class.

class ImageDataset(Dataset):
    def __init__(self, X, y):
        self.img_paths = X
        self.targets  = y

    def __len__(self): 
        return len(self.img_paths)

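    def __getitem__(self, idx):
        # Load the image, flatten it from 28x28 to 784 values and scale the pixels to [0, 1]
        current_sample = torch.tensor(np.array(Image.open(self.img_paths[idx]))).flatten()/255.
        current_target = self.targets[idx]
        return (current_sample, current_target)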

As we can see above, ImageDataset is a custom PyTorch Dataset class. Let’s walk through its components:

  • The constructor takes two inputs, X and y, which are lists of image file paths and corresponding labels respectively. These are stored as the instance attributes self.img_paths and self.targets.
  • The __len__ method returns the number of images in the dataset by returning the length of the self.img_paths list.
  • The __getitem__ method is called when a specific sample is requested from the dataset. It takes an index as an argument, and returns a tuple of the image data and the corresponding label for that index. The image is processed as follows:
    • Opens the image file at the given index using the Image.open function from PIL (Python Imaging Library)
    • Converts it to a NumPy array and then to a PyTorch tensor
    • Flattens it (from a 28x28 2-D array to a 1-D array of 784 values)
    • Scales the pixel values to the range [0, 1] by dividing by 255
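
The image-processing steps can be traced one at a time on a synthetic image. The sketch below builds a made-up 28x28 grayscale image in memory instead of opening an MNIST file; the intermediate variable names are illustrative.

```python
import numpy as np
import torch
from PIL import Image

# Synthetic 28x28 grayscale image with pixel values 0..255
# (a stand-in for one MNIST PNG)
pixels = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28)
img = Image.fromarray(pixels)

arr = np.array(img)                  # PIL image -> uint8 array of shape (28, 28)
flat = torch.tensor(arr).flatten()   # -> uint8 tensor with 784 elements
sample = flat / 255.                 # -> float tensor scaled to [0, 1]
```

Dividing the integer tensor by a Python float also converts it to a floating-point tensor, which is what the linear layers used later expect.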

We will now proceed to instantiate our ImageDataset class for both the training and testing datasets.

train_ds = ImageDataset(X=train_paths, y=train_targets)
test_ds = ImageDataset(X=test_paths, y=test_targets)

print(f'One object: Image Tensor of shape {train_ds[0][0].shape}, Label: {train_ds[0][1]}')
print(f'One object: Image Tensor of shape {train_ds[20000][0].shape}, Label: {train_ds[20000][1]}')
One object: Image Tensor of shape torch.Size([784]), Label: 0
One object: Image Tensor of shape torch.Size([784]), Label: 3

8.3.2 Using Torchvision

We have demonstrated the procedure of creating a custom ImageDataset object. Now we will examine how to simplify this process by utilizing the torchvision package. The torchvision package encompasses commonly used datasets, model architectures, and image transformations for computer vision tasks.

To begin, we will import the datasets and transforms modules from the torchvision package.

from torchvision import datasets
from torchvision import transforms
from tqdm.auto import tqdm

Next, we will use the datasets and transforms modules to load our MNIST images.

transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: torch.flatten(x))
])

## Create a dataset
train_ds = datasets.ImageFolder(root=dPath/'training/', transform=transform)

test_ds = datasets.ImageFolder(root=dPath/'testing', transform=transform)

print(f'Length of train dataset: {len(train_ds)}, test_dataset: {len(test_ds)}')
print(f'One object: Image Tensor of shape {train_ds[0][0].shape}, Label: {train_ds[0][1]}')
Length of train dataset: 60000, test_dataset: 10000
One object: Image Tensor of shape torch.Size([784]), Label: 0

Let’s look at the code above:

The first step is to define a transform object using transforms.Compose. It takes a list of transformations as an argument and applies them in the order they are passed in. In this case, the following transformations are applied:

  • transforms.Grayscale(): Converts the images to single-channel grayscale (ImageFolder loads images as RGB by default)
  • transforms.ToTensor(): Converts the images to PyTorch tensors of shape 1x28x28 and scales the pixel values to the range [0, 1]
  • transforms.Lambda(lambda x: torch.flatten(x)): Flattens each tensor into a 1-D tensor of 784 values

Next, the code creates two datasets for training and testing using the datasets.ImageFolder class. It takes the root directory of the dataset and the transform object as arguments. It treats each subfolder of the root as a class and assigns labels as integer indices of the sorted folder names; for the folders 0 to 9, these indices coincide with the digits.

The code then prints the length of the train and test datasets and the shape and label of the first object in the train dataset. The datasets.ImageFolder class is a convenient way to create a PyTorch dataset from a directory of images when the data is organized as one folder per class.

8.4 Create a Dataloader

Next, we create the data loaders using the torch.utils.data.DataLoader class.

import os
num_workers = int(os.cpu_count()/2)
train_dls = torch.utils.data.DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=num_workers)
test_dls = torch.utils.data.DataLoader(test_ds, batch_size=128, shuffle=False, num_workers=num_workers)

The torch.utils.data.DataLoader class takes a dataset object as an argument and returns an iterable over it. It can be used to load the data in batches, shuffle the data, and load samples in parallel.

In the above code, the following parameters are passed to the DataLoader:

  • train_ds and test_ds are the training and testing datasets respectively.
  • batch_size=128: The number of samples per batch.
  • shuffle=True for the training dataset, and shuffle=False for the testing dataset: whether to reshuffle the data at the start of every epoch.
  • num_workers=num_workers: the number of worker subprocesses to use for loading the data. Here it is set to half of the number of CPU cores reported by os.cpu_count().

This gives us two data loaders, one for the training dataset and one for the testing dataset. Iterating over them yields the data in batches, so only a small chunk of the dataset has to be held in memory at a time.
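
The batching behaviour is easy to see on a small synthetic dataset. The sketch below uses a TensorDataset with 300 random samples as a stand-in for the image datasets; drop_last is an additional DataLoader option that is not used in this chapter.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for flattened MNIST: 300 samples of 784 features, labels 0..9
X = torch.rand(300, 784)
y = torch.randint(0, 10, (300,))
ds = TensorDataset(X, y)

# num_workers=0 loads in the main process, which keeps the sketch portable
dl = DataLoader(ds, batch_size=128, shuffle=True, num_workers=0)
batch_sizes = [len(yb) for xb, yb in dl]

# drop_last=True discards the final, smaller batch
dl_full = DataLoader(ds, batch_size=128, shuffle=True, drop_last=True)
full_batch_sizes = [len(yb) for xb, yb in dl_full]

n_batches = len(dl)  # number of batches per epoch
```

Because 300 is not a multiple of 128, the last batch of the first loader is smaller than the others. This is why the loops later in the chapter weight each batch loss by len(yb).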

Let’s look at one batch.

batch = next(iter(train_dls))
batch[0].shape, batch[1]
(torch.Size([128, 784]),
 tensor([5, 4, 1, 5, 7, 5, 4, 7, 2, 1, 5, 7, 6, 5, 8, 6, 3, 7, 8, 0, 4, 4, 4, 0,
         6, 7, 1, 4, 0, 6, 3, 9, 1, 0, 1, 9, 4, 1, 0, 1, 9, 3, 8, 2, 6, 2, 1, 2,
         1, 0, 2, 4, 7, 4, 7, 3, 3, 4, 3, 3, 4, 4, 7, 3, 3, 4, 6, 5, 1, 0, 2, 3,
         0, 4, 5, 7, 1, 5, 0, 1, 1, 3, 0, 0, 1, 4, 0, 6, 2, 3, 8, 1, 8, 1, 2, 5,
         5, 8, 9, 9, 9, 3, 1, 1, 3, 4, 1, 7, 8, 0, 1, 1, 2, 9, 1, 5, 3, 4, 0, 6,
         1, 4, 0, 8, 9, 1, 7, 4]))

As we can observe, each batch consists of an input tensor of shape (128, 784), representing 128 flattened 28x28 images, and a label tensor of shape (128), holding the corresponding digit labels.

8.5 Defining our Training and Validation loops

We will now implement the training loop. It is similar to the training loop we constructed in chapter 6.

## Training loop
def train_one_epoch(model, data_loader, optimizer, loss_func):
    total_loss, nums = 0, 0
    for batch in tqdm(data_loader):
        ## Taking one mini-batch and moving it to the device (dev is defined in section 8.6)
        xb, yb = batch[0].to(dev), batch[1].to(dev)
        y_pred = model(xb)
        ## Calculating the loss per mini-batch
        nums += len(yb)
        loss = loss_func(y_pred, yb)
        total_loss += loss.item() * len(yb)

        ## Computing gradients per mini-batch
        loss.backward()
        ## Update model parameters and zero grad
        optimizer.step()
        optimizer.zero_grad()
    return total_loss / nums

The train_one_epoch function takes 4 arguments:

  • model: The model to be trained
  • data_loader: The data loader for the training dataset
  • optimizer: The optimizer used to update the model parameters
  • loss_func: The loss function used to calculate the error of the model

The function uses a for loop to iterate through the data loader. For each mini-batch of data, it performs the following steps:

  • It loads the data and the labels from the data loader and sends it to the device.
  • It makes a forward pass through the model to get the predictions and then calculates the loss using the loss function.
  • It computes the gradients of the model parameters with respect to the loss.
  • It updates the model parameters using the optimizer and zeroes the gradients.
  • The total_loss and nums variables keep track of the total loss and the number of samples seen during the epoch, so the function can return the average loss per sample.

The validation loop follows the same pattern, but it does not update the model parameters:

def validate_one_epoch(model, data_loader, loss_func):
    loss, nums, acc = 0, 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for batch in tqdm(data_loader):
            xb, yb = batch[0].to(dev), batch[1].to(dev)
            y_pred = model(xb)
            nums += len(yb)
            loss += loss_func(y_pred, yb).item() * len(yb)
            ## Count the correct predictions in this mini-batch
            acc += (y_pred.argmax(dim=1) == yb).sum().item()
    return loss/nums, acc/nums

The validate_one_epoch function takes 3 arguments:

  • model: The model to be validated
  • data_loader: The data loader for the validation dataset
  • loss_func: The loss function used to calculate the error of the model

This function also uses a for loop to iterate through the data loader, this time inside a torch.no_grad() block, because gradients are not needed for evaluation. For each mini-batch of data, it performs the following steps:

  • It loads the data and the labels from the data loader and sends them to the device.
  • It makes a forward pass through the model to get the predictions and then calculates the loss using the loss function.
  • It compares the predicted class (the index of the largest output) with the labels to count the correct predictions.
  • The loss, nums, and acc variables keep track of the total loss, the number of samples seen, and the number of correct predictions respectively. Dividing the first and last by nums gives the average loss and the accuracy that the function returns.
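
The core of both loops can be isolated on a single synthetic mini-batch. The sketch below is not the chapter's code: it uses one linear layer as the model, random data, and a small learning rate chosen so that one gradient step reduces the loss on that same batch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic mini-batch: 64 samples, 784 features, 10 classes
xb = torch.rand(64, 784)
yb = torch.randint(0, 10, (64,))

model = nn.Linear(784, 10)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# --- one training step ---
loss_before = loss_func(model(xb), yb)
loss_before.backward()    # compute gradients of the loss w.r.t. the parameters
optimizer.step()          # update the parameters using those gradients
optimizer.zero_grad()     # clear gradients so they do not leak into the next step

# --- one evaluation pass ---
with torch.no_grad():     # no gradient tracking during evaluation
    y_pred = model(xb)
    loss_after = loss_func(y_pred, yb).item()
    n_correct = (y_pred.argmax(dim=1) == yb).sum().item()
    accuracy = n_correct / len(yb)
```

Skipping zero_grad would make the next backward call add its gradients to the old ones, since PyTorch accumulates gradients by default.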

8.6 Training using a Fully Connected / Multi-Layer Perceptron Model

Let’s define our model.

class MLP(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()  # required before registering submodules
        self.model = nn.Sequential(
            nn.Linear(n_in, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_out)
        )

    def forward(self, x):
        return self.model(x)

The code above defines an MLP model as a PyTorch nn.Module subclass. The constructor takes two arguments, n_in and n_out, which represent the number of input features and the number of output features of the model respectively. The model is a simple multi-layer perceptron with two hidden layers of 256 and 128 units, each a linear layer followed by a ReLU activation function, and a final linear output layer. The forward method takes an input tensor x and returns the output by passing it through the sequential model.
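
A quick way to sanity-check such a model is to pass a dummy batch through it and count its parameters. The sketch below restates the MLP so that it runs on its own, assuming a ReLU after each hidden linear layer as described in the text; the batch of four random inputs is made up.

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(n_in, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, n_out),
        )

    def forward(self, x):
        return self.model(x)

model = MLP(784, 10)

# A dummy batch of 4 flattened images yields 4 rows of 10 class scores
out = model(torch.rand(4, 784))
out_shape = tuple(out.shape)

# Each linear layer holds in*out weights plus out biases:
# (784*256 + 256) + (256*128 + 128) + (128*10 + 10) = 235,146
n_params = sum(p.numel() for p in model.parameters())
```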

Let’s define our training parameters.

dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
loss_func = nn.CrossEntropyLoss()
model = MLP(784,10).to(dev)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
epochs = 5

This code prepares the device, loss function, model, optimizer, and number of epochs used to train the MLP model.

  • dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu"): Selects the device to use for training. If a CUDA-enabled GPU is available, the model and data will be moved to the GPU for faster training; otherwise the CPU is used.
  • loss_func = nn.CrossEntropyLoss(): Defines the loss function. CrossEntropyLoss is a commonly used loss function for multi-class classification problems; it expects the raw, unnormalized outputs (logits) of the model.
  • model = MLP(784,10).to(dev): Instantiates the MLP model with 784 input features and 10 output features, and moves it to the device.
  • optimizer = torch.optim.SGD(model.parameters(), lr=1e-3): Creates an optimizer that uses the Stochastic Gradient Descent (SGD) algorithm with a learning rate of 1e-3. The optimizer updates the model parameters during training to minimize the loss function.
  • epochs = 5: Sets the number of training epochs. An epoch is one complete pass through the entire training dataset.

We will now evaluate the performance of our model on the validation dataset before training.

test_loss, test_acc = validate_one_epoch(model=model, data_loader=test_dls, loss_func=loss_func)
print(f"Random model: Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}")
Random model: Test Loss: 2.3025, Test Accuracy: 0.0772

As anticipated, the untrained model performs poorly: its accuracy of about 8% is close to the 10% expected from guessing among ten classes at random.
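
The initial test loss of about 2.30 is also what a model with no preference among ten classes would produce, since -log(1/10) = log(10) ≈ 2.3026. The sketch below checks this with hand-made logits and also shows that CrossEntropyLoss is equivalent to log-softmax followed by negative log-likelihood; the logits and targets are made up.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn

loss_func = nn.CrossEntropyLoss()

# Identical logits for all 10 classes: softmax gives each class probability 1/10
logits = torch.zeros(4, 10)
targets = torch.tensor([0, 3, 7, 9])
uniform_loss = loss_func(logits, targets).item()
chance_level = math.log(10)  # -log(1/10)

# CrossEntropyLoss takes raw logits: log-softmax followed by negative log-likelihood
random_logits = torch.randn(4, 10)
manual = F.nll_loss(F.log_softmax(random_logits, dim=1), targets)
matches = torch.allclose(loss_func(random_logits, targets), manual)
```

A freshly initialized network tends to produce small, nearly equal logits, which is why its loss starts close to this value.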

We will now encapsulate our previously defined functions in a fit function, which will be responsible for both training and evaluating the model.

def fit(epochs, model, loss_func, opt, train_dls, valid_dls):
    for epoch in range(epochs):
        # Use the optimizer passed in as opt rather than a global variable
        train_loss = train_one_epoch(model=model, data_loader=train_dls, optimizer=opt, loss_func=loss_func)
        test_loss, test_acc = validate_one_epoch(model=model, data_loader=valid_dls, loss_func=loss_func)
        print(f"Epoch {epoch+1},Train Loss: {train_loss:.4f}, Test Loss: {test_loss:.4f}, Valid Accuracy: {test_acc:.4f}")

The fit function uses a for loop to iterate over the number of training epochs. In each iteration, it calls the following functions:

  • train_one_epoch: It trains the model for one epoch using the training data and optimizer.
  • validate_one_epoch: It evaluates the model on the validation dataset and returns the loss and accuracy.

It prints the training loss, validation loss and validation accuracy for each epoch. Let’s use the fit function to train our model.

fit(epochs, model, loss_func, optimizer, train_dls, test_dls)
Epoch 1,Train Loss: 2.2945, Test Loss: 2.2852, Valid Accuracy: 0.1614
Epoch 2,Train Loss: 2.2770, Test Loss: 2.2659, Valid Accuracy: 0.2339
Epoch 3,Train Loss: 2.2564, Test Loss: 2.2426, Valid Accuracy: 0.3347
Epoch 4,Train Loss: 2.2310, Test Loss: 2.2131, Valid Accuracy: 0.4615
Epoch 5,Train Loss: 2.1983, Test Loss: 2.1752, Valid Accuracy: 0.5695

As we can observe, the model is learning: accuracy increased from about 8% to 57% after only five epochs, although the loss is still decreasing slowly.

Now, we will replace the optimizer passed to our fit function with the AdamW optimizer from the torch.optim module, and rerun the fit function.

model = MLP(784,10).to(dev)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
fit(epochs, model, loss_func, optimizer, train_dls, test_dls)
Epoch 1,Train Loss: 0.3480, Test Loss: 0.1660, Valid Accuracy: 0.9501
Epoch 2,Train Loss: 0.1392, Test Loss: 0.1289, Valid Accuracy: 0.9597
Epoch 3,Train Loss: 0.0895, Test Loss: 0.0931, Valid Accuracy: 0.9699
Epoch 4,Train Loss: 0.0659, Test Loss: 0.0758, Valid Accuracy: 0.9759
Epoch 5,Train Loss: 0.0490, Test Loss: 0.0700, Valid Accuracy: 0.9797

By utilizing the AdamW optimizer and MLP model, we can see that after 5 epochs, we have a highly accurate model with a 98% accuracy as compared to random prediction of 7-8%.

8.7 Training using a simple CNN model

As previously demonstrated, the fit function is highly adaptable as we were able to change our optimizer without making any modifications to the function. Now, we will replace our MLP model with a CNN (Convolutional Neural Network) model. We will begin by defining a basic CNN network.

import torch.nn.functional as F
class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))

The code above defines a class called Mnist_CNN, which is a subclass of nn.Module. Its constructor creates three 2D convolutional layers (conv1, conv2, conv3) with different numbers of input and output channels and the same kernel size, stride, and padding. The forward method first reshapes each flattened input back to a 1x28x28 image, applies the three convolutions, each followed by a ReLU activation, and then applies average pooling. Finally, the output is reshaped to a 2-D tensor with one row of 10 class scores per image.
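
To see why the final average pooling uses a window of 4, it helps to trace the tensor shape through the layers. The sketch below rebuilds the three convolutions outside the class and records the shape after each step; the conv_out helper is a hypothetical name for the standard output-size formula floor((n + 2*padding - kernel) / stride) + 1.

```python
import torch
import torch.nn.functional as F
from torch import nn

def conv_out(n, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution applied to an input of size n."""
    return (n + 2 * padding - kernel) // stride + 1

# Sizes predicted by the formula: 28 -> 14 -> 7 -> 4
sizes = [28]
for _ in range(3):
    sizes.append(conv_out(sizes[-1]))

conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)
conv3 = nn.Conv2d(32, 10, kernel_size=3, stride=2, padding=1)

# Trace a dummy batch of 8 flattened images through the same steps as forward()
x = torch.rand(8, 784).view(-1, 1, 28, 28)
shapes = [tuple(x.shape)]
for conv in (conv1, conv2, conv3):
    x = F.relu(conv(x))
    shapes.append(tuple(x.shape))
x = F.avg_pool2d(x, 4)            # 4x4 feature map -> 1x1
shapes.append(tuple(x.shape))
out = x.view(-1, x.size(1))       # drop the 1x1 spatial dimensions
out_shape = tuple(out.shape)
```

The last convolution has 10 output channels, so after pooling each channel to a single value the network yields one score per digit class.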

Now, we can pass an instance of this model to the fit function for training and validation.

model = Mnist_CNN().to(dev)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
fit(epochs, model, loss_func, optimizer, train_dls, test_dls)
Epoch 1,Train Loss: 1.8228, Test Loss: 1.4976, Valid Accuracy: 0.5442
Epoch 2,Train Loss: 1.3562, Test Loss: 1.2602, Valid Accuracy: 0.5958
Epoch 3,Train Loss: 1.2113, Test Loss: 1.1522, Valid Accuracy: 0.6144
Epoch 4,Train Loss: 1.1286, Test Loss: 1.0886, Valid Accuracy: 0.6187
Epoch 5,Train Loss: 1.0741, Test Loss: 1.0454, Valid Accuracy: 0.6308

As can be observed, we are able to seamlessly switch from an MLP to a CNN model by utilizing the adaptable fit function and train the model.

8.8 Conclusion

In this chapter, we progressed from a basic linear regression example to building an image classifier using MLP and CNN models. We gained practical experience in creating custom Dataset and Dataloader objects and were introduced to the torchvision library for simplifying this process. Additionally, we developed a versatile fit function, which can be utilized with various models, optimizers, and loss functions for training our models.

The idea of flexibility as demonstrated in the fit function is not unique, and there are many frameworks that aim to simplify the model training process by offering high-level APIs, allowing machine learning scientists to focus on building and solving problems, while the frameworks handle the majority of the complexity. Later in the book, we will repeat the same exercise using the fastai library, which is a highly flexible and performant framework built on top of PyTorch, and observe how we can construct neural networks with minimal lines of code.