## 6.1 Introduction to Optimizers

In previous chapters, we saw how to load data and train a linear regression model using mini-batch gradient descent. In practice, we don’t need to write our own implementation of gradient descent, because PyTorch provides a range of built-in optimization algorithms. Each optimizer has its own set of hyperparameters that can be tuned. Some of the most popular optimizers include:

• SGD (Stochastic Gradient Descent): This is a simple optimizer that updates the model’s parameters using the gradient of the loss with respect to the parameters
• Adam (Adaptive Moment Estimation): This optimizer combines momentum, a running average of past gradients that can help it converge more quickly to a good solution, with adaptive learning rates, meaning that the step size for each parameter is adjusted automatically based on its historical gradient information
• RMSprop (Root Mean Square Propagation): Like Adam, this optimizer scales each parameter’s step by a running average of its squared gradients, but in its default form it does not include Adam’s momentum term or bias correction
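To make the differences concrete, the sketch below writes out one common textbook form of each update rule in plain Python for a single scalar parameter. The function names, the toy loss `f(w) = (w - 3)**2`, and the hyperparameter values are illustrative assumptions; PyTorch's own implementations offer further options (such as weight decay) that are left out here.

```python
import math

def sgd_step(w, g, lr=0.1):
    # Move against the gradient, scaled by the learning rate
    return w - lr * g

def rmsprop_step(w, g, sq_avg, lr=0.01, alpha=0.99, eps=1e-8):
    # Keep a running average of squared gradients and divide the step by its root
    sq_avg = alpha * sq_avg + (1 - alpha) * g * g
    return w - lr * g / (math.sqrt(sq_avg) + eps), sq_avg

def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment (momentum) and second moment (squared gradients)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # Bias correction compensates for m and v starting at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# One step from w = 0 on the toy loss f(w) = (w - 3)**2, whose gradient at 0 is -6
g0 = 2.0 * (0.0 - 3.0)
w_sgd = sgd_step(0.0, g0)
w_rms, _ = rmsprop_step(0.0, g0, sq_avg=0.0)
w_adam, _, _ = adam_step(0.0, g0, m=0.0, v=0.0, t=1)
print(w_sgd, w_rms, w_adam)
```

On this first step SGD moves by `lr * |g|`, which is about 0.6 here, whereas Adam's bias-corrected step is close to `lr` (about 0.1) whatever the scale of the gradient.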

## 6.2 Exercise: Linear Regression

Let’s look at how we can start using PyTorch’s optimizers by continuing the previous linear regression example. Note that this time we will use four input features instead of the single feature used in our previous examples.

``````
# Importing required functions
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate dataset with linear property
X, y, coef = make_regression(
    n_samples=1500,
    n_features=4,  # Using four features
    n_informative=4,
    noise=0.3,
    coef=True,
    random_state=0,
    bias=2
)

print(f'Input feature size: {X.shape}')
``````
``Input feature size: (1500, 4)``

Now we will create a custom `Dataset` class.

``````
# Creating our custom TabularDataset
class TabularDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        # Number of samples (rows), not the full shape tuple
        return self.data.shape[0]

    def __getitem__(self, idx):
        current_sample = self.data[idx]
        current_target = self.targets[idx]
        return {
            "X": torch.tensor(current_sample, dtype=torch.float),
            "y": torch.tensor(current_target, dtype=torch.float)
        }
``````

We have modified the `TabularDataset` class to handle additional features. Now, the class takes two inputs: `data` which includes our four features, and `targets` which is our target variable.

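``````
# Making a train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33)

# Creating Tabular Dataset
train_dataset = TabularDataset(X_train, y_train)
test_dataset = TabularDataset(X_test, y_test)

# Creating data loaders; only the training data is shuffled.
# batch_size=64 and the loader names are assumed here, as the original lines are not shown.
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
``````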

We have divided our sample into a training set and a test set and used the `TabularDataset` class to create train and test objects. Finally, we created data loaders for the training set and test set using these objects.

Note

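In the code, the training data is shuffled by the `DataLoader` while the testing data is not. This is common practice when training a machine learning model: shuffling stops the gradient updates from depending on the order in which the samples are stored, whereas the evaluation metrics do not depend on sample order.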
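The effect of the `shuffle` flag can be seen on a toy dataset. The following is a minimal sketch, assuming a `TensorDataset` of ten samples whose single feature equals their index; the variable names are illustrative and not part of the example above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten toy samples whose single feature equals their index
dataset = TensorDataset(torch.arange(10).float().unsqueeze(1))

torch.manual_seed(0)
shuffled = DataLoader(dataset, batch_size=4, shuffle=True)
ordered = DataLoader(dataset, batch_size=4, shuffle=False)

# With shuffle=False, samples arrive in dataset order on every pass
ordered_ids = [int(x) for (batch,) in ordered for x in batch.flatten()]
# With shuffle=True, each pass draws a new random permutation of the samples
shuffled_ids = [int(x) for (batch,) in shuffled for x in batch.flatten()]

print(ordered_ids)
print(shuffled_ids)
```

Both loaders visit every sample exactly once per pass; only the order differs.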

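``````
class Linear:
    def __init__(self, n_in, n_out):
        # Randomly initialised weights and bias that track gradients
        # (random normal initialisation is assumed; these lines are missing from the listing)
        self.w = torch.randn(n_in, n_out).requires_grad_(True)
        self.b = torch.randn(n_out).requires_grad_(True)
        self.params = [self.w, self.b]

    def forward(self, x):
        return x @ self.w + self.b

# Initializing model with one input per feature
torch.manual_seed(4)
model = Linear(X.shape[1], 1)

print(f"Shape of weights: {model.w.shape}")
print(f"Shape of bias: {model.b.shape}")
``````
``````
Shape of weights: torch.Size([4, 1])
Shape of bias: torch.Size([1])
``````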

We are using the same linear model as last time but this time it will take four inputs instead of one input.

``optimizer = torch.optim.SGD(model.params, lr=1e-3)``

Next, we will define our optimizer. We will use PyTorch’s implementation of stochastic gradient descent (SGD) by initializing `torch.optim.SGD`. Here we are passing the model parameters which need to get modified during the training process and a hyperparameter learning rate (`lr`) of `1e-3`.
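For plain SGD (no momentum or weight decay), `optimizer.step()` amounts to subtracting `lr * grad` from each parameter. The sketch below checks this on a small random problem; the tensors `w`, `x`, and `y` are made-up stand-ins for the model parameters and one mini-batch, not the data used in this chapter.

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 1, requires_grad=True)   # stand-in for the model weights
x = torch.randn(8, 4)                       # stand-in mini-batch of inputs
y = torch.randn(8)                          # stand-in mini-batch of targets

lr = 1e-3
optimizer = torch.optim.SGD([w], lr=lr)

# Forward pass and backward pass populate w.grad
loss = torch.square((x @ w).squeeze() - y).sum()
loss.backward()

# What a plain SGD update should produce: w - lr * grad
expected = (w - lr * w.grad).detach().clone()

# The optimizer performs the same update in place
optimizer.step()
print(torch.allclose(w.detach(), expected))
```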

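``````
def train_one_epoch(model, data_loader, optimizer):
    for batch in data_loader:
        # Taking one mini-batch
        y_pred = model.forward(batch['X']).squeeze()
        y_true = batch['y']

        # Calculating the squared error summed over the mini-batch
        loss = torch.square(y_pred - y_true).sum()

        # Calculating gradients of the loss with respect to the parameters
        loss.backward()

        # Update model parameters and zero grad
        optimizer.step()
        optimizer.zero_grad()

def validate_one_epoch(model, data_loader):
    loss = 0
    # No gradients are needed for evaluation
    with torch.no_grad():
        for batch in data_loader:
            y_pred = model.forward(batch['X']).squeeze()
            y_true = batch['y']
            loss += torch.square(y_pred - y_true).sum()
    # Average over mini-batches (assumed normalisation; the original return is not shown)
    return loss / len(data_loader)
``````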

For the training loop (defined in `train_one_epoch`), we will go through each mini-batch and do the following:

• Use the model to make a prediction
• Calculate the squared-error loss for the mini-batch and backpropagate it to obtain the gradients
• Update the model parameters using the optimizer’s `step()` function
• Reset the gradients to zero for the next mini-batch using the optimizer’s `zero_grad()` function
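The reset in the last step matters because PyTorch adds new gradients to whatever is already stored in `.grad` rather than overwriting it. The following is a small sketch of that behaviour, using a made-up two-element parameter `w` and the toy loss `sum(w**2)`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: the gradient of sum(w**2) is 2 * w
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added to the old one
(w ** 2).sum().backward()
accumulated = w.grad.clone()

# After zero_grad(), the next backward pass starts from a clean slate
optimizer.zero_grad()
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
```

Skipping `zero_grad()` in a training loop would therefore make every update use the sum of all gradients seen so far instead of the current mini-batch's gradient.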

In the validation loop (defined in `validate_one_epoch`), we will process each mini-batch as follows:

• Use the trained model to make a prediction
• Calculate the Mean Squared Error (MSE) loss and return the overall MSE at the end
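Because no parameter update happens during validation, the forward pass is commonly wrapped in `torch.no_grad()` so that PyTorch does not build a computation graph. The sketch below shows this on made-up tensors and checks that dividing the summed squared error by the number of samples agrees with `torch.nn.functional.mse_loss`; all names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(4, 1, requires_grad=True)   # stand-in weights
b = torch.randn(1, requires_grad=True)      # stand-in bias
x = torch.randn(6, 4)                       # stand-in inputs
y = torch.randn(6)                          # stand-in targets

# Inside no_grad, results do not track gradients
with torch.no_grad():
    y_pred = (x @ w + b).squeeze()
    sq_err_sum = torch.square(y_pred - y).sum()

# Per-sample mean squared error from the summed squared error
mse = sq_err_sum.item() / len(y)
reference = F.mse_loss(y_pred, y).item()
print(sq_err_sum.requires_grad, mse, reference)
```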

Now let’s run through some epochs and train our model.

``````
for epoch in range(10):
    # run one training loop
    train_one_epoch(model, train_loader, optimizer)
    # run validation loop on training to compute training loss
    train_loss = validate_one_epoch(model, train_loader)
    # run validation loop on testing to compute test loss
    test_loss = validate_one_epoch(model, test_loader)

    print(f"Epoch {epoch},Train MSE: {train_loss:.4f} Test MSE: {test_loss:.3f}")

print(f"Actual coefficients are: \n{np.round(coef,4)} \nTrained model weights are: \n{np.round(model.w.squeeze().detach().numpy(),4)}")
print(f"Actual Bias term is {2} \nTrained model bias term is \n{model.b.squeeze().detach().numpy().item():.4f}")
``````
``````
Epoch 0,Train MSE: 13657.7461 Test MSE: 16039.912
Epoch 1,Train MSE: 267.4445 Test MSE: 319.128
Epoch 2,Train MSE: 11.0232 Test MSE: 11.422
Epoch 3,Train MSE: 5.9071 Test MSE: 5.284
Epoch 4,Train MSE: 5.8251 Test MSE: 5.184
Epoch 5,Train MSE: 5.8193 Test MSE: 5.183
Epoch 6,Train MSE: 5.8243 Test MSE: 5.176
Epoch 7,Train MSE: 5.8181 Test MSE: 5.243
Epoch 8,Train MSE: 5.8192 Test MSE: 5.192
Epoch 9,Train MSE: 5.8160 Test MSE: 5.230
Actual coefficients are:
[63.0061 44.1452 84.3648  9.3378]
Trained model weights are:
[63.0008 44.1527 84.3725  9.3218]
Actual Bias term is 2
Trained model bias term is
1.9968
``````

As shown above, our model has fit the data well. The actual coefficients and bias used to generate the random data roughly match the weights and bias terms of our model.

## 6.3 Conclusion

In PyTorch, optimizers are used to update the parameters of a model during training. Optimizers adjust the parameters of the model based on the gradients of the loss function with respect to the parameters, in order to minimize the loss.

There are many different optimizers available in PyTorch, including SGD, Adam, RMSprop, and more. You can choose the optimizer that works best for your specific problem and model architecture.
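Since every optimizer in `torch.optim` exposes the same `step()` and `zero_grad()` interface, trying a different one usually means changing only the line that constructs it. The following is a minimal sketch on synthetic data; the helper `fit`, the data, and the learning rates are illustrative assumptions rather than tuned recommendations.

```python
import torch

torch.manual_seed(0)
X = torch.randn(200, 4)
true_w = torch.tensor([[1.5], [-2.0], [0.5], [3.0]])
y = (X @ true_w).squeeze() + 2.0

def fit(make_optimizer, steps=200):
    # Fresh parameters for each run so every optimizer starts from the same point
    w = torch.zeros(4, 1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    optimizer = make_optimizer([w, b])
    losses = []
    for _ in range(steps):
        # Full-batch mean squared error
        loss = torch.square((X @ w).squeeze() + b - y).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        losses.append(loss.item())
    return losses

# Only the constructor changes between runs
results = {
    "SGD": fit(lambda params: torch.optim.SGD(params, lr=0.1)),
    "Adam": fit(lambda params: torch.optim.Adam(params, lr=0.1)),
    "RMSprop": fit(lambda params: torch.optim.RMSprop(params, lr=0.01)),
}
for name, losses in results.items():
    print(f"{name}: first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Which optimizer reaches a low loss fastest depends on the problem and on the hyperparameters, so a comparison like this is best repeated on the actual model and data.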