import torch
## initializing a anb b with requires grad = True
= torch.tensor([5.], requires_grad=True)
a = torch.tensor([6.], requires_grad=True)
b
= a**3 - b**2
y y
tensor([89.], grad_fn=<SubBackward0>)
In PyTorch, autograd automatically computes gradients. It is a key part of PyTorch’s deep learning framework and is used to optimize model parameters during training by computing gradients of the loss function with respect to the model’s parameters.
Autograd can compute gradients for both scalar and vector-valued functions, and it can do so efficiently for a large variety of differentiable operations, including matrix and element-wise operations, as well as higher-order derivatives.
Let’s take a simple example of looking at a function. \[y = a^3 - b^2 + 3\]
Differentiation of this function with respect to a and b is going to be:
\[\frac{dy}{da} = 3a^2\]
\[\frac{dy}{db} = -2b\]
So if: \[a = 5, b = 6\]
Gradient with respect to a and b will be: \[\frac{dy}{da} = 3a^2 => 3*5^2 => 75\]
\[\frac{dy}{db} = -2b => -2*6 => -12\]
Now let’s observe these in PyTorch. To make a tensor compute gradients automatically we can initialize them with requires_grad = True
.
import torch
## initializing a anb b with requires grad = True
= torch.tensor([5.], requires_grad=True)
a = torch.tensor([6.], requires_grad=True)
b
= a**3 - b**2
y y
tensor([89.], grad_fn=<SubBackward0>)
To compute the derivatives we can call the backward
method and retrieve gradients from a and b by calling a.grad
and b.grad
.
y.backward()print(f"Gradient of a and b is {a.grad.item()} and {b.grad.item()} respectively.")
Gradient of a and b is 75.0 and -12.0 respectively.
As computed above the gradient of a
and b
is 75 and -12 respectively.
In this section, we will implement a linear regression model in PyTorch and use the Autograd package to optimize the model’s parameters through gradient descent.
Let’s begin by generating linear data that exhibits linear characteristics. We will use the sklearn make_regression
function to do the same.
## Importing required functions
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
import seaborn as sns
import time
from IPython.display import clear_output
"dark")
sns.set_style(%matplotlib inline
def plot_data(x, y, y_pred=None, label=None):
=True)
clear_output(wait=X.squeeze(), y=y)
sns.scatterplot(xif y_pred is not None:
=X.squeeze(), y=y_pred.squeeze(), color='red')
sns.lineplot(x"Input")
plt.xlabel("Target")
plt.ylabel(if label:
plt.title(label)
plt.show()0.5) time.sleep(
## Generate some dataset
= make_regression(n_samples=1500,
X, y, coef =1,
n_features=1,
n_informative=0.3,
noise=True,
coef=0,
random_state=2)
bias= torch.tensor(X, dtype=torch.float32)
X = torch.tensor(y, dtype=torch.float32)
y =f"Coefficient: {coef:.2f}, Bias:{2}") plot_data(X, y, label
Since we are only building a simple linear regression with one feature and one bias term. It can be defined as the following -
class Linear:
def __init__(self, n_in, n_out):
self.w = torch.randn(n_in, n_out).requires_grad_(True)
self.b = torch.randn(n_out).requires_grad_(True)
self.params = [self.w, self.b]
def forward(self, x):
return x @ self.w + self.b
This code above defines a class called Linear
that represents a simple linear regression model. In this case, the __init__
takes two arguments: n_in
and n_out
, which represent the dimensions of the input and output of the linear regression model. The __init__
method initializes the weight matrix w
and the bias vector b
to random values, and also sets them to be requires_grad
to True
, which means that the gradients of these parameters will be calculated during backpropagation. The forward
method defines the forward pass of the linear regression model. It takes an input x
and applies the linear transformation defined by w
and b
, returning the model prediction.
Let’s initialize the model and make a random prediction.
## Initializing model
4)
torch.manual_seed(= Linear(X.shape[1], 1)
model
## Making a random prediction
with torch.no_grad():
= model.forward(X).numpy()
y_pred
## Plotting the prediction
plot_data(X, y, y_pred)
The code above generates random predictions that do not fit the data well. The torch.no_grad()
context manager is used to prevent torch from calculating gradients for the operations within the context.
To improve the model’s performance, we can use the autograd
function to create a simple gradient descent function called step
, which runs one epoch of training. This will allow us to optimize the model’s parameters and improve the accuracy of our predictions.
def step(X, y, model, lr=0.1):
= model.forward(X)
y_pred
## Calculation mean square error
= torch.square(y - y_pred.squeeze()).mean()
loss
## Computing gradients
loss.backward()
## Updating parameters
with torch.no_grad():
for param in model.params:
-= lr * param.grad.data
param
param.grad.data.zero_()return loss
Let’s walk through the step
function:
for i in range(30):
# run one gradient descent epoch
= step(X, y, model)
loss with torch.no_grad():
= model.forward(X).numpy()
y_pred # plot each step with delay
=f"Step: {i+1}, MSE = {loss:.2f}") plot_data(X, y, y_pred, label
As observed above, our model’s performance improved with each epoch and the mean squared error (MSE) decreased consistently.
print(f"True coefficient is {coef.item():.2f} and predicted coefficient is {model.w.item():.2f}.")
print(f"True bias term is {2} and predicted coefficient is {model.b.item():.2f}.")
True coefficient is 0.48 and predicted coefficient is 0.47.
True bias term is 2 and predicted coefficient is 2.00.
Autograd is a key part of PyTorch’s deep learning framework and is an essential tool for optimizing and training neural network models. It is designed to make it easy to implement and train complex models by automatically computing gradients for differentiable operations.