import torch
import numpy as np
2 Tensors
2.1 What are Tensors?
PyTorch provides tensors as its primary data structure. Tensors are similar to NumPy arrays, but they add automatic differentiation and are designed to take advantage of GPUs to accelerate computation. As with NumPy, tensors in PyTorch support a variety of operations, including indexing, slicing, math operations, linear algebra operations, and more. Throughout this section we will use the torch and numpy imports shown above.
2.2 Initializing a Tensor
There are several ways to initialize tensors in PyTorch. Here are some examples:
Initializing from a Python sequence such as a list
# Initialize a tensor from a list
tensor_from_list = torch.tensor([1, 2, 3, 4])
print("Tensor from list: \n", tensor_from_list)

# Initialize a tensor from a nested list
tensor_from_nested_list = torch.tensor([[1, 2], [3, 4]])
print("Tensor from nested list: \n", tensor_from_nested_list)
Tensor from list:
tensor([1, 2, 3, 4])
Tensor from nested list:
tensor([[1, 2],
[3, 4]])
Initializing from a NumPy array
# Create a NumPy array
numpy_array = np.array([[1, 2], [3, 4]])

# Initialize a tensor from a NumPy array
tensor_from_numpy = torch.from_numpy(numpy_array)
print("Tensor from np array: \n", tensor_from_numpy)
Tensor from np array:
tensor([[1, 2],
[3, 4]])
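Note that torch.from_numpy shares memory with the source NumPy array, so an in-place change to the array is reflected in the tensor (and vice versa). Here is a small sketch illustrating this; the variable names are just for illustration:
# torch.from_numpy shares memory with the underlying NumPy array,
# so an in-place change to the array is visible in the tensor
shared_array = np.array([[1, 2], [3, 4]])
shared_tensor = torch.from_numpy(shared_array)
shared_array[0, 0] = 99
print(shared_tensor)  # the first element now shows 99
If you want an independent copy instead, you can use torch.tensor(numpy_array) or call .clone() on the result.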
Initializing from another tensor
# Create a tensor
original_tensor = torch.tensor([1, 2, 3, 4])

# Initialize a new tensor from the original tensor
new_tensor = original_tensor.clone()
print("Tensor from another tensor: \n", new_tensor)
Tensor from another tensor:
tensor([1, 2, 3, 4])
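PyTorch also provides *_like constructors that build a new tensor with the same shape (and, by default, the same dtype and device) as an existing one. A brief sketch, reusing original_tensor from above:
# Create tensors that match the shape of original_tensor
ones_like_tensor = torch.ones_like(original_tensor)
print("Ones like: \n", ones_like_tensor)

# rand_like needs a floating-point dtype, so we override the integer dtype here
rand_like_tensor = torch.rand_like(original_tensor, dtype=torch.float32)
print("Rand like: \n", rand_like_tensor)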
Constant or random initialization
# Initialize a tensor with all elements set to zero
tensor_zeros = torch.zeros(3, 4)
print("Tensor with all elements set to zero: \n", tensor_zeros)

# Initialize a tensor with all elements set to one
tensor_ones = torch.ones(3, 4)
print("\n Tensor with all elements set to one: \n", tensor_ones)

# Initialize a tensor with all elements set to a specific value
tensor_full = torch.full((3, 4), fill_value=2.5)
print("\n Tensor with all elements set to a specific value: \n", tensor_full)

# Initialize a tensor with random values
tensor_rand = torch.rand(3, 4)
print("\n Tensor with random initialization: \n", tensor_rand)

# Initialize a tensor with random values from a normal distribution
tensor_randn = torch.randn(3, 4)
print("\n Tensor with random values from a normal distribution: \n", tensor_randn)
Tensor with all elements set to zero:
tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Tensor with all elements set to one:
tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
Tensor with all elements set to a specific value:
tensor([[2.5000, 2.5000, 2.5000, 2.5000],
[2.5000, 2.5000, 2.5000, 2.5000],
[2.5000, 2.5000, 2.5000, 2.5000]])
Tensor with random initialization:
tensor([[0.8675, 0.0161, 0.5472, 0.7002],
[0.6551, 0.3049, 0.4088, 0.6341],
[0.2363, 0.8951, 0.0335, 0.5779]])
Tensor with random values from a normal distribution:
tensor([[ 1.0550, 0.9214, -1.3023, 0.4119],
[-0.4691, 0.8733, 0.7910, -2.3932],
[-0.6304, -0.8792, 0.4188, 0.4221]])
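A few other common constructors are torch.arange, torch.linspace, and torch.eye; a quick sketch:
# Evenly spaced integers from 0 up to (but not including) 10, in steps of 2
print(torch.arange(0, 10, 2))

# 5 evenly spaced values between 0 and 1 (inclusive)
print(torch.linspace(0, 1, 5))

# 3x3 identity matrix
print(torch.eye(3))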
2.3 Tensor Attributes
A PyTorch tensor has several attributes that you can access to get information about it. Here are some of the most common ones:
shape: returns the shape of the tensor as a tuple of integers. For example, if the tensor has dimensions (batch_size, num_channels, height, width), the shape would be (batch_size, num_channels, height, width).
dtype: returns the data type of the tensor. For example, the data type could be torch.float32 or torch.int64.
device: returns the device on which the tensor is stored. This can be the CPU or a GPU.
requires_grad: a boolean flag indicating whether the tensor requires gradient computation. If set to True, the tensor's gradients will be computed during backpropagation.
grad: a tensor containing the gradient of the tensor with respect to some scalar value. This attribute is typically used during training with gradient descent.
You can access these attributes by calling them on a tensor object. For example:
tensor_randn = torch.randn(3, 4)
print(f"Shape of tensor : {tensor_randn.shape}")
print(f"Type of tensor : {tensor_randn.dtype}")
print(f"Device tensor is stored on : {tensor_randn.device}")
print(f"Autograd enabled : {tensor_randn.requires_grad}")
print(f"Any stored gradient : {tensor_randn.grad}")
Shape of tensor : torch.Size([3, 4])
Type of tensor : torch.float32
Device tensor is stored on : cpu
Autograd enabled : False
Any stored gradient : None
As we can see above, we initialized a random tensor of shape (3, 4) with a torch.float32 data type, and it is currently stored on the CPU. Automatic gradient computation is disabled and no gradient is stored in the tensor.
There are several other attributes and methods you can access, such as ndim, size(), numel(), and storage(). You can find more information in the PyTorch Tensor documentation.
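As a quick sketch of a few of these, reusing tensor_randn from above:
# ndim is the number of dimensions, size() returns the shape,
# and numel() returns the total number of elements
print(f"Number of dimensions : {tensor_randn.ndim}")
print(f"Size of tensor : {tensor_randn.size()}")
print(f"Number of elements : {tensor_randn.numel()}")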
2.4 Tensor Operations
There are several operations you can perform on tensors, let’s look at the most commonly used operations.
2.4.1 Moving tensor from CPU to GPU
Moving a tensor from the CPU to a GPU is a simple command, but probably the one people will use the most.
"cuda") tensor_randn.to(
tensor([[-0.0984, -1.3804, 0.3343, -0.1623],
[ 0.9155, -0.8620, -0.3943, -0.2997],
[-0.1336, -0.7395, -0.7143, -0.0735]], device='cuda:0')
As we can see, tensor_randn has been moved to a CUDA (GPU) device. Note that .to returns a new tensor on the target device; the original tensor is left unchanged unless you reassign it (for example, tensor_randn = tensor_randn.to("cuda")).
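Since a GPU is not always available, a common pattern is to pick the device at runtime and write device-agnostic code; a minimal sketch:
# Choose the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_device = tensor_randn.to(device)
print(tensor_on_device.device)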
2.4.2 Slicing and Indexing
PyTorch tensors, like NumPy arrays, support various slicing and indexing operations.
tensor_randn = torch.randn(3, 4)
tensor_randn
tensor([[-1.3470, 0.2204, 0.2963, -0.9745],
[ 0.1867, -1.8338, -1.1872, -1.2987],
[ 0.0517, -0.3206, 0.3584, -0.4778]])
print(f"First row: \n{tensor_randn[0]}")
print(f"\n First column: \n {tensor_randn[:, 0]}")
print(f"\n Last column: {tensor_randn[..., -1]}")
print(f"\n Selected columns: \n {tensor_randn[:,2:4]}")
## Assignment of column to zero
tensor_randn[:, 1] = 0
print("\n Assigning column to zero: \n", tensor_randn)
First row:
tensor([-1.3470, 0.2204, 0.2963, -0.9745])
First column:
tensor([-1.3470, 0.1867, 0.0517])
Last column: tensor([-0.9745, -1.2987, -0.4778])
Selected columns:
tensor([[ 0.2963, -0.9745],
[-1.1872, -1.2987],
[ 0.3584, -0.4778]])
Assigning column to zero:
tensor([[-1.3470, 0.0000, 0.2963, -0.9745],
[ 0.1867, 0.0000, -1.1872, -1.2987],
[ 0.0517, 0.0000, 0.3584, -0.4778]])
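Boolean masks work as well: a tensor of True/False values can be used both to select elements and to assign to them. A small sketch, continuing with tensor_randn:
# Boolean mask indexing: select the elements that are greater than zero
mask = tensor_randn > 0
print("Positive elements: \n", tensor_randn[mask])

# The same mask can be used for assignment
tensor_randn[mask] = 0
print("\n After zeroing positive elements: \n", tensor_randn)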
2.4.3 Concatenation
The torch.cat
function can be used to concatenate or join multiple tensors together, which is often useful when working with deep learning models.
Let’s take our previously defined tensors and check their shape.
tensor_ones.shape, tensor_zeros.shape, tensor_rand.shape
(torch.Size([3, 4]), torch.Size([3, 4]), torch.Size([3, 4]))
We can concatenate these tensors column-wise by using torch.cat with dim=1. We will get a resultant tensor with shape (3, 12).
concat_tensor = torch.cat([tensor_ones, tensor_zeros, tensor_rand], dim=1)
print(concat_tensor.shape)
concat_tensor
torch.Size([3, 12])
tensor([[1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8675,
0.0161, 0.5472, 0.7002],
[1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.6551,
0.3049, 0.4088, 0.6341],
[1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2363,
0.8951, 0.0335, 0.5779]])
We can concatenate these tensors row-wise by using torch.cat with dim=0. We will get a resultant tensor with shape (9, 4).
concat_tensor = torch.cat([tensor_ones, tensor_zeros, tensor_rand], dim=0)
print(concat_tensor.shape)
concat_tensor
torch.Size([9, 4])
tensor([[1.0000, 1.0000, 1.0000, 1.0000],
[1.0000, 1.0000, 1.0000, 1.0000],
[1.0000, 1.0000, 1.0000, 1.0000],
[0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000],
[0.8675, 0.0161, 0.5472, 0.7002],
[0.6551, 0.3049, 0.4088, 0.6341],
[0.2363, 0.8951, 0.0335, 0.5779]])
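A closely related function is torch.stack, which joins tensors along a new dimension instead of an existing one; a brief sketch:
# torch.stack adds a new dimension, so three (3, 4) tensors become (3, 3, 4)
stacked_tensor = torch.stack([tensor_ones, tensor_zeros, tensor_rand], dim=0)
print(stacked_tensor.shape)  # torch.Size([3, 3, 4])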
2.4.4 Arithmetic operations
In PyTorch, you can perform arithmetic operations on tensors much as you would on NumPy arrays. Let's look at the most commonly used arithmetic operations.
Element wise addition
tnsr1 = torch.randn((3, 4))
print(f"Tensor 1: \n", tnsr1)

tnsr2 = torch.randn((3, 4))
print(f"\n Tensor 2: \n", tnsr2)

## Addition
tensor_add = tnsr1 + tnsr2
print(f"\n Tensor additions using + : \n", tensor_add)

tensor_add = tnsr1.add(tnsr2)
print(f"\n Tensor additions using .add : \n", tensor_add)
Tensor 1:
tensor([[-0.4685, -0.7848, -0.4198, 0.0890],
[ 0.2496, 0.2578, 0.6366, -2.0815],
[-1.6914, -0.8824, 1.0809, 1.5308]])
Tensor 2:
tensor([[-0.3125, 1.0860, 0.7340, 0.2249],
[-0.9887, 0.2265, -0.5214, -1.5676],
[ 0.6817, 0.1099, -0.5298, -0.3109]])
Tensor additions using + :
tensor([[-0.7810, 0.3013, 0.3142, 0.3139],
[-0.7391, 0.4843, 0.1152, -3.6492],
[-1.0097, -0.7725, 0.5511, 1.2199]])
Tensor additions using .add :
tensor([[-0.7810, 0.3013, 0.3142, 0.3139],
[-0.7391, 0.4843, 0.1152, -3.6492],
[-1.0097, -0.7725, 0.5511, 1.2199]])
Element wise subtraction
## Subtraction
tensor_sub = tnsr1 - tnsr2
print(f"\n Tensor subtraction using - : \n", tensor_sub)

tensor_sub = tnsr1.sub(tnsr2)
print(f"\n Tensor subtraction using .sub : \n", tensor_sub)
Tensor subtraction using - :
tensor([[-0.1561, -1.8708, -1.1537, -0.1359],
[ 1.2384, 0.0313, 1.1580, -0.5139],
[-2.3732, -0.9923, 1.6107, 1.8417]])
Tensor subtraction using .sub :
tensor([[-0.1561, -1.8708, -1.1537, -0.1359],
[ 1.2384, 0.0313, 1.1580, -0.5139],
[-2.3732, -0.9923, 1.6107, 1.8417]])
Element wise multiplication
## Multiplication
tensor_mul = tnsr1 * tnsr2
print(f"\n Tensor element-wise multiplication using * : \n", tensor_mul)

tensor_mul = tnsr1.mul(tnsr2)
print(f"\n Tensor element-wise multiplication using .mul : \n", tensor_mul)
Tensor element-wise multiplication using * :
tensor([[ 0.1464, -0.8523, -0.3081, 0.0200],
[-0.2468, 0.0584, -0.3319, 3.2631],
[-1.1531, -0.0970, -0.5727, -0.4759]])
Tensor element-wise multiplication using .mul :
tensor([[ 0.1464, -0.8523, -0.3081, 0.0200],
[-0.2468, 0.0584, -0.3319, 3.2631],
[-1.1531, -0.0970, -0.5727, -0.4759]])
Element wise division
## Division
tensor_div = tnsr1 / tnsr2
print(f"\n Tensor element-wise division using / : \n", tensor_div)

tensor_div = tnsr1.div(tnsr2)
print(f"\n Tensor element-wise division using .div : \n", tensor_div)
Tensor element-wise division using / :
tensor([[ 1.4994, -0.7226, -0.5719, 0.3958],
[-0.2525, 1.1381, -1.2209, 1.3278],
[-2.4811, -8.0272, -2.0401, -4.9238]])
Tensor element-wise division using .div :
tensor([[ 1.4994, -0.7226, -0.5719, 0.3958],
[-0.2525, 1.1381, -1.2209, 1.3278],
[-2.4811, -8.0272, -2.0401, -4.9238]])
Matrix multiplication
tensor_mm = tnsr1 @ tnsr2.T
print(f"\n Tensor matrix multiplication using @: \n", tensor_mm)

tensor_mm = tnsr1.matmul(tnsr2.T)
print(f"\n Tensor matrix multiplication using .matmul: \n", tensor_mm)
Tensor matrix multiplication using @:
tensor([[-0.9940, 0.3648, -0.2109],
[ 0.2010, 2.7427, 0.5084],
[ 0.7078, -1.4908, -2.2987]])
Tensor matrix multiplication using .matmul:
tensor([[-0.9940, 0.3648, -0.2109],
[ 0.2010, 2.7427, 0.5084],
[ 0.7078, -1.4908, -2.2987]])
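Each of these operations also has an in-place variant with a trailing underscore (add_, sub_, mul_, div_) that modifies the tensor directly instead of returning a new one. A small sketch; note that in-place operations can interfere with autograd, so use them with care:
## In-place addition: tnsr_copy is modified directly, no new tensor is created
tnsr_copy = tnsr1.clone()
tnsr_copy.add_(tnsr2)
print(f"\n In-place addition using .add_ : \n", tnsr_copy)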
Summing it up
Summing tensors along rows and columns is a common operation. Here is the syntax for this operation:
print(f"Tensor: \n {tnsr1}" )
print(f"\n All up sum: \n {tnsr1.sum()}")
print(f"\n Column wise sum: \n {tnsr1.sum(0)}")
print(f"\n Row wise sum: \n {tnsr1.sum(1)}")
Tensor:
tensor([[-0.4685, -0.7848, -0.4198, 0.0890],
[ 0.2496, 0.2578, 0.6366, -2.0815],
[-1.6914, -0.8824, 1.0809, 1.5308]])
All up sum:
-2.48371958732605
Column wise sum:
tensor([-1.9103, -1.4094, 1.2977, -0.4617])
Row wise sum:
tensor([-1.5841, -0.9375, 0.0378])
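If you need the reduced dimension to be kept (for example, so the result broadcasts against the original tensor), pass keepdim=True; a quick sketch:
# keepdim=True keeps the reduced dimension with size 1, giving shape (3, 1)
row_sums = tnsr1.sum(dim=1, keepdim=True)
print(row_sums.shape)  # torch.Size([3, 1])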
2.5 Why GPUs?
Deep learning models often involve a large number of matrix operations, such as matrix multiplication. Let's do a speed comparison between a NumPy CPU implementation, a PyTorch CPU implementation, and a PyTorch GPU implementation.
2.5.1 Matrix multiplication using NumPy
Let's initialize one tensor of size (1000, 64, 64) and one tensor of size (64, 32), and compare matrix multiplication speeds.
arr1 = np.random.randn(1000, 64, 64)
arr2 = np.random.randn(64, 32)
%timeit -n 50 res = np.matmul(arr1, arr2)
9.7 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 50 loops each)
As we can see, NumPy, which uses highly optimized matrix multiplication routines, performs the above operation in 9.7 milliseconds.
2.5.2 Matrix multiplication using PyTorch on CPU
Now let’s do the same operation using PyTorch tensors on CPU.
tnsr1 = torch.from_numpy(arr1)
tnsr2 = torch.from_numpy(arr2)
%timeit -n 50 res = tnsr1 @ tnsr2
2.78 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 50 loops each)
We can see that PyTorch on the CPU performed the same operation in 2.78 milliseconds, which is roughly 3 times faster than the NumPy version.
2.5.3 Matrix multiplication using PyTorch on GPU
Let's do the same operation on the GPU using PyTorch.
= tnsr1.to("cuda")
tnsr1 = tnsr2.to("cuda") tnsr2
%timeit -n 50 res = (tnsr1 @ tnsr2)
15.6 µs ± 4.32 µs per loop (mean ± std. dev. of 7 runs, 50 loops each)
As demonstrated by the matrix multiplication example, the GPU version completed in 15.6 microseconds, a significant improvement over both the PyTorch CPU version (which took 2.78 milliseconds) and the NumPy implementation (which took 9.7 milliseconds). This speedup becomes even more pronounced when working with larger matrices.
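One caveat worth noting: CUDA kernels launch asynchronously, so a naive timing may measure only the kernel launch rather than the full computation. A more careful measurement synchronizes the GPU before and after the operation; a minimal sketch:
import time

# Make sure previously launched GPU work does not pollute the measurement
torch.cuda.synchronize()
start = time.perf_counter()
res = tnsr1 @ tnsr2
# Wait for the multiplication to actually finish before stopping the clock
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"GPU matmul took {elapsed * 1e6:.1f} microseconds")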