2  Tensors

2.1 What are Tensors?

PyTorch provides tensors as its primary data structure. Tensors are similar to NumPy arrays, but they add functionality such as automatic differentiation and are designed to take advantage of GPUs for accelerated computation. Like NumPy arrays, PyTorch tensors support a variety of operations, including indexing, slicing, math operations, linear algebra, and more. Let’s dive in by importing the libraries.

import torch
import numpy as np

2.2 Initializing a Tensor

There are several ways to initialize tensors in PyTorch. Here are some examples:

Initializing from an iterable such as a list

# Initialize a tensor from a list
tensor_from_list = torch.tensor([1, 2, 3, 4])
print("Tensor from list: \n", tensor_from_list)

# Initialize a tensor from a nested list
tensor_from_nested_list = torch.tensor([[1, 2], [3, 4]])
print("Tensor from nested list: \n", tensor_from_nested_list)
Tensor from list: 
 tensor([1, 2, 3, 4])
Tensor from nested list: 
 tensor([[1, 2],
        [3, 4]])
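
By default, torch.tensor infers the data type from the values; the integer lists above produce torch.int64 tensors. If a specific data type is needed, it can be requested explicitly. A minimal sketch:

# dtype is inferred from the values; passing dtype overrides the inference
tensor_float = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
print(tensor_float.dtype)  # torch.float32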

Initializing from a NumPy array

# Create a NumPy array
numpy_array = np.array([[1, 2], [3, 4]])

# Initialize a tensor from a NumPy array
tensor_from_numpy = torch.from_numpy(numpy_array)
print("Tensor from np array: \n", tensor_from_numpy)
Tensor from np array: 
 tensor([[1, 2],
        [3, 4]])
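
Note that torch.from_numpy shares memory with the source array instead of copying it, so an in-place change to the array is visible in the tensor (and vice versa). A small illustration:

# from_numpy does not copy data: the tensor and the array share memory,
# so modifying the array in place also changes the tensor.
numpy_array[0, 0] = 100
print(tensor_from_numpy)  # the first element is now 100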

Initializing from another tensor

# Create a tensor
original_tensor = torch.tensor([1, 2, 3, 4])

# Initialize a new tensor from the original tensor
new_tensor = original_tensor.clone()
print("Tensor from another tensor: \n", new_tensor)
Tensor from another tensor: 
 tensor([1, 2, 3, 4])
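
Besides clone, the *_like constructors (torch.zeros_like, torch.ones_like, and friends) create a new tensor with the same shape, dtype, and device as an existing one. A brief sketch:

# Create new tensors that match the shape (and dtype/device) of original_tensor
zeros_like_tensor = torch.zeros_like(original_tensor)
ones_like_tensor = torch.ones_like(original_tensor)
print(zeros_like_tensor)  # tensor([0, 0, 0, 0])
print(ones_like_tensor)   # tensor([1, 1, 1, 1])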

Constant or random initialization

# Initialize a tensor with all elements set to zero
tensor_zeros = torch.zeros(3, 4)
print("Tensor with all elements set to zero: \n", tensor_zeros)

# Initialize a tensor with all elements set to one
tensor_ones = torch.ones(3, 4)
print("\n Tensor with all elements set to one: \n", tensor_ones)

# Initialize a tensor with all elements set to a specific value
tensor_full = torch.full((3, 4), fill_value=2.5)
print("\n Tensor with all elements set to a specific value: \n", tensor_full)

# Initialize a tensor with random values
tensor_rand = torch.rand(3, 4)
print("\n Tensor with random initialization: \n", tensor_rand)

# Initialize a tensor with random values from a normal distribution
tensor_randn = torch.randn(3, 4)
print("\n Tensor with random values from a normal distribution: \n", tensor_randn)
Tensor with all elements set to zero: 
 tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

 Tensor with all elements set to one: 
 tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

 Tensor with all elements set to a specific value: 
 tensor([[2.5000, 2.5000, 2.5000, 2.5000],
        [2.5000, 2.5000, 2.5000, 2.5000],
        [2.5000, 2.5000, 2.5000, 2.5000]])

 Tensor with random initialization: 
 tensor([[0.8675, 0.0161, 0.5472, 0.7002],
        [0.6551, 0.3049, 0.4088, 0.6341],
        [0.2363, 0.8951, 0.0335, 0.5779]])

 Tensor with random values from a normal distribution: 
 tensor([[ 1.0550,  0.9214, -1.3023,  0.4119],
        [-0.4691,  0.8733,  0.7910, -2.3932],
        [-0.6304, -0.8792,  0.4188,  0.4221]])
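
Since torch.rand and torch.randn draw fresh values on every call, the numbers above will differ from run to run. If reproducible random values are needed, the seed can be fixed with torch.manual_seed:

# Fixing the seed makes random initialization reproducible
torch.manual_seed(42)
print(torch.rand(2, 2))
torch.manual_seed(42)
print(torch.rand(2, 2))  # identical to the previous call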

2.3 Tensor Attributes

A PyTorch tensor has several attributes that you can access to get information about it. Here are some of the most common ones:

  • shape: returns the shape of the tensor as a torch.Size object (a tuple of integers). For example, a batch of images might have shape (batch_size, num_channels, height, width).
  • dtype: returns the data type of the tensor. For example, the data type could be torch.float32 or torch.int64.
  • device: returns the device on which the tensor is stored. This can be the CPU or a GPU.
  • requires_grad: a boolean flag indicating whether the tensor requires gradient computation. If set to True, the tensor’s gradients will be computed during backpropagation.
  • grad: a tensor containing the gradient of the tensor with respect to some scalar value. This attribute is typically used during training with gradient descent.

You can access these attributes by calling them on a tensor object. For example:

tensor_randn = torch.randn(3, 4)
print(f"Shape of tensor : {tensor_randn.shape}")
print(f"Type of tensor : {tensor_randn.dtype}")
print(f"Device tensor is stored on : {tensor_randn.device}")
print(f"Autograd enabled : {tensor_randn.requires_grad}")
print(f"Any stored gradient : {tensor_randn.grad}")
Shape of tensor : torch.Size([3, 4])
Type of tensor : torch.float32
Device tensor is stored on : cpu
Autograd enabled : False
Any stored gradient : None

As we can see above, we initialized a random tensor of shape (3, 4) with the torch.float32 data type, and it is currently stored on the CPU. Automatic gradient calculation is disabled, and no gradient is stored in the tensor.

There are several other attributes and methods you can access, such as ndim, size, numel, and storage. You can find more information about them in the PyTorch Tensor documentation.
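
For instance, a quick look at a few of these:

# A few of the additional attributes/methods mentioned above
print(tensor_randn.ndim)     # number of dimensions: 2
print(tensor_randn.size())   # equivalent to .shape: torch.Size([3, 4])
print(tensor_randn.numel())  # total number of elements: 12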

2.4 Tensor Operations

There are several operations you can perform on tensors; let’s look at the most commonly used ones.

2.4.1 Moving tensor from CPU to GPU

Moving a tensor from the CPU to the GPU takes a single call, and it is probably the operation you will use most often.

tensor_randn.to("cuda")
tensor([[-0.0984, -1.3804,  0.3343, -0.1623],
        [ 0.9155, -0.8620, -0.3943, -0.2997],
        [-0.1336, -0.7395, -0.7143, -0.0735]], device='cuda:0')

As we can see, the returned tensor now lives on a CUDA (GPU) device.
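
Note that .to returns a new tensor rather than moving the original in place, so the result is usually assigned back to a variable. The call above also fails on machines without a GPU, so a common pattern is to pick the device conditionally. A minimal sketch:

# .to() returns a new tensor; reassign it to keep working with the GPU copy.
# Choosing the device conditionally keeps the code runnable on CPU-only machines.
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_randn = tensor_randn.to(device)
print(tensor_randn.device)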

2.4.2 Slicing and Indexing

PyTorch tensors, like NumPy arrays, support various slicing and indexing operations.

tensor_randn = torch.randn(3, 4)
tensor_randn
tensor([[-1.3470,  0.2204,  0.2963, -0.9745],
        [ 0.1867, -1.8338, -1.1872, -1.2987],
        [ 0.0517, -0.3206,  0.3584, -0.4778]])
print(f"First row:  \n{tensor_randn[0]}")
print(f"\n First column: \n {tensor_randn[:, 0]}")
print(f"\n Last column: {tensor_randn[..., -1]}")
print(f"\n Selected columns: \n {tensor_randn[:,2:4]}")
## Assignment of column to zero
tensor_randn[:,1] = 0
print("\n Assigning column to zero: \n", tensor_randn)
First row:  
tensor([-1.3470,  0.2204,  0.2963, -0.9745])

 First column: 
 tensor([-1.3470,  0.1867,  0.0517])

 Last column: tensor([-0.9745, -1.2987, -0.4778])

 Selected columns: 
 tensor([[ 0.2963, -0.9745],
        [-1.1872, -1.2987],
        [ 0.3584, -0.4778]])

 Assigning column to zero: 
 tensor([[-1.3470,  0.0000,  0.2963, -0.9745],
        [ 0.1867,  0.0000, -1.1872, -1.2987],
        [ 0.0517,  0.0000,  0.3584, -0.4778]])
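
Beyond basic slicing, boolean masks and integer index tensors work much like they do in NumPy. A brief sketch:

# Boolean-mask indexing returns a 1-D tensor of the elements that satisfy the condition
mask = tensor_randn > 0
print(tensor_randn[mask])

# An integer index tensor selects specific rows
print(tensor_randn[torch.tensor([0, 2])])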

2.4.3 Concatenation

The torch.cat function can be used to concatenate or join multiple tensors together, which is often useful when working with deep learning models.

Let’s take our previously defined tensors and check their shape.

tensor_ones.shape, tensor_zeros.shape, tensor_rand.shape
(torch.Size([3, 4]), torch.Size([3, 4]), torch.Size([3, 4]))

We can concatenate these tensors column-wise by using torch.cat with dim=1, which gives a resulting tensor of shape (3, 12).

concat_tensor = torch.cat([tensor_ones, tensor_zeros, tensor_rand], dim=1)
print(concat_tensor.shape)
concat_tensor
torch.Size([3, 12])
tensor([[1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8675,
         0.0161, 0.5472, 0.7002],
        [1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.6551,
         0.3049, 0.4088, 0.6341],
        [1.0000, 1.0000, 1.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2363,
         0.8951, 0.0335, 0.5779]])

We can concatenate these tensors row-wise by using torch.cat with dim=0, which gives a resulting tensor of shape (9, 4).

concat_tensor = torch.cat([tensor_ones, tensor_zeros, tensor_rand], dim=0)
print(concat_tensor.shape)
concat_tensor
torch.Size([9, 4])
tensor([[1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000],
        [1.0000, 1.0000, 1.0000, 1.0000],
        [0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000],
        [0.8675, 0.0161, 0.5472, 0.7002],
        [0.6551, 0.3049, 0.4088, 0.6341],
        [0.2363, 0.8951, 0.0335, 0.5779]])
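
A closely related operation is torch.stack, which joins tensors along a new dimension rather than an existing one. A minimal sketch:

# Stacking three (3, 4) tensors along a new leading dimension gives shape (3, 3, 4)
stacked_tensor = torch.stack([tensor_ones, tensor_zeros, tensor_rand], dim=0)
print(stacked_tensor.shape)  # torch.Size([3, 3, 4])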

2.4.4 Arithmetic operations

In PyTorch, you can perform arithmetic operations on tensors in much the same way as on NumPy arrays. Let’s look at some common arithmetic operations:

Element-wise addition

tnsr1 = torch.randn((3,4))
print(f"Tensor 1: \n", tnsr1)
tnsr2 = torch.randn((3,4))
print(f"\n Tensor 2: \n", tnsr2)

## Addition
tensor_add = tnsr1 + tnsr2 
print(f"\n Tensor additions using + : \n", tensor_add)

tensor_add = tnsr1.add(tnsr2)
print(f"\n Tensor additions using .add : \n", tensor_add)
Tensor 1: 
 tensor([[-0.4685, -0.7848, -0.4198,  0.0890],
        [ 0.2496,  0.2578,  0.6366, -2.0815],
        [-1.6914, -0.8824,  1.0809,  1.5308]])

 Tensor 2: 
 tensor([[-0.3125,  1.0860,  0.7340,  0.2249],
        [-0.9887,  0.2265, -0.5214, -1.5676],
        [ 0.6817,  0.1099, -0.5298, -0.3109]])

 Tensor additions using + : 
 tensor([[-0.7810,  0.3013,  0.3142,  0.3139],
        [-0.7391,  0.4843,  0.1152, -3.6492],
        [-1.0097, -0.7725,  0.5511,  1.2199]])

 Tensor additions using .add : 
 tensor([[-0.7810,  0.3013,  0.3142,  0.3139],
        [-0.7391,  0.4843,  0.1152, -3.6492],
        [-1.0097, -0.7725,  0.5511,  1.2199]])

Element-wise subtraction

## Subtraction
tensor_sub = tnsr1 - tnsr2 
print(f"\n Tensor subtraction using - : \n", tensor_sub)

tensor_sub = tnsr1.sub(tnsr2)
print(f"\n Tensor subtraction using .sub : \n", tensor_sub)

 Tensor subtraction using - : 
 tensor([[-0.1561, -1.8708, -1.1537, -0.1359],
        [ 1.2384,  0.0313,  1.1580, -0.5139],
        [-2.3732, -0.9923,  1.6107,  1.8417]])

 Tensor subtraction using .sub : 
 tensor([[-0.1561, -1.8708, -1.1537, -0.1359],
        [ 1.2384,  0.0313,  1.1580, -0.5139],
        [-2.3732, -0.9923,  1.6107,  1.8417]])

Element-wise multiplication

## Multiplication
tensor_mul = tnsr1 * tnsr2 
print(f"\n Tensor element-wise multiplication using * : \n", tensor_mul)

tensor_mul = tnsr1.mul(tnsr2)
print(f"\n Tensor element-wise multiplication using .mul : \n", tensor_mul)

 Tensor element-wise multiplication using * : 
 tensor([[ 0.1464, -0.8523, -0.3081,  0.0200],
        [-0.2468,  0.0584, -0.3319,  3.2631],
        [-1.1531, -0.0970, -0.5727, -0.4759]])

 Tensor element-wise multiplication using .mul : 
 tensor([[ 0.1464, -0.8523, -0.3081,  0.0200],
        [-0.2468,  0.0584, -0.3319,  3.2631],
        [-1.1531, -0.0970, -0.5727, -0.4759]])

Element-wise division

## Division
tensor_div = tnsr1 / tnsr2 
print(f"\n Tensor element-wise division using / : \n", tensor_div)

tensor_div = tnsr1.div(tnsr2)
print(f"\n Tensor element-wise division using .div : \n", tensor_div)

 Tensor element-wise division using / : 
 tensor([[ 1.4994, -0.7226, -0.5719,  0.3958],
        [-0.2525,  1.1381, -1.2209,  1.3278],
        [-2.4811, -8.0272, -2.0401, -4.9238]])

 Tensor element-wise division using .div : 
 tensor([[ 1.4994, -0.7226, -0.5719,  0.3958],
        [-0.2525,  1.1381, -1.2209,  1.3278],
        [-2.4811, -8.0272, -2.0401, -4.9238]])

Matrix multiplication

tensor_mm = tnsr1 @ tnsr2.T
print(f"\n Tensor matrix multiplication using @: \n", tensor_mm)

tensor_mm = tnsr1.matmul(tnsr2.T)
print(f"\n Tensor matrix multiplication using .matmul: \n", tensor_mm)

 Tensor matrix multiplication using @: 
 tensor([[-0.9940,  0.3648, -0.2109],
        [ 0.2010,  2.7427,  0.5084],
        [ 0.7078, -1.4908, -2.2987]])

 Tensor matrix multiplication using .matmul: 
 tensor([[-0.9940,  0.3648, -0.2109],
        [ 0.2010,  2.7427,  0.5084],
        [ 0.7078, -1.4908, -2.2987]])
Note

Observe that tnsr1 and tnsr2 both have shape (3, 4). To perform matrix multiplication, we used the .T attribute to transpose tnsr2, changing its shape to (4, 3). The resulting product has shape (3, 3).

Summing it up

Summing tensors along rows and columns is a common operation. Here is the syntax for this operation:

print(f"Tensor: \n {tnsr1}" )
print(f"\n All up sum: \n {tnsr1.sum()}")
print(f"\n Column wise sum: \n {tnsr1.sum(0)}")
print(f"\n Row wise sum: \n {tnsr1.sum(1)}")
Tensor: 
 tensor([[-0.4685, -0.7848, -0.4198,  0.0890],
        [ 0.2496,  0.2578,  0.6366, -2.0815],
        [-1.6914, -0.8824,  1.0809,  1.5308]])

 All up sum: 
 -2.48371958732605

 Column wise sum: 
 tensor([-1.9103, -1.4094,  1.2977, -0.4617])

 Row wise sum: 
 tensor([-1.5841, -0.9375,  0.0378])
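
By default, sum drops the reduced dimension. Passing keepdim=True keeps it with size 1, which is often convenient for broadcasting the result back against the original tensor. For example:

# keepdim=True preserves the reduced dimension with size 1
print(tnsr1.sum(1, keepdim=True).shape)  # torch.Size([3, 1])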

2.5 Why GPUs?

Deep learning models often involve large numbers of matrix operations, such as matrix multiplication. Let’s compare the speed of the NumPy CPU implementation with PyTorch on the CPU and on the GPU.

2.5.1 Matrix multiplication using NumPy

Let’s initialize one array of shape (1000, 64, 64) and another of shape (64, 32), and compare matrix multiplication speed.

arr1 = np.random.randn(1000, 64, 64)
arr2 = np.random.randn(64, 32)
%timeit -n 50 res = np.matmul(arr1, arr2)
9.7 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 50 loops each)

As we can see, NumPy’s highly optimized matrix multiplication performs the above operation in about 9.7 milliseconds.

2.5.2 Matrix multiplication using PyTorch on CPU

Now let’s do the same operation using PyTorch tensors on CPU.

tnsr1 = torch.from_numpy(arr1)
tnsr2 = torch.from_numpy(arr2)
%timeit -n 50 res = tnsr1 @ tnsr2
2.78 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 50 loops each)

We can see that PyTorch on the CPU performed the same operation in 2.78 milliseconds, which is roughly 3.5 times faster than the NumPy version.

2.5.3 Matrix multiplication using PyTorch on GPU

Let’s do the same operation on the GPU using PyTorch.

tnsr1 = tnsr1.to("cuda")
tnsr2 = tnsr2.to("cuda")
%timeit -n 50 res = (tnsr1 @ tnsr2)
15.6 µs ± 4.32 µs per loop (mean ± std. dev. of 7 runs, 50 loops each)

As demonstrated by the matrix multiplication example, the GPU version completed in 15.6 microseconds, a significant improvement over both the PyTorch CPU version (which took 2.78 milliseconds) and the NumPy implementation (which took 9.7 milliseconds). This speedup is even more pronounced when working with larger matrices.
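
One caveat when timing GPU code: CUDA kernels launch asynchronously, so a careful measurement synchronizes the device before reading the clock. A minimal sketch, assuming the tensors are already on the GPU:

import time

# Synchronize before and after the timed region so the measurement covers
# the actual kernel execution, not just the asynchronous launch.
torch.cuda.synchronize()
start = time.perf_counter()
res = tnsr1 @ tnsr2
torch.cuda.synchronize()
print(f"Elapsed: {(time.perf_counter() - start) * 1e3:.3f} ms")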

2.6 References