- You can access the slide deck that covers PyTorch Here
- You can access the slide deck that covers various concepts related to Transformers Here
- It is recommended to read the slide decks before using the following Colab notebooks
- Once you get a good grip on the first four modules, you can easily walk through the documentation or other code to build an application. I will keep updating this repository.
- Recorded videos
The Fuel: Tensors
- Difficulty Level: Easy if you have prior experience with NumPy or TensorFlow
- Understand the PyTorch architecture
- Create tensors of 0d, 1d, 2d, 3d, ... (analogous to a multidimensional array in NumPy)
- Understand the attributes: `storage`, `stride`, `offset`, `device`
- Manipulate tensor dimensions
- Operations on tensors
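A minimal sketch of these ideas (not taken from the notebook itself): it creates tensors of a few different ranks and inspects the stride, storage offset, and device attributes mentioned above.

```python
import torch

# Tensors of different ranks (analogous to NumPy ndarrays)
scalar = torch.tensor(3.14)                                    # 0-d
vector = torch.tensor([1.0, 2.0, 3.0])                         # 1-d
matrix = torch.arange(12, dtype=torch.float32).reshape(3, 4)   # 2-d

# Attributes covered in this module
print(matrix.stride())          # (4, 1): steps in storage per index along each dim
print(matrix.storage_offset())  # 0: start position inside the underlying storage
print(matrix.device)            # cpu (or cuda:0 once moved to a GPU)

# Dimension manipulation returns a view over the same storage
transposed = matrix.t()
print(transposed.stride())      # (1, 4): same data, different stride
```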
The Engine: Autograd
- Difficulty Level: Hard; requires a good understanding of the backpropagation algorithm. However, you can skip this and still follow the subsequent notebooks easily.
- A few more tensor-related attributes and methods: `requires_grad`, `grad`, `grad_fn`, `_saved_tensors`, `backward`, `retain_grad`, `zero_grad`
- Computation graph: leaf nodes (parameters) vs. non-leaf nodes (intermediate computations)
- Accumulate gradients and update parameters inside the `torch.no_grad()` context manager (see the sketch below)
- Implementing a neural network from scratch
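As a rough sketch of the autograd workflow this module covers, the example below builds a tiny computation graph, calls `backward()`, and performs the parameter update inside `torch.no_grad()`:

```python
import torch

# Leaf tensors (parameters) track gradients when requires_grad=True
w = torch.randn(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])      # input, no gradient needed

loss = (w * x).sum() ** 2              # non-leaf nodes record a grad_fn
print(loss.grad_fn)                    # e.g. <PowBackward0 ...>

loss.backward()                        # populates w.grad
print(w.grad)

# The parameter update must not be recorded in the computation graph
with torch.no_grad():
    w -= 0.01 * w.grad
w.grad.zero_()                         # clear the accumulated gradient
```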
The Factory: nn.Module, Data Utils
- Difficulty Level: Medium
- Brief tour into the source code of nn.Module
- Everything is a module (layer in other frameworks)
- Stack modules by subclassing nn.Module and build any neural network
- Managing data with the `Dataset` class and the `DataLoader` class (see the sketch below)
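A minimal sketch of the pattern this module teaches: subclass `nn.Module` to stack layers, and subclass `Dataset` so a `DataLoader` can batch the data. The sizes and random data here are placeholders.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

class TinyNet(nn.Module):
    """Stack modules by subclassing nn.Module."""
    def __init__(self, in_features=4, hidden=8, out_features=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_features),
        )

    def forward(self, x):
        return self.net(x)

class RandomDataset(Dataset):
    """Minimal Dataset: implement __len__ and __getitem__."""
    def __init__(self, n=100):
        self.x = torch.randn(n, 4)
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(RandomDataset(), batch_size=16, shuffle=True)
model = TinyNet()
for xb, yb in loader:
    logits = model(xb)   # forward pass on one mini-batch
    break
```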
Convolutional Neural Network: Image Classification
- Difficulty Level: Medium
- Using torchvision for datasets
- Build a CNN and move it to the GPU
- Train and test
- Transfer learning
- Image segmentation
Update
- You can use various learning rate schedulers such as ExponentialLR, CosineAnnealingLR, and so on. You just need to call `scheduler.step()` after `optimizer.step()` (see the sketch below). Refer to the documentation here.
- A slight change in instantiating pre-trained models: Refer
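A short sketch of the scheduler usage mentioned in the update, using `ExponentialLR` as an example; the toy model and loop counts are placeholders.

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(5):
    for _ in range(3):                 # stand-in for iterating over a DataLoader
        optimizer.zero_grad()
        loss = model(torch.randn(8, 10)).sum()
        loss.backward()
        optimizer.step()
    scheduler.step()                   # called after optimizer.step(), typically once per epoch
    print(epoch, scheduler.get_last_lr())
```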
Recurrent Neural Network: Sequence Classification
- Difficulty Level: Hard for pre-processing part, Medium for model building part
- torchdata
- torchtext
- Embedding for words
- Build an RNN
- Train, test, infer
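A rough sketch of the model-building part (the torchdata/torchtext preprocessing is omitted); the vocabulary size, sequence length, and hidden sizes below are made-up placeholders.

```python
import torch
from torch import nn

# Placeholder sizes: vocabulary of 1000 tokens, 64-d embeddings, 128-d hidden state
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=64, padding_idx=0)
rnn = nn.RNN(input_size=64, hidden_size=128, batch_first=True)
classifier = nn.Linear(128, 2)              # binary sequence classification

tokens = torch.randint(1, 1000, (4, 20))    # batch of 4 sequences of length 20
embedded = embedding(tokens)                # (4, 20, 64)
outputs, hidden = rnn(embedded)             # hidden: (1, 4, 128)
logits = classifier(hidden[-1])             # classify from the final hidden state
print(logits.shape)                         # torch.Size([4, 2])
```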
Please take a look at the official tutorial series if you want to perform distributed training using a multi-GPU or multi-node setup in PyTorch (requires minimal modifications to the existing code). It covers various approaches, including:
- Distributed Data-Parallel (DDP)
- Fully Sharded Data Parallel (FSDP)
- Model, Tensor, and Pipeline parallelism
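For orientation, here is a minimal DDP sketch along the lines of the official tutorials; the toy model is a placeholder, and the script is assumed to be launched with `torchrun --nproc_per_node=<num_gpus> <script>.py`.

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")            # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])         # set by torchrun
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 2).to(local_rank)
model = DDP(model, device_ids=[local_rank])        # gradients are synchronized across ranks

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(3):                                 # stand-in for a real training loop
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10, device=local_rank)).sum()
    loss.backward()
    optimizer.step()

dist.destroy_process_group()
```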
Now, let's move on to the Hugging Face library, which further simplifies these training strategies.
- Using pre-trained models Notebook
- Difficulty Level: Easy
- AutoTokenizer
- AutoModel
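A minimal sketch of the AutoTokenizer/AutoModel workflow; the `bert-base-uncased` checkpoint is just an example.

```python
from transformers import AutoTokenizer, AutoModel

checkpoint = "bert-base-uncased"          # any Hub checkpoint works the same way
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("PyTorch makes tensors easy.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)    # (batch, sequence_length, hidden_size)
```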
- Fine-Tuning Pre-Trained Models Notebook
- Difficulty Level: Medium
- datasets
- tokenizer
- data collator with padding
- Trainer
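A condensed sketch of the fine-tuning recipe this notebook walks through; the checkpoint, dataset, and hyperparameters here are illustrative, not the notebook's exact choices.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"            # example checkpoint
raw = load_dataset("glue", "sst2")          # example dataset
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

tokenized = raw.map(tokenize, batched=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer)   # dynamic padding per batch

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"],
                  data_collator=collator)
trainer.train()
```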
- Loading Datasets Notebook
- Difficulty Level: Easy
- Dataset from local data files
- Dataset from Hub
- Preprocessing the dataset: slice, select, map, filter, flatten, interleave, concatenate
- Loading from external links
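A few representative `datasets` calls, as a sketch; the local file names are placeholders.

```python
from datasets import load_dataset

# From the Hub
squad = load_dataset("squad", split="train")

# From local data files (paths are placeholders)
local = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

# Preprocessing: select, map, filter, flatten
small = squad.select(range(100))                                  # take a slice
upper = small.map(lambda ex: {"title": ex["title"].upper()})      # transform a column
long_ctx = small.filter(lambda ex: len(ex["context"]) > 500)      # keep long contexts
flat = small.flatten()    # nested fields (e.g. answers.text) become top-level columns
```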
- Build a Custom Tokenizer for a translation task Notebook
- Difficulty Level: Medium
- Translation dataset as running example
- Building the tokenizer by encapsulating the Normalizer, pre-tokenizer and tokenization algorithm (BPE)
- Save and load the tokenizer locally
- Using it in the Transformer module
- Exercise: Build a Tokenizer with shared vocabulary.
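A compact sketch with the `tokenizers` library showing how the normalizer, pre-tokenizer, and BPE model are combined, trained, saved, and reloaded; the corpus, vocabulary size, and special tokens are placeholders.

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers

# Encapsulate normalizer, pre-tokenizer, and the BPE model in one Tokenizer object
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.Sequence([normalizers.NFC(), normalizers.Lowercase()])
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=8000,
                              special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
corpus = ["a tiny translation corpus would go here", "one sentence per line"]
tokenizer.train_from_iterator(corpus, trainer=trainer)

tokenizer.save("custom_tokenizer.json")             # save locally
reloaded = Tokenizer.from_file("custom_tokenizer.json")
print(reloaded.encode("A tiny example").tokens)
```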
- Training Custom Seq2Seq model using Vanilla Transformer Architecture Notebook
- Difficulty Level: Medium, if you know how to build models in PyTorch.
- Build the vanilla Transformer architecture in PyTorch
- Create a configuration for the model using the PretrainedConfig class
- Wrap it with the HF PreTrainedModel class
- Use the custom tokenizer built in the previous notebook
- Use Trainer API to train the model
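A rough sketch of wrapping a custom model for the HF ecosystem: subclass PretrainedConfig for the hyperparameters and PreTrainedModel for the model itself. The architecture and sizes below are illustrative, not the notebook's exact model.

```python
import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel

class Seq2SeqConfig(PretrainedConfig):
    model_type = "custom_seq2seq"                  # hypothetical name
    def __init__(self, vocab_size=8000, d_model=256, nhead=4, num_layers=2, **kwargs):
        self.vocab_size = vocab_size
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers
        super().__init__(**kwargs)

class Seq2SeqModel(PreTrainedModel):
    config_class = Seq2SeqConfig
    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.d_model)
        self.transformer = nn.Transformer(d_model=config.d_model, nhead=config.nhead,
                                          num_encoder_layers=config.num_layers,
                                          num_decoder_layers=config.num_layers,
                                          batch_first=True)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size)

    def forward(self, input_ids, decoder_input_ids, **kwargs):
        out = self.transformer(self.embed(input_ids), self.embed(decoder_input_ids))
        return self.lm_head(out)

model = Seq2SeqModel(Seq2SeqConfig())
logits = model(torch.randint(0, 8000, (2, 10)), torch.randint(0, 8000, (2, 8)))
print(logits.shape)   # (2, 8, 8000): one distribution over the vocabulary per target position
```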
- Gradient Accumulation - Continual Pre-training Notebook
- Difficulty Level: Easy
- Understand the memory requirements for training and inference
- Understand how gradient accumulation works around limited memory (see the sketch below)
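A sketch of the accumulation loop itself (the toy model and sizes are placeholders): scale each micro-batch loss and step the optimizer only every `accumulation_steps` micro-batches, so the effective batch size grows without extra memory.

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 4              # effective batch size = micro-batch size * 4

optimizer.zero_grad()
for step in range(16):              # stand-in for iterating over micro-batches
    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    (loss / accumulation_steps).backward()      # scale so accumulated gradients average correctly
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                        # update only every N micro-batches
        optimizer.zero_grad()
```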
PyTorch updated the CUDA Semantics page on Aug 07, 2025. If you are using multiple GPUs, you must read it before starting to write code. Don't assume!