Getting Started with PyTorch in 5 Steps

This tutorial provides an in-depth introduction to machine learning using PyTorch and its high-level wrapper, PyTorch Lightning. The article covers essential steps from installation to advanced topics, offering a hands-on approach to building and training neural networks, and emphasizing the benefits of using Lightning.

By Matthew Mayo, KDnuggets Managing Editor on September 29, 2023 in Machine Learning

Introduction to PyTorch and PyTorch Lightning

PyTorch is a popular open-source machine learning framework based on Python and optimized for GPU-accelerated computing. Originally developed by Meta AI in 2016 and now part of the Linux Foundation, PyTorch has quickly become one of the most widely used frameworks for deep learning research and applications.

Unlike frameworks built around static graphs, such as early versions of TensorFlow, PyTorch builds its computation graph dynamically as the code runs, which gives greater flexibility and makes debugging easier. The key benefits of PyTorch include:

Simple and intuitive Python API for building neural networks
Broad support for GPU/TPU acceleration
Built-in support for automatic differentiation
Distributed training capabilities
Interoperability with other Python libraries like NumPy
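To make the dynamic graph, automatic differentiation, and NumPy interoperability concrete, here is a minimal sketch. The function name scaled_square and the sample values are made up for illustration.

```python
import numpy as np
import torch


def scaled_square(x: torch.Tensor) -> torch.Tensor:
    # Ordinary Python control flow decides which operations are recorded,
    # so the graph can differ from one call to the next.
    y = x * x
    if y.sum() > 10:
        y = y * 2
    return y.sum()


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = scaled_square(x)  # sum of squares is 14 > 10, so the doubling branch runs
out.backward()          # autograd computes d(out)/dx = 4 * x
print(out.item())       # 28.0
print(x.grad)           # gradient values: 4, 8, 12

# NumPy interop: torch.from_numpy shares memory with the source array.
a = np.ones(3, dtype=np.float32)
t = torch.from_numpy(a)
a[0] = 5.0
print(t[0].item())      # 5.0, because the tensor sees the change made through NumPy
```

Because the branch is plain Python, a debugger or a print statement inside scaled_square works exactly as it would in any other Python function.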

PyTorch Lightning is a lightweight wrapper built on top of PyTorch that further simplifies research workflows and model development. With Lightning, data scientists can focus on designing models rather than writing boilerplate code. Key advantages of Lightning include:

Provides structure to organize PyTorch code
Handles training loop boilerplate code
Accelerates research experiments with hyperparameter tuning
Simplifies model scaling and deployment

By combining the power and flexibility of PyTorch with the high-level APIs of Lightning, developers can quickly build scalable deep learning systems and iterate faster.

Step 1: Installation and Setup

To start using PyTorch and Lightning, you'll first need to install a few prerequisites:

Python 3.8 or higher
Pip package installer
An NVIDIA GPU is recommended for accelerated operations (a CPU-only setup is possible but slower)

Installing Python and PyTorch

It's recommended to use Anaconda for setting up a Python environment for data science and deep learning workloads. Follow the steps below:

Download and install Anaconda for your OS from the Anaconda website
Create a Conda environment (or use another Python environment manager): conda create -n pytorch python=3.8
Activate the environment: conda activate pytorch
Install PyTorch: conda install pytorch torchvision torchaudio -c pytorch

Verify that PyTorch is installed correctly by running a quick test in Python:

import torch
x = torch.rand(3, 3)
print(x)

This will print out a random 3x3 tensor, confirming PyTorch is working properly.
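Since a GPU is recommended but optional, it can also help to check which device PyTorch will use. The following is a small sketch that falls back to the CPU when no CUDA device is visible; the variable names are illustrative.

```python
import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the chosen device.
x = torch.rand(3, 3, device=device)

print(torch.__version__)
print(device, x.shape)
```

If this reports cpu on a machine with an NVIDIA GPU, the installed PyTorch build is likely CPU-only or the GPU driver is not set up.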

Installing PyTorch Lightning

With PyTorch installed, we can now install Lightning using pip:

pip install lightning

Let's confirm Lightning is set up correctly:

import lightning
print(lightning.__version__)

This should print out the version number, such as 2.0.9.

Now we're ready to start building deep learning models.

Step 2: Building a Model with PyTorch

PyTorch uses tensors, similar to NumPy arrays, as its core data structure. Tensors can be operated on by GPUs and support automatic differentiation for building neural networks.
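As a quick illustration of how tensors behave like NumPy arrays, here is a short sketch of creation, reshaping, matrix multiplication, and broadcasting. The values are arbitrary.

```python
import torch

# A 2x3 tensor holding 0..5, similar to np.arange(6).reshape(2, 3)
a = torch.arange(6, dtype=torch.float32).reshape(2, 3)
b = torch.ones(3, 2)

# Matrix multiplication: (2, 3) @ (3, 2) -> (2, 2); each entry is a row sum of a
prod = a @ b

# Broadcasting: the 1-D tensor is added to every row of a
shifted = a + torch.tensor([10.0, 20.0, 30.0])

print(prod.tolist())     # [[3.0, 3.0], [12.0, 12.0]]
print(shifted.tolist())  # [[10.0, 21.0, 32.0], [13.0, 24.0, 35.0]]
print(a.dtype, a.shape)
```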

Let's define a simple neural network for image classification:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

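This defines a convolutional neural network with two convolutional layers and three fully connected layers for classifying images into 10 classes. The layer sizes assume 3-channel 32x32 inputs (as in CIFAR-10), which is where the 16 * 5 * 5 input size of the first fully connected layer comes from. The forward() method defines how data passes through the network.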
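A quick way to sanity-check a network like this is to push a dummy batch through it and look at the output shape. The sketch below repeats the same layers with shape comments and assumes 32x32 RGB inputs, which the article does not state explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    """Same layers as the network above, with shapes noted for 32x32 inputs."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # (N, 6, 28, 28) -> (N, 6, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))  # (N, 16, 10, 10) -> (N, 16, 5, 5)
        x = torch.flatten(x, 1)               # (N, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                    # (N, 10) raw class scores


net = Net()
dummy = torch.randn(4, 3, 32, 32)  # assumed input: a batch of four 32x32 RGB images
logits = net(dummy)
n_params = sum(p.numel() for p in net.parameters())

print(logits.shape)  # one row of 10 scores per image
print(n_params)      # 62006 trainable parameters
```

A different input resolution would change the flattened size, and fc1 would then raise a shape mismatch error.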

We can now train this model on sample data using Lightning.

Step 3: Training the Model with Lightning

Lightning provides a LightningModule class to encapsulate PyTorch model code and the training loop boilerplate. Let's convert our model:

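import lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self, lr=0.02):
        super().__init__()
        self.model = Net()
        self.lr = lr  # kept as an attribute so it can be changed later

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # Runs on each validation batch during trainer.fit()
        x, y = batch
        self.log("val_loss", F.cross_entropy(self(x), y))

    def test_step(self, batch, batch_idx):
        # Required by trainer.test()
        x, y = batch
        self.log("test_loss", F.cross_entropy(self(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

model = LitModel()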

The training_step() defines the forward pass and loss calculation. We configure an Adam optimizer with learning rate 0.02.

Now we can train this model easily:

trainer = pl.Trainer()
trainer.fit(model, train_dataloader, val_dataloader)
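The dataloaders passed to trainer.fit() are not defined in this article. As a stand-in, the sketch below builds them from random tensors shaped like 32x32 RGB images with 10 classes. The dataset size, split sizes, and batch size are arbitrary assumptions, and a real project would load an actual dataset instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-in data: 100 random "images" with integer labels in [0, 10)
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Split into train / validation / test subsets
train_set, val_set, test_set = random_split(dataset, [70, 15, 15])

train_dataloader = DataLoader(train_set, batch_size=16, shuffle=True)
val_dataloader = DataLoader(val_set, batch_size=16)
test_dataloader = DataLoader(test_set, batch_size=16)

# Each batch is an (inputs, targets) pair, which is what training_step unpacks
x, y = next(iter(train_dataloader))
print(x.shape, y.shape)
```

Random data is only useful for checking that the pipeline runs end to end; the loss will not improve in a meaningful way.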

The Trainer handles epoch looping, validation, and logging automatically. We can then evaluate the model on test data (Lightning's LightningDataModule class offers a structured way to package the dataloaders):

result = trainer.test(model, test_dataloader)
print(result)

For comparison, here is the network and training loop code in pure PyTorch:

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

# Assumes the Net class from Step 2 and train_dataloader, val_dataloader,
# and test_dataloader are already defined

# Initialize model and optimizer
model = Net()
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

# Training Loop
for epoch in range(10):  # Number of epochs
    for batch_idx, (x, y) in enumerate(train_dataloader):
        optimizer.zero_grad()
        y_hat = model(x)
        loss = F.cross_entropy(y_hat, y)
        loss.backward()
        optimizer.step()

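# Validation Loop
model.eval()
val_loss = 0
with torch.no_grad():
    for x, y in val_dataloader:
        y_hat = model(x)
        val_loss += F.cross_entropy(y_hat, y, reduction='sum').item()
val_loss /= len(val_dataloader.dataset)
print(f"Validation loss: {val_loss}")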

# Testing Loop and Evaluate
model.eval()
test_loss = 0
with torch.no_grad():
    for x, y in test_dataloader:
        y_hat = model(x)
        test_loss += F.cross_entropy(y_hat, y, reduction='sum').item()
test_loss /= len(test_dataloader.dataset)
print(f"Test loss: {test_loss}")

Lightning replaces the hand-written loops above with a few method definitions and a single Trainer call, which makes model development faster and less error-prone.
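The pure PyTorch test loop above reports only the loss. For a classifier, accuracy is often easier to interpret, and it can be computed from the same outputs. The helper name accuracy and the sample scores below are illustrative, not part of the original code.

```python
import torch


def accuracy(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Fraction of rows whose highest-scoring class matches the target."""
    preds = logits.argmax(dim=1)
    return (preds == targets).float().mean().item()


# Four samples, three classes; the third sample is misclassified
logits = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.9, 0.8, 0.1],
    [0.0, 0.2, 3.0],
])
targets = torch.tensor([0, 1, 1, 2])

acc = accuracy(logits, targets)
print(acc)  # 0.75
```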

Step 4: Advanced Topics

Lightning provides many built-in capabilities for hyperparameter tuning, preventing overfitting, and model management.

Hyperparameter Tuning

We can optimize hyperparameters like learning rate using Lightning's tuner module:

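from lightning.pytorch.tuner import Tuner

tuner = Tuner(trainer)
# lr_find needs the model to expose a `lr` or `learning_rate` attribute
lr_finder = tuner.lr_find(model, train_dataloaders=train_dataloader)
print(lr_finder.suggestion())

This runs a learning rate range test: the model is trained briefly while the learning rate is increased, and a value is suggested from the resulting loss curve. It is not a full search over the hyperparameter space.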
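One common way to tune the learning rate is a range test: train for a few steps while raising the learning rate and watch how the loss responds. The sketch below is a simplified version in plain PyTorch on synthetic regression data. It is not Lightning's implementation, and the selection rule (largest loss drop between consecutive steps) is an assumption made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny synthetic regression problem
model = nn.Linear(10, 1)
x = torch.randn(256, 10)
y = x.sum(dim=1, keepdim=True)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)
lrs = torch.logspace(-5, 0, steps=30).tolist()  # 1e-5 up to 1.0
history = []

for lr in lrs:
    # Raise the learning rate before every step
    for group in optimizer.param_groups:
        group["lr"] = lr
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    history.append((lr, loss.item()))

# Pick the learning rate that produced the largest drop in loss on the next step
drops = [history[i][1] - history[i + 1][1] for i in range(len(history) - 1)]
best = max(range(len(drops)), key=drops.__getitem__)
suggested_lr = history[best][0]
print(f"suggested lr ~ {suggested_lr:.2e}")
```

In practice the suggestion is a starting point to validate with a real training run rather than a final answer.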

Handling Overfitting

Strategies like dropout layers and early stopping can reduce overfitting:

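from lightning.pytorch.callbacks import EarlyStopping

# Regularization: define dropout in Net.__init__ (e.g. self.dropout = nn.Dropout(0.2))
# and call it in forward(); add_module() alone would never apply it
model = LitModel()

# Early stopping on a metric logged in validation_step, e.g. self.log("val_loss", loss)
trainer = pl.Trainer(callbacks=[EarlyStopping(monitor="val_loss", patience=3)])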
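To show what these two strategies do under the hood, here is a small sketch in plain PyTorch. SmallNet and should_stop are hypothetical names used only for this example: the first applies dropout inside forward(), and the second mimics a patience-based early stopping rule.

```python
import torch
import torch.nn as nn


class SmallNet(nn.Module):
    def __init__(self, p: float = 0.2):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.dropout = nn.Dropout(p)
        self.fc2 = nn.Linear(16, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # randomly zeroes activations, but only in train() mode
        return self.fc2(x)


torch.manual_seed(0)
net = SmallNet()
x = torch.randn(4, 8)

net.eval()  # dropout is disabled in eval mode, so repeated calls agree
same_in_eval = torch.equal(net(x), net(x))
print(same_in_eval)  # True


def should_stop(val_losses, patience=3):
    """Return True when the best validation loss is at least `patience` epochs old."""
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience


print(should_stop([0.9, 0.7, 0.6, 0.61, 0.63, 0.62]))  # True: no improvement for 3 epochs
print(should_stop([0.9, 0.7, 0.6, 0.61]))              # False: only 1 epoch without improvement
```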

Model Saving and Loading

Lightning makes it simple to save and reload models:

# Save
trainer.save_checkpoint("model.ckpt") 

# Load
model = LitModel.load_from_checkpoint(checkpoint_path="model.ckpt")

This preserves the model weights and training state. Hyperparameters are stored as well when the module calls self.save_hyperparameters() in its __init__.
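For comparison, saving and loading in plain PyTorch usually goes through the model's state_dict. The sketch below uses a small nn.Linear as a stand-in model and a temporary directory so that it leaves no files behind.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")

    # Save only the weights, not the Python class
    torch.save(model.state_dict(), path)

    # Loading requires rebuilding the same architecture first
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

x = torch.randn(3, 4)
outputs_match = torch.allclose(model(x), restored(x))
print(outputs_match)  # True
```

Optimizer state and the current epoch have to be saved separately in this approach, which is the bookkeeping a Lightning checkpoint takes care of.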

Step 5: Comparing PyTorch & PyTorch Lightning

Both PyTorch and PyTorch Lightning are powerful libraries for deep learning, but they serve different purposes and offer unique features. While PyTorch provides the foundational blocks for designing and implementing deep learning models, PyTorch Lightning aims to simplify the repetitive parts of model training, thereby accelerating the development process.

Key Differences

Here is a summary of the key differences between PyTorch and PyTorch Lightning:

Feature	PyTorch	PyTorch Lightning
Training Loop	Manually coded	Automated
Boilerplate Code	Required	Minimal
Hyperparameter Tuning	Manual setup	Built-in support
Distributed Training	Available but manual setup	Automated
Code Organization	No specific structure	Encourages modular design
Model Saving and Loading	Custom implementation needed	Simplified with checkpoints
Debugging	Advanced but manual	Easier with built-in logs
GPU/TPU Support	Available	Easier setup

Flexibility vs Convenience

PyTorch is renowned for its flexibility, particularly with dynamic computation graphs, which is excellent for research and experimentation. However, this flexibility often comes at the cost of writing more boilerplate code, especially for the training loop, distributed training, and hyperparameter tuning. On the other hand, PyTorch Lightning abstracts away much of this boilerplate while still allowing full customization and access to the lower-level PyTorch APIs when needed.

Speed of Development

If you're starting a project from scratch or conducting complex experiments, PyTorch Lightning can save you a lot of time. The LightningModule class streamlines the training process, automates logging, and even simplifies distributed training. This allows you to focus more on your model architecture and less on the repetitive aspects of model training and validation.

The Verdict

In summary, PyTorch offers more granular control and is excellent for researchers who need that level of detail. PyTorch Lightning, however, is designed to make the research-to-production cycle smoother and faster, without taking away the power and flexibility that PyTorch provides. Whether you choose PyTorch or PyTorch Lightning will depend on your specific needs, but the good news is that you can easily switch between the two or even use them in tandem for different parts of your project.

Moving Forward

In this article, we covered the basics of using PyTorch and PyTorch Lightning for deep learning:

PyTorch provides a powerful and flexible framework for building neural networks
PyTorch Lightning simplifies training and model development workflows
Key features like hyperparameter optimization and model management accelerate deep learning research

With these foundations you can start building and training advanced models like CNNs, RNNs, GANs and more. The active open source community also offers Lightning support and additions like Lightning Bolts, a component and optimization library.

Happy deep learning!

Matthew Mayo (@mattmayo13) holds a Master's degree in computer science and a graduate diploma in data mining. As Editor-in-Chief of KDnuggets, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.