Training neural networks and other large deep learning models is a difficult optimization task. The foundational algorithm for this task is stochastic gradient descent (SGD), and adjusting the learning rate over the course of training can improve model performance and reduce training time on many problems.
In this article, you will learn about learning rate schedules and how to implement different types in your PyTorch neural network models.
What You Will Learn:
- The significance of learning rate schedules in model training
- How to incorporate learning rate schedules into your PyTorch training loop
- How to create your own learning rate schedule
Getting Started
This article is structured into three main sections:
- Learning Rate Schedules for Training Models
- Implementing Learning Rate Schedules in PyTorch
- Creating Custom Learning Rate Schedules
Learning Rate Schedules for Training Models
Gradient descent is a numerical optimization algorithm that iteratively updates model parameters in the direction that reduces the loss, following the update rule w ← w − learning_rate × gradient. Essentially, it lets you adjust variables (such as the weights of a neural network) so as to minimize the objective function (the loss function).
In the gradient descent process, you typically start with a constant learning rate, but employing a learning rate schedule can yield better results. A learning rate schedule adapts the learning rate dynamically throughout training, leading to improved performance and shorter training times.
During neural network training, data is processed in batches over multiple epochs. Each batch produces one training step, in which the parameters are updated. The learning rate is usually adjusted once per epoch, either on a fixed schedule or, with some schedulers, in response to the model's performance on a validation set.
Different strategies for adapting the learning rate exist. At the beginning of training, a larger learning rate can expedite progress through significant updates. In contrast, as training concludes, a smaller learning rate is preferable to achieve fine-tuned performance, reducing the risk of overshooting optimal values.
The most commonly used adaptation involves gradually decreasing the learning rate over time. This approach enables larger updates at the outset and smaller updates as the training advances, allowing the model to learn effective weights and refine them later.
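To make this concrete, here is a small illustration of a time-based decay, lr = lr0 / (1 + k × epoch), written in plain Python without any PyTorch scheduler; the starting rate and decay constant are arbitrary values chosen for the demonstration.
# Illustrative time-based decay: large steps early, small steps late.
# initial_lr and decay are arbitrary demo values, not recommendations.
initial_lr = 0.1
decay = 0.1
for epoch in range(10):
    lr = initial_lr / (1 + decay * epoch)
    print(f"Epoch {epoch}: lr = {lr:.4f}")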
Implementing Learning Rate Schedules in PyTorch
In PyTorch, models are trained through an optimizer, which takes the learning rate as a parameter. You can attach a learning rate schedule to that optimizer to adjust the learning rate as training progresses.
Here’s how to create a basic learning rate schedule:
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
# The scheduler wraps an existing optimizer (the model is assumed to be defined already)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = lr_scheduler.LinearLR(optimizer, start_factor=1.0, end_factor=0.3, total_iters=10)
PyTorch offers a variety of learning rate schedulers in the torch.optim.lr_scheduler module. Each scheduler takes the optimizer as its first argument, with additional parameters depending on the specific scheduler.
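For instance, assuming the optimizer created in the snippet above, a few of the built-in schedulers can be constructed like this (the parameter values are arbitrary examples, not recommendations):
# Halve the learning rate every 10 epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
# Multiply the learning rate by 0.9 after each epoch
scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
# Lower the learning rate when a monitored metric (e.g., validation loss) stops improving;
# note that this scheduler's step() expects that metric as an argument
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)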
Let’s look at an example model designed to solve the ionosphere binary classification problem using a small dataset from the UCI Machine Learning repository.
A simple neural network with a single hidden layer (34 neurons) and a ReLU activation function is created. The output layer consists of a single neuron employing a sigmoid function for probability-like values.
Using plain SGD with a fixed learning rate of 0.1, the model trains for 50 epochs. The optimizer’s learning rate remains unchanged throughout the training:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
# Load dataset, split into input (X) and output (y)
dataframe = pd.read_csv("ionosphere.csv", header=None)
X = dataframe.iloc[:, 0:34].values.astype(float)
y = LabelEncoder().fit_transform(dataframe.iloc[:, 34])
# Convert to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_tensor, y_tensor, train_size=0.7, shuffle=True)
# Construct the model
model = nn.Sequential(
    nn.Linear(34, 34),
    nn.ReLU(),
    nn.Linear(34, 1),
    nn.Sigmoid()
)
# Training the model
n_epochs = 50
batch_size = 24
loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
for epoch in range(n_epochs):
    for i in range(0, len(X_train), batch_size):
        X_batch = X_train[i:i + batch_size]
        y_batch = y_train[i:i + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch}: SGD lr={optimizer.param_groups[0]['lr']:.4f}")
# Model accuracy evaluation
model.eval()
with torch.no_grad():
    accuracy = (model(X_test).round() == y_test).float().mean().item()
print(f"Model accuracy: {accuracy * 100:.2f}%")
This code confirms that the learning rate remains constant during training.
To vary the learning rate during training, add a scheduler and call its step() function once per epoch, after the inner batch loop. Here's how that looks:
scheduler = lr_scheduler.LinearLR(optimizer, start_factor=1.0, end_factor=0.5, total_iters=30)
for epoch in range(n_epochs):
    for i in range(0, len(X_train), batch_size):
        # Training steps (same as before)
        ...
    scheduler.step()
    print(f"Epoch {epoch}: SGD lr={optimizer.param_groups[0]['lr']:.4f}")
Now, the learning rate will adjust throughout training, promoting better model performance.
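If you want to verify the schedule, you can read the current learning rate either from the optimizer, as in the print statement above, or from the scheduler itself:
# get_last_lr() returns the most recently computed learning rate,
# one value per parameter group
print(scheduler.get_last_lr())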
Creating Custom Learning Rate Schedules
Not all models benefit from the standard learning rate schedules. Sometimes a custom schedule is a better fit. In PyTorch, you can define one with a plain Python function that takes the epoch number and returns a scaling factor. For example, to decay the learning rate as 0.1 / (1 + 0.01 × epoch), given an optimizer created with lr=0.1, you can implement it as follows:
def lr_lambda(epoch):
    # LambdaLR multiplies the optimizer's initial learning rate by the value
    # returned here, so return a decay factor rather than an absolute rate
    factor = 0.01
    return 1.0 / (1 + factor * epoch)
You can then pass this function to LambdaLR() to build the corresponding scheduler:
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda)
As before, call scheduler.step() once per epoch in your training loop to apply the custom schedule, as in the sketch below.
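Putting it together, a minimal sketch of the training loop with the custom schedule might look like this, assuming the model, data, loss function, and hyperparameters are defined as in the earlier example:
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda)
for epoch in range(n_epochs):
    for i in range(0, len(X_train), batch_size):
        X_batch = X_train[i:i + batch_size]
        y_batch = y_train[i:i + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()
    scheduler.step()  # apply the custom schedule once per epoch
    print(f"Epoch {epoch}: lr={optimizer.param_groups[0]['lr']:.4f}")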
Tips for Using Learning Rate Schedules
- Start Higher: Since the schedule will decrease the learning rate over time, begin with a somewhat larger initial value so the early updates still make meaningful progress.
- Utilize Momentum: Larger momentum values help the optimizer keep moving in a consistent direction as the learning rate shrinks (see the sketch after this list).
- Experiment: Different schedules can yield varying results. Testing multiple configurations helps identify the most effective method for your specific problem.
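As a small sketch of the momentum tip, you can combine SGD with momentum and a decaying schedule; the values here are common but arbitrary choices, not tuned for this dataset:
# SGD with momentum plus a decaying learning rate schedule
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = lr_scheduler.LinearLR(optimizer, start_factor=1.0, end_factor=0.5, total_iters=30)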
Further Readings
For more in-depth information, refer to the PyTorch documentation on adjusting learning rates.
Summary
This article provided insights on learning rate schedules for training neural network models. You learned the importance of learning rates, how to implement them in PyTorch, and how to create custom learning rate schedules to optimize your training processes.