theaicompendium.com

How to Grid Search Hyperparameters for PyTorch Models


In PyTorch, the “weights” of a neural network are known as “parameters,” which are fine-tuned during training by an optimizer. In contrast, hyperparameters are fixed characteristics of the model determined before training starts, such as the number of hidden layers and the choice of activation functions. Optimizing hyperparameters is crucial in deep learning due to the inherent complexity and difficulty in configuring neural networks, along with the extensive training times involved.

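In this post, you’ll learn how to use grid search from the Scikit-learn library to systematically tune hyperparameters for PyTorch deep learning models. After reading, you will understand how to wrap PyTorch models for use in Scikit-learn, how to run a grid search over them, and how to tune common neural network hyperparameters.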

Let’s Get Started

Overview

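This article will guide you through using Scikit-learn’s grid search functionality, complete with practical examples you can easily adapt for your projects. We start with how to use PyTorch models in Scikit-learn and how grid search works there, then tune, in turn, batch size and number of epochs, the optimization algorithm, learning rate and momentum, weight initialization, the activation function, dropout regularization, and the number of neurons in the hidden layer.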

Using PyTorch Models in Scikit-learn

You can integrate PyTorch models with Scikit-learn by using the skorch library, which enables you to wrap your PyTorch models, providing a similar API to Scikit-learn models. This allows you to leverage Scikit-learn’s functionality seamlessly.

First, install the skorch library if you haven’t already:

pip install skorch

To use these wrappers, define your PyTorch model as a class inheriting from nn.Module. Then, pass the class to the module argument when initializing the NeuralNetClassifier. Here’s an example:

import torch.nn as nn
from skorch import NeuralNetClassifier

class MyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        ...

    def forward(self, x):
        ...
        return x

# Create the skorch wrapper
model = NeuralNetClassifier(
    module=MyClassifier
)

The NeuralNetClassifier constructor also accepts training settings, such as the number of epochs and the batch size, which are then used whenever model.fit() runs the training loop. For instance:

model = NeuralNetClassifier(
    module=MyClassifier,
    max_epochs=150,
    batch_size=10
)

You can also pass new parameters to your model’s constructor by prefixing them with module__. For example:

class SonarClassifier(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        self.layers = []
        self.acts = []
        for i in range(n_layers):
            self.layers.append(nn.Linear(60, 60))
            self.acts.append(nn.ReLU())
            self.add_module(f"layer{i}", self.layers[-1])
            self.add_module(f"act{i}", self.acts[-1])
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        for layer, act in zip(self.layers, self.acts):
            x = act(layer(x))
        x = self.output(x)
        return x

model = NeuralNetClassifier(
    module=SonarClassifier,
    max_epochs=150,
    batch_size=10,
    module__n_layers=2
)

To confirm the setup, initialize the model and display it:

print(model.initialize())
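
The module__ prefix used above follows Scikit-learn's double-underscore convention for nested parameters: the part before the double underscore names a component and the part after names that component's argument. The sketch below only illustrates the naming convention in plain Python. It is not skorch's actual implementation, and the helper name split_prefixed_params is hypothetical.

```python
def split_prefixed_params(params):
    """Group 'component__name' keys by component; plain keys go under ''."""
    grouped = {}
    for key, value in params.items():
        component, sep, name = key.partition("__")
        if not sep:
            # No double underscore: the setting belongs to the wrapper itself
            component, name = "", key
        grouped.setdefault(component, {})[name] = value
    return grouped

# Example settings (illustrative values only)
settings = {
    "max_epochs": 150,
    "batch_size": 10,
    "module__n_layers": 2,
    "optimizer__lr": 0.01,
}
grouped = split_prefixed_params(settings)
print(grouped)
```

The same naming scheme is what lets a grid search address arguments of the wrapped module or optimizer by name.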

Using Grid Search in Scikit-learn

Grid search is a hyperparameter optimization technique that evaluates every combination of the hyperparameter values you specify and reports the best-performing one among those tried. The GridSearchCV class in Scikit-learn implements this process, scoring each combination with cross-validation.
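
To get a feel for the cost of an exhaustive search, the following sketch enumerates a grid by hand and counts the model fits it implies. It uses only the standard library. The grid values are the batch-size and epoch values used later in this post, and cv_folds=3 matches the cv=3 setting in the examples.

```python
from itertools import product

param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "max_epochs": [10, 50, 100],
}
cv_folds = 3

# Every combination of one value per hyperparameter
names = list(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

# Each combination is trained once per cross-validation fold
# (GridSearchCV adds one final refit on all data when refit=True, its default)
total_fits = len(combinations) * cv_folds
print(len(combinations), "combinations,", total_fits, "cross-validation fits")
```

Because the count multiplies across hyperparameters, adding one more hyperparameter with five values would multiply the number of fits by five.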

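When constructing a GridSearchCV instance, provide a dictionary of hyperparameters in the param_grid argument, mapping parameter names to lists of values to try. The names must match parameters of the wrapped model, for example max_epochs for the number of epochs in skorch: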

from sklearn.model_selection import GridSearchCV

param_grid = {
    'max_epochs': [10, 20, 30]  # skorch names this parameter max_epochs
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

Setting n_jobs=-1 utilizes all available CPU cores, improving the speed of the grid search over a single-threaded execution.

Once completed, access the result of the grid search using grid_result. The best_score_ attribute reveals the highest score obtained during the optimization, and best_params_ specifies the parameter set that achieved this score.
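
As a quick way to try the API without training a neural network, here is a minimal sketch that runs GridSearchCV on a plain Scikit-learn estimator and reads the result attributes described above. LogisticRegression and the synthetic data from make_classification are stand-ins chosen for speed. They are assumptions for illustration and are not the model or dataset used in this post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: any small classification set works here)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=3,
)
grid_result = grid.fit(X_demo, y_demo)

print("Best score:", grid_result.best_score_)
print("Best params:", grid_result.best_params_)

# cv_results_ holds one entry per parameter combination; rank them by mean score
ranked = sorted(
    zip(grid_result.cv_results_["mean_test_score"], grid_result.cv_results_["params"]),
    key=lambda pair: pair[0],
    reverse=True,
)
for mean, params in ranked:
    print(f"{mean:.3f} with {params}")
```

Swapping in a skorch-wrapped PyTorch model should only require changing the estimator and the param_grid keys.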

Problem Description

Now that you understand how to use PyTorch models with Scikit-learn and how to implement grid search, let’s explore several examples using the Pima Indians onset of diabetes dataset. This dataset is manageable and entirely numerical.

As we work through these examples, we will tune one or two parameters at a time and then combine the results. This is not the most reliable way to grid search, because parameters can interact, but it works well for illustrative purposes.

Tuning Batch Size and Number of Epochs

In our initial example, we will tune two key parameters: batch size and number of epochs.

The batch size determines how many samples are presented to the model before updating weights. In contrast, the number of epochs specifies how many times the complete dataset is passed through the network.
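
The two settings together determine how many weight updates a training run performs. The sketch below works through that arithmetic. The sample count of 500 is a hypothetical figure chosen for illustration, and the calculation assumes the final, smaller batch of each epoch is kept rather than dropped.

```python
import math

def weight_updates(n_samples, batch_size, max_epochs):
    """Number of optimizer steps, assuming the last partial batch is kept."""
    updates_per_epoch = math.ceil(n_samples / batch_size)
    return updates_per_epoch * max_epochs

n_samples = 500  # hypothetical training-set size
for batch_size, max_epochs in [(10, 100), (100, 100), (100, 10)]:
    total = weight_updates(n_samples, batch_size, max_epochs)
    print(f"batch_size={batch_size}, max_epochs={max_epochs}: {total} updates")
```

Small batches with many epochs mean far more updates, and usually longer training, than large batches with few epochs.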

For this example, we will evaluate batch sizes from 10 to 100 and epoch counts of 10, 50, and 100:

import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV

# Load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# PyTorch classifier
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 12)
        self.act = nn.ReLU()
        self.output = nn.Linear(12, 1)
        self.prob = nn.Sigmoid()

    def forward(self, x):
        x = self.act(self.layer(x))
        x = self.prob(self.output(x))
        return x

# Create model with skorch
model = NeuralNetClassifier(
    PimaClassifier,
    criterion=nn.BCELoss,
    optimizer=optim.Adam,
    verbose=False
)

# Define the grid search parameters
param_grid = {
    'batch_size': [10, 20, 40, 60, 80, 100],
    'max_epochs': [10, 50, 100]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Tuning the Training Optimization Algorithm

Different optimization algorithms can influence the training of your neural network significantly. In this example, we will tune various optimization algorithms available within PyTorch.

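import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV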

# Load the dataset
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# PyTorch classifier
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 12)
        self.act = nn.ReLU()
        self.output = nn.Linear(12, 1)
        self.prob = nn.Sigmoid()

    def forward(self, x):
        x = self.act(self.layer(x))
        x = self.prob(self.output(x))
        return x

# Create model with skorch
model = NeuralNetClassifier(
    PimaClassifier,
    criterion=nn.BCELoss,
    max_epochs=100,
    batch_size=10,
    verbose=False
)

# Define the grid search parameters
param_grid = {
    'optimizer': [optim.SGD, optim.RMSprop, optim.Adagrad, optim.Adadelta,
                  optim.Adam, optim.Adamax, optim.NAdam],
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

How to Tune Learning Rate and Momentum

The learning rate determines the step size during optimization, while momentum helps smooth updates by incorporating past gradients. In this example, we will assess the impact of varying the learning rate and momentum while using the SGD optimizer.

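import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV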

# Load the dataset
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# PyTorch classifier
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 12)
        self.act = nn.ReLU()
        self.output = nn.Linear(12, 1)
        self.prob = nn.Sigmoid()

    def forward(self, x):
        x = self.act(self.layer(x))
        x = self.prob(self.output(x))
        return x

# Create model with skorch
model = NeuralNetClassifier(
    PimaClassifier,
    criterion=nn.BCELoss,
    optimizer=optim.SGD,
    max_epochs=100,
    batch_size=10,
    verbose=False
)

# Define the grid search parameters
param_grid = {
    'optimizer__lr': [0.001, 0.01, 0.1, 0.2, 0.3],
    'optimizer__momentum': [0.0, 0.2, 0.4, 0.6, 0.8, 0.9],
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
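
To make the roles of the two settings concrete, here is a plain-Python sketch of the SGD-with-momentum update rule (velocity = momentum * velocity + gradient, then parameter -= lr * velocity), which is the basic form PyTorch documents for optim.SGD without dampening or Nesterov momentum. The quadratic objective f(x) = x^2, the starting point, and the step count are assumptions chosen only for illustration.

```python
def sgd_momentum(grad_fn, x0, lr, momentum, steps):
    """Minimize a 1-D function with the basic SGD-with-momentum update."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        grad = grad_fn(x)
        velocity = momentum * velocity + grad  # accumulate past gradients
        x = x - lr * velocity                  # step size scaled by lr
    return x

# Gradient of f(x) = x**2, whose minimum is at x = 0
grad = lambda x: 2.0 * x

for lr, momentum in [(0.01, 0.0), (0.01, 0.9), (0.1, 0.9)]:
    final_x = sgd_momentum(grad, x0=5.0, lr=lr, momentum=momentum, steps=300)
    print(f"lr={lr}, momentum={momentum}: final x = {final_x:.6f}")
```

Running it with different pairs shows why the two values are searched jointly: the effect of a given momentum depends on the learning rate it is paired with.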

Tuning Network Weight Initialization

Weight initialization can significantly affect model performance. In this section, we’ll explore various weight initialization techniques.

import torch.nn.init as init

# Modify classifier to allow weight initialization parameter
class PimaClassifier(nn.Module):
    def __init__(self, weight_init=init.xavier_uniform_):
        super().__init__()
        self.layer = nn.Linear(8, 12)
        self.act = nn.ReLU()
        self.output = nn.Linear(12, 1)
        self.prob = nn.Sigmoid()
        weight_init(self.layer.weight)
        weight_init(self.output.weight)

    def forward(self, x):
        x = self.act(self.layer(x))
        x = self.prob(self.output(x))
        return x

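# Re-create the skorch model so it wraps the updated classifier.
# Assumes the imports, X, and y from the previous examples are still in scope.
model = NeuralNetClassifier(
    PimaClassifier,
    criterion=nn.BCELoss,
    optimizer=optim.Adam,  # assumption: substitute the best optimizer from your own search
    max_epochs=100,
    batch_size=10,
    verbose=False
)

# Grid search to find effective weight initialization method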
param_grid = {
    'module__weight_init': [init.uniform_, init.normal_, init.zeros_,
                           init.xavier_normal_, init.xavier_uniform_,
                           init.kaiming_normal_, init.kaiming_uniform_]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
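
The uniform initializers in this grid differ mainly in how wide a range they sample from. As a rough illustration, the sketch below computes the sampling bounds for the 8-to-12 hidden layer using the standard Xavier (Glorot) and Kaiming (He) formulas. The helper names are hypothetical, and the Kaiming gain of sqrt(2) assumes the default settings of init.kaiming_uniform_.

```python
import math

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Weights are drawn from U(-bound, bound) with this bound."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

def kaiming_uniform_bound(fan_in, gain=math.sqrt(2.0)):
    """Weights are drawn from U(-bound, bound); depends on fan_in only."""
    return gain * math.sqrt(3.0 / fan_in)

# Hidden layer of the classifier: nn.Linear(8, 12)
fan_in, fan_out = 8, 12
print(f"Xavier uniform bound:  {xavier_uniform_bound(fan_in, fan_out):.4f}")
print(f"Kaiming uniform bound: {kaiming_uniform_bound(fan_in):.4f}")
```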

Tuning the Neuron Activation Function

Activation functions introduce non-linearity into the model. In the following example, we will explore various activation functions.

# PyTorch classifier with adjustable activation function
class PimaClassifier(nn.Module):
    def __init__(self, activation=nn.ReLU):
        super().__init__()
        self.layer = nn.Linear(8, 12)
        self.act = activation()
        self.output = nn.Linear(12, 1)
        self.prob = nn.Sigmoid()
        init.kaiming_uniform_(self.layer.weight)
        init.kaiming_uniform_(self.output.weight)

    def forward(self, x):
        x = self.act(self.layer(x))
        x = self.prob(self.output(x))
        return x

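# Re-create the skorch model so it wraps the updated classifier.
# Assumes the imports, X, and y from the previous examples are still in scope.
model = NeuralNetClassifier(
    PimaClassifier,
    criterion=nn.BCELoss,
    optimizer=optim.Adam,  # assumption: substitute the best optimizer from your own search
    max_epochs=100,
    batch_size=10,
    verbose=False
)

# Grid search over various activation functions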
param_grid = {
    'module__activation': [nn.Identity, nn.ReLU, nn.ELU, nn.ReLU6,
                           nn.GELU, nn.Softplus, nn.Softsign, nn.Tanh,
                           nn.Sigmoid, nn.Hardsigmoid]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
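
For reference, several of the activations in this grid are simple enough to write out by hand. The sketch below gives scalar versions using their standard definitions. These are illustrative stand-ins, not the PyTorch implementations, which operate on whole tensors.

```python
import math

def relu(x):
    return max(0.0, x)

def relu6(x):
    # ReLU clipped at 6
    return min(max(0.0, x), 6.0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # Smooth approximation of ReLU
    return math.log1p(math.exp(x))

def softsign(x):
    return x / (1.0 + abs(x))

def hardsigmoid(x):
    # Piecewise-linear approximation of the sigmoid
    return min(max(x / 6.0 + 0.5, 0.0), 1.0)

for fn in (relu, relu6, sigmoid, softplus, softsign, hardsigmoid, math.tanh):
    values = ", ".join(f"{fn(x):.3f}" for x in (-4.0, 0.0, 4.0, 10.0))
    print(f"{fn.__name__:<12} {values}")
```

nn.Identity applies no non-linearity at all, so including it in the grid gives a baseline showing how much the activation contributes.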

Tuning Dropout Regularization

In this section, we will optimize the dropout rate to reduce overfitting and enhance the model’s generalization:

# PyTorch classifier with adjustable dropout rate
class PimaClassifier(nn.Module):
    def __init__(self, dropout_rate=0.5):
        super().__init__()
        self.layer = nn.Linear(8, 12)
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(dropout_rate)
        self.output = nn.Linear(12, 1)
        self.prob = nn.Sigmoid()
        init.kaiming_uniform_(self.layer.weight)
        init.kaiming_uniform_(self.output.weight)

    def forward(self, x):
        x = self.act(self.layer(x))
        x = self.dropout(x)
        x = self.prob(self.output(x))
        return x

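# Re-create the skorch model so it wraps the updated classifier.
# Assumes the imports, X, and y from the previous examples are still in scope.
model = NeuralNetClassifier(
    PimaClassifier,
    criterion=nn.BCELoss,
    optimizer=optim.Adam,  # assumption: substitute the best optimizer from your own search
    max_epochs=100,
    batch_size=10,
    verbose=False
)

# Grid search for dropout rates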
param_grid = {
    'module__dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
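
To show what the dropout rate controls, here is a plain-Python sketch of inverted dropout as applied during training: each value is zeroed with probability p, and the survivors are scaled by 1 / (1 - p) so the expected activation stays the same. The function name and the restriction to p below 1 are choices made for this illustration.

```python
import random

def inverted_dropout(values, p, rng):
    """Zero each value with probability p; scale the rest by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # fixed seed so repeated runs give the same mask
activations = [1.0] * 1000
for p in (0.0, 0.2, 0.5, 0.9):
    dropped = inverted_dropout(activations, p, rng)
    kept = sum(1 for v in dropped if v != 0.0)
    mean = sum(dropped) / len(dropped)
    print(f"p={p}: kept {kept} of {len(dropped)}, mean activation {mean:.3f}")
```

At evaluation time dropout is disabled, which is why the scaling is applied during training instead.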

Tuning the Number of Neurons in the Hidden Layer

The number of neurons in a hidden layer directly impacts the model’s capacity to learn. In this example, we will experiment with how the number of neurons affects performance.

class PimaClassifier(nn.Module):
    def __init__(self, n_neurons=12):
        super().__init__()
        self.layer = nn.Linear(8, n_neurons)
        self.act = nn.ReLU()
        self.output = nn.Linear(n_neurons, 1)
        self.prob = nn.Sigmoid()
        init.kaiming_uniform_(self.layer.weight)
        init.kaiming_uniform_(self.output.weight)

    def forward(self, x):
        x = self.act(self.layer(x))
        x = self.prob(self.output(x))
        return x

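# Re-create the skorch model so it wraps the updated classifier.
# Assumes the imports, X, and y from the previous examples are still in scope.
model = NeuralNetClassifier(
    PimaClassifier,
    criterion=nn.BCELoss,
    optimizer=optim.Adam,  # assumption: substitute the best optimizer from your own search
    max_epochs=100,
    batch_size=10,
    verbose=False
)

# Grid search for the number of neurons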
param_grid = {
    'module__n_neurons': [5, 10, 15, 20, 25, 30]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
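
One way to reason about capacity is to count trainable parameters. The sketch below does this for the two-layer classifier above: a hidden layer with 8 inputs and n_neurons outputs, followed by a single output unit, each with weights and biases. The helper name is hypothetical.

```python
def pima_parameter_count(n_neurons, n_inputs=8):
    """Trainable parameters in Linear(n_inputs, n_neurons) + Linear(n_neurons, 1)."""
    hidden = n_inputs * n_neurons + n_neurons  # weights + biases
    output = n_neurons * 1 + 1                 # weights + bias
    return hidden + output

for n in [5, 10, 15, 20, 25, 30]:
    print(f"n_neurons={n}: {pima_parameter_count(n)} parameters")
```

The count grows linearly with the number of neurons here, so even the largest setting in the grid remains a very small network.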

Tips for Hyperparameter Optimization

Consider these tips during hyperparameter tuning:

Further Reading

For more detailed insights, explore the following resources:

Summary

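In this post, you learned how to tune hyperparameters for your deep learning networks in Python using PyTorch and Scikit-learn. Specifically, you saw how to wrap PyTorch models with skorch for use in Scikit-learn, how to run a grid search with GridSearchCV, and how to tune batch size, number of epochs, the optimizer, learning rate and momentum, weight initialization, the activation function, dropout rate, and the number of hidden neurons.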

