As neural networks gain prominence in machine learning, understanding the role of activation functions in their implementation is critical. Activation functions are applied to the output of each neuron within a neural network, introducing non-linearity to the model. Without these functions, neural networks would operate solely as a series of linear transformations, severely limiting their capacity to learn complex patterns from data.
PyTorch offers a variety of activation functions, each with unique properties and applications. Common activation functions in PyTorch include ReLU, Sigmoid, and Tanh. Choosing the appropriate activation function for a specific problem is vital for achieving optimal performance in neural networks. This article will guide you through training a neural network in PyTorch with different activation functions and analyzing their impact on performance.
In this tutorial, you will learn about:
- Various activation functions used in neural network architectures.
- How to implement these functions in PyTorch.
- A comparison of activation functions in real-world applications.
Let’s dive in!
Overview
This tutorial is divided into four parts:
- Logistic Activation Function
- Tanh Activation Function
- ReLU Activation Function
- Exploring Activation Functions in a Neural Network
Logistic Activation Function
We begin with the logistic activation function, also known as the sigmoid function. This function maps any input to a value between 0 and 1, making it suitable for binary classification tasks where predictions can be interpreted as probabilities.
One advantage of the logistic function is that it is differentiable everywhere, which is essential for backpropagation during neural network training. Its gradient is smooth and bounded, which helps avoid exploding gradients; however, the function saturates for large positive or negative inputs, so gradients can vanish, especially in deeper networks.
To visualize the logistic function, we can implement it in PyTorch and plot it:
import torch
import matplotlib.pyplot as plt
# Create a PyTorch tensor
x = torch.linspace(-10, 10, 100)
# Apply the logistic activation function
y = torch.sigmoid(x)
# Plot the results
plt.plot(x.numpy(), y.numpy(), color='purple')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Logistic Activation Function')
plt.show()
In this example, we use torch.sigmoid() to implement the logistic function and visualize it using Matplotlib.
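Under the hood, torch.sigmoid() evaluates the logistic formula 1 / (1 + e^(-x)) element-wise. As a quick sanity check, here is a minimal sketch comparing a hand-written version against PyTorch's built-in; the manual_sigmoid name is purely illustrative:
# Hypothetical helper for illustration: the logistic formula written by hand
def manual_sigmoid(t):
    return 1.0 / (1.0 + torch.exp(-t))

# The two versions should agree to within floating-point tolerance
print(torch.allclose(manual_sigmoid(x), torch.sigmoid(x)))  # expected: True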
Tanh Activation Function
Next, we examine the Tanh activation function, which outputs values between -1 and 1. Because its outputs are zero-centered, the activations passed to the next layer tend to stay centered around zero, which often makes optimization easier.
Like the logistic function, Tanh is smooth and continuous, which helps during gradient descent, but it saturates for large inputs and can therefore suffer from the vanishing gradient problem, particularly in deep networks. It is also more expensive to compute than ReLU because it relies on exponential functions.
You can visualize the Tanh function using the following code:
# Apply the tanh activation function
y = torch.tanh(x)
# Plot the results
plt.plot(x.numpy(), y.numpy(), color='blue')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Tanh Activation Function')
plt.show()
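Tanh is closely related to the logistic function: tanh(x) = 2 * sigmoid(2x) - 1, which is why the two curves share the same S-shape but differ in output range. A minimal sketch, reusing the x tensor from above, confirms the identity numerically:
# tanh(x) equals 2 * sigmoid(2x) - 1, so both are rescaled versions of the same curve
print(torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1, atol=1e-6))  # expected: True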
ReLU Activation Function
The ReLU (Rectified Linear Unit) activation function is another popular choice in neural networks. Unlike Sigmoid and Tanh, ReLU does not saturate for positive inputs: it outputs the input value directly if it is positive and zero otherwise.
ReLU’s key advantages include computational efficiency and lower susceptibility to the vanishing gradient problem. It fosters sparsity in neuron activations, which can enhance the model’s generalization.
To visualize the ReLU function, use the following code:
# Apply the ReLU activation function
y = torch.relu(x)
# Plot the results
plt.plot(x.numpy(), y.numpy(), color='green')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.show()
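ReLU is simply max(0, x), so it can also be written with torch.clamp(). A common variant, Leaky ReLU (used later in this tutorial), keeps a small slope for negative inputs instead of zeroing them out, which helps avoid "dead" neurons. A brief sketch, assuming the same x tensor as above:
# ReLU as an explicit clamp at zero; equivalent to torch.relu(x)
y_relu = torch.clamp(x, min=0)

# Leaky ReLU keeps a small slope (0.01 by default) for negative inputs
y_leaky = torch.nn.functional.leaky_relu(x, negative_slope=0.01)

print(torch.allclose(y_relu, torch.relu(x)))  # expected: True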
Exploring Activation Functions in a Neural Network
Activation functions are essential in training deep learning models because they introduce non-linearity, which enables the model to learn complex patterns.
As an example, we can use the MNIST dataset, which consists of 70,000 grayscale images of handwritten digits (28×28 pixels). You’ll create a simple feedforward neural network to classify these digits and experiment with different activation functions like ReLU, Sigmoid, Tanh, and Leaky ReLU.
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
# Load the MNIST dataset
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='data/', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='data/', train=False, transform=transform, download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
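Before building the model, it can help to confirm what a batch looks like: each image arrives as a 1×28×28 tensor, which we will later flatten into a 784-dimensional vector. A quick, purely illustrative check:
# Grab one batch and inspect its shape and labels
images, labels = next(iter(train_loader))
print(images.shape)   # expected: torch.Size([64, 1, 28, 28])
print(labels[:10])    # first ten digit labels in the batch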
Next, let’s define a NeuralNetwork class that includes three linear layers and allows the use of different activation functions:
import torch
import torch.nn as nn
import torch.optim as optim
# A simple three-layer feedforward network with a configurable activation function
class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, activation_function):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, num_classes)
        self.activation_function = activation_function

    def forward(self, x):
        # Apply the chosen activation after the first two layers;
        # the final layer returns raw logits for use with CrossEntropyLoss
        x = self.activation_function(self.layer1(x))
        x = self.activation_function(self.layer2(x))
        x = self.layer3(x)
        return x
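To confirm the class wires up as intended, you can instantiate it with any activation module and pass a dummy batch through it. A minimal sketch, assuming the hyperparameters used later in this tutorial (784 inputs, 128 hidden units, 10 classes); the sample_model and dummy_batch names are just for illustration:
# Quick shape check with a ReLU-activated network and a fake batch of 64 flattened images
sample_model = NeuralNetwork(784, 128, 10, nn.ReLU())
dummy_batch = torch.randn(64, 784)
print(sample_model(dummy_batch).shape)  # expected: torch.Size([64, 10])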
Training and Testing the Model with Different Activation Functions
Let’s create functions for training and testing the model. The train() function will execute the training process for one epoch, while the test() function will evaluate the model on the test dataset.
def train(network, data_loader, criterion, optimizer, device):
    network.train()
    running_loss = 0.0
    for data, target in data_loader:
        data, target = data.to(device), target.to(device)
        # Flatten each 28x28 image into a 784-dimensional vector
        data = data.view(data.shape[0], -1)
        optimizer.zero_grad()
        output = network(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * data.size(0)
    return running_loss / len(data_loader.dataset)
def test(network, data_loader, criterion, device):
    network.eval()
    correct = 0
    total = 0
    test_loss = 0.0
    with torch.no_grad():
        for data, target in data_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.shape[0], -1)
            output = network(data)
            loss = criterion(output, target)
            test_loss += loss.item() * data.size(0)
            # The predicted class is the index of the largest logit
            _, predicted = torch.max(output.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    return test_loss / len(data_loader.dataset), 100 * correct / total
Next, we’ll set up the parameters and initiate training with various activation functions:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
input_size = 784 # 28x28 images
hidden_size = 128
num_classes = 10 # Digits 0-9
num_epochs = 10
learning_rate = 0.001
activation_functions = {
    'ReLU': nn.ReLU(),
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'LeakyReLU': nn.LeakyReLU()
}

results = {}

# Train and test the model with different activation functions
for name, activation_function in activation_functions.items():
    print(f"Training with {name} activation function...")
    model = NeuralNetwork(input_size, hidden_size, num_classes, activation_function).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    train_loss_history = []
    test_loss_history = []
    test_accuracy_history = []

    for epoch in range(num_epochs):
        train_loss = train(model, train_loader, criterion, optimizer, device)
        test_loss, test_accuracy = test(model, test_loader, criterion, device)
        train_loss_history.append(train_loss)
        test_loss_history.append(test_loss)
        test_accuracy_history.append(test_accuracy)
        print(f"Epoch [{epoch + 1}/{num_epochs}], Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%")

    results[name] = {
        'train_loss_history': train_loss_history,
        'test_loss_history': test_loss_history,
        'test_accuracy_history': test_accuracy_history
    }
Running the above code will output the training and test metrics for each activation function. You can visualize the performance of each function using Matplotlib to plot training loss, testing loss, and testing accuracy over epochs.
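For example, here is a minimal sketch of such a plot, using the results dictionary collected above to compare test accuracy across activation functions over the training epochs:
# Plot test accuracy per epoch for each activation function
plt.figure(figsize=(8, 5))
for name, history in results.items():
    plt.plot(range(1, num_epochs + 1), history['test_accuracy_history'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Test Accuracy (%)')
plt.title('Test Accuracy by Activation Function')
plt.legend()
plt.show()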
Summary
In this tutorial, you learned how to implement some of the most widely used activation functions in PyTorch. You explored how to train a neural network using various activation functions on the MNIST dataset, including ReLU, Sigmoid, Tanh, and Leaky ReLU. You also analyzed their performance through plotting training loss, testing loss, and testing accuracy.
The choice of activation function is crucial in determining model performance, and the optimal function may vary based on the specific task and dataset.