Building a Logistic Regression Classifier in PyTorch

Logistic regression is an essential statistical method that predicts the probability of a given event, making it highly valuable for classification tasks in machine learning, artificial intelligence, and data mining.

The logistic regression model applies a sigmoid function to the output of a linear function, which maps any real-valued score to a probability between 0 and 1. In this article, you will build a logistic regression classifier in PyTorch, train it on a small synthetic one-dimensional dataset, and measure its accuracy.
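To check that description, the sketch below compares PyTorch's `torch.nn.Linear` followed by `torch.sigmoid` against the formula σ(z) = 1 / (1 + e^(−z)) applied by hand to z = wx + b. The three input values are arbitrary and chosen only for illustration.

```python
import torch

# Hypothetical 1-D input batch (3 samples, 1 feature each), for illustration only.
x = torch.tensor([[-1.0], [0.0], [2.0]])

linear = torch.nn.Linear(1, 1)  # computes z = w * x + b with randomly initialized w, b

# Logistic regression: squash the linear output z into (0, 1) with the sigmoid.
p_module = torch.sigmoid(linear(x))

# The same computation written out by hand: sigma(z) = 1 / (1 + exp(-z)).
w, b = linear.weight, linear.bias
z = x @ w.T + b
p_manual = 1.0 / (1.0 + torch.exp(-z))

print(torch.allclose(p_module, p_manual))  # True
```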

After working through this article, you will understand:

  • How to create a custom dataset and a logistic regression model in PyTorch.
  • How to train the model with binary cross-entropy loss.
  • How to measure the trained classifier’s accuracy.

Let’s get started!

Overview

This tutorial is divided into three main parts:

  1. Preparing the Data and Building the Model
  2. Training the Classifier with Cross-Entropy Loss
  3. Verifying Model Performance with Test Data

Preparing the Data and Building the Model

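To begin, create a dataset class for your experiments. It generates 40 evenly spaced values of x from -2 up to (but not including) 2 and labels each sample 1 if x > 0.2 and 0 otherwise, which gives a simple binary classification problem. Note that this class does not split the data into training and test samples; ideally, test samples are held out as unseen data for evaluating the trained model.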

Here’s how you can create the dataset class:

import torch
from torch.utils.data import Dataset

class Data(Dataset):
    def __init__(self):
        # 40 evenly spaced inputs from -2 up to (but not including) 2, as a column vector
        self.x = torch.arange(-2, 2, 0.1).view(-1, 1)
        # Binary labels: 1 where x > 0.2, 0 elsewhere
        self.y = torch.zeros(self.x.shape[0], 1)
        self.y[self.x[:, 0] > 0.2] = 1
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        # Return one (input, label) pair
        return self.x[idx], self.y[idx]

    def __len__(self):
        return self.len

Next, instantiate the dataset object:

# Creating dataset object
data_set = Data()
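To get a feel for the data before training, you can inspect the dataset object. The sketch below repeats the `Data` class so that it runs on its own, then prints the number of samples, the shape of one sample, and how many samples belong to class 1.

```python
import torch
from torch.utils.data import Dataset

class Data(Dataset):
    # Same synthetic dataset as above: x in [-2, 2) with step 0.1,
    # labelled 1 where x > 0.2 and 0 otherwise.
    def __init__(self):
        self.x = torch.arange(-2, 2, 0.1).view(-1, 1)
        self.y = torch.zeros(self.x.shape[0], 1)
        self.y[self.x[:, 0] > 0.2] = 1
        self.len = self.x.shape[0]

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

    def __len__(self):
        return self.len

data_set = Data()
x0, y0 = data_set[0]                 # indexing calls __getitem__
print(len(data_set))                 # number of samples, via __len__
print(x0.shape, y0.shape)            # one feature and one label per sample
print(int(data_set.y.sum()))         # number of positive (class 1) samples
```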

Next, build the logistic regression model as a custom subclass of PyTorch’s nn.Module. Its forward method passes the input through a linear layer and then applies the sigmoid function:

class LogisticRegression(torch.nn.Module):    
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, 1)

    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

# Instantiate the model
log_regr = LogisticRegression(1)
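Even before training, a forward pass through the model returns values strictly between 0 and 1 because of the sigmoid. The sketch below repeats the `LogisticRegression` class, fixes the random seed only so the output is reproducible, and feeds the model three arbitrary inputs.

```python
import torch

class LogisticRegression(torch.nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

torch.manual_seed(0)  # fixed seed so the random initial weights are reproducible
log_regr = LogisticRegression(1)

x = torch.tensor([[-1.0], [0.5], [1.5]])  # three hypothetical inputs
with torch.no_grad():                      # no gradients needed just to inspect outputs
    probs = log_regr(x)

print(probs.shape)  # torch.Size([3, 1]): one probability per input row
print(probs)
```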

You can inspect the model’s randomly initialized weight and bias:

print("Checking parameters: ", log_regr.state_dict())
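The `state_dict()` call returns an ordered dictionary keyed by parameter name. For a model with a single input feature, it holds a 1×1 weight matrix and a one-element bias, so the model has only two learnable numbers. The sketch below prints each entry’s name and shape and counts the parameters.

```python
import torch

class LogisticRegression(torch.nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

log_regr = LogisticRegression(1)
state = log_regr.state_dict()

# Print each parameter's name and shape.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
# linear.weight (1, 1)
# linear.bias (1,)

# Count the learnable numbers in the model.
n_params = sum(p.numel() for p in log_regr.parameters())
print("learnable parameters:", n_params)  # learnable parameters: 2
```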

Training the Classifier with Cross-Entropy Loss

In the previous tutorial, using mean squared error (MSE) loss resulted in poor convergence rates. Here, you will observe what happens when you switch to using cross-entropy loss for training the logistic regression model.
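One common explanation for the difference lies in the gradients. With a sigmoid output p = σ(z), the gradient of the MSE loss with respect to the logit z contains the factor p(1 − p), which approaches zero when the prediction is confidently wrong, so learning stalls. The gradient of binary cross-entropy with respect to z simplifies to p − y, which stays large in that case. The sketch below uses autograd to compare the two for a single, hypothetical, badly misclassified sample.

```python
import torch
import torch.nn.functional as F

# A single confidently wrong prediction: the true label is 1,
# but the logit z = -6 gives sigmoid(z) of roughly 0.0025.
y = torch.tensor([1.0])

# Gradient of the MSE loss with respect to the logit.
z_mse = torch.tensor([-6.0], requires_grad=True)
F.mse_loss(torch.sigmoid(z_mse), y).backward()

# Gradient of the binary cross-entropy loss with respect to the logit.
z_bce = torch.tensor([-6.0], requires_grad=True)
F.binary_cross_entropy(torch.sigmoid(z_bce), y).backward()

print(f"MSE gradient w.r.t. z: {z_mse.grad.item():.4f}")
print(f"BCE gradient w.r.t. z: {z_bce.grad.item():.4f}")
```

For this sample, the cross-entropy gradient is about 200 times larger than the MSE gradient, so each SGD step moves the parameters much further toward a correct prediction.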

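Since this is a binary classification problem and the model outputs a single probability per sample, use binary cross-entropy (BCELoss) as the loss function, together with stochastic gradient descent (SGD) as the optimizer: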

optimizer = torch.optim.SGD(log_regr.parameters(), lr=0.01)
criterion = torch.nn.BCELoss()  # Binary Cross-Entropy Loss
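`BCELoss` computes the mean of −[y·log(p) + (1 − y)·log(1 − p)] over all elements, since its default reduction is the mean. The sketch below checks this against a by-hand computation on a few made-up predictions and labels.

```python
import torch

criterion = torch.nn.BCELoss()  # default reduction="mean"

# Hypothetical predicted probabilities and true labels.
y_pred = torch.tensor([[0.9], [0.2], [0.6]])
y = torch.tensor([[1.0], [0.0], [1.0]])

loss_builtin = criterion(y_pred, y)

# Binary cross-entropy by hand: -mean(y*log(p) + (1-y)*log(1-p)).
loss_manual = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred)).mean()

print(loss_builtin.item(), loss_manual.item())
print(torch.allclose(loss_builtin, loss_manual))  # True
```

Because this model applies the sigmoid inside `forward`, `BCELoss` is the matching criterion. If a model returned raw logits instead, `torch.nn.BCEWithLogitsLoss` would combine the sigmoid and the loss in one numerically more stable step.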

You can now prepare a DataLoader and train the model for a specified number of epochs:

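from torch.utils.data import DataLoader

# Load the data into a DataLoader; shuffling mixes the order of samples between epochs
train_loader = DataLoader(dataset=data_set, batch_size=2, shuffle=True)

# Train the model, recording the loss of every mini-batch
losses = []
epochs = 50
for epoch in range(epochs):
    epoch_loss = 0.0
    for x, y in train_loader:
        y_pred = log_regr(x)           # forward pass: predicted probabilities
        loss = criterion(y_pred, y)    # binary cross-entropy for this batch
        losses.append(loss.item())
        optimizer.zero_grad()          # clear gradients from the previous step
        loss.backward()                # backpropagate
        optimizer.step()               # update the weight and bias
        epoch_loss += loss.item()
    print(f"Epoch: {epoch}, Mean loss: {epoch_loss / len(train_loader):.4f}")
print("Training complete!")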
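It helps to know how many parameter updates this loop performs. With 40 samples and `batch_size=2`, the `DataLoader` yields 20 mini-batches per epoch, so 50 epochs amount to 1,000 SGD steps. The sketch below rebuilds the same data with `TensorDataset` (a stand-in for the `Data` class, used only to keep the example short) and checks the batch count and shapes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for data_set: same x values and labels as the Data class.
x = torch.arange(-2, 2, 0.1).view(-1, 1)
y = (x > 0.2).float()
loader = DataLoader(TensorDataset(x, y), batch_size=2)

batches = list(loader)
print(len(loader))          # number of optimizer steps per epoch
xb, yb = batches[0]
print(xb.shape, yb.shape)   # torch.Size([2, 1]) torch.Size([2, 1])
```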

During training, the printed loss should generally decrease over the epochs, which indicates that the model is fitting the data.

Verifying Model Performance with Test Data

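After training, evaluate the model by measuring its accuracy. Because this example has no separate test split, the code below evaluates on the same samples used for training, so the result measures how well the model fits the training data rather than how well it generalizes to unseen data: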

# Get the model's predictions without tracking gradients
with torch.no_grad():
    y_pred = log_regr(data_set.x)
label = (y_pred > 0.5).float()  # threshold the probabilities at 0.5
accuracy = (label == data_set.y).float().mean()
print("Model accuracy: ", accuracy.item())
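A minimal sketch of a proper held-out evaluation, assuming a 75/25 split of the same synthetic data, uses `torch.utils.data.random_split` to set test samples aside before training and then measures accuracy only on them. The model here is `torch.nn.Sequential(Linear, Sigmoid)`, which computes the same function as the `LogisticRegression` class above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)  # reproducible split, initialization, and shuffling

# Same synthetic data as the Data class: label 1 where x > 0.2.
x = torch.arange(-2, 2, 0.1).view(-1, 1)
y = (x > 0.2).float()
full = TensorDataset(x, y)

# Hold out 25% of the samples; they are never used for training.
n_test = len(full) // 4
train_set, test_set = random_split(full, [len(full) - n_test, n_test])

model = torch.nn.Sequential(torch.nn.Linear(1, 1), torch.nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.BCELoss()
train_loader = DataLoader(train_set, batch_size=2, shuffle=True)

for epoch in range(50):
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

# Evaluate only on the held-out samples, without tracking gradients.
x_test, y_test = x[test_set.indices], y[test_set.indices]
with torch.no_grad():
    label = (model(x_test) > 0.5).float()
accuracy = (label == y_test).float().mean()
print(f"Held-out accuracy: {accuracy.item():.2f}")
```

With only 10 test samples, each misclassified point changes the accuracy by 10 percentage points, so the result is a coarse estimate.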

In the previous tutorial, training with MSE loss gave an accuracy of only around 57%. With cross-entropy loss, accuracy should be significantly higher, around 86% or more, although the exact figure depends on the random initialization, the learning rate, and the number of epochs.
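In one dimension, the learned classifier reduces to a single threshold: the predicted probability equals 0.5 exactly where wx + b = 0, that is, at x = −b/w. Each grid point that falls between this learned boundary and the true threshold of 0.2 is misclassified, and since the 40 points are 0.1 apart, each one costs 2.5 percentage points of accuracy. The sketch below uses a bare `Linear` layer plus `torch.sigmoid` in place of the `LogisticRegression` class, trains with the same learning rate, batch size, and number of epochs, and prints the learned boundary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # reproducible initialization and shuffling

# Same synthetic data as the Data class: label 1 where x > 0.2.
x = torch.arange(-2, 2, 0.1).view(-1, 1)
y = (x > 0.2).float()
loader = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=True)

linear = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)
criterion = torch.nn.BCELoss()

for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()
        criterion(torch.sigmoid(linear(xb)), yb).backward()
        optimizer.step()

w = linear.weight.item()
b = linear.bias.item()
boundary = -b / w  # the x where w*x + b = 0, i.e. predicted probability 0.5
print(f"learned boundary: x = {boundary:.3f} (labels switch at x = 0.2)")

# Sanity check: the model should output (almost exactly) 0.5 at the boundary.
with torch.no_grad():
    p_at_boundary = torch.sigmoid(linear(torch.tensor([[boundary]]))).item()
print(f"probability at the boundary: {p_at_boundary:.3f}")
```

Training for more epochs or with a larger learning rate typically moves the boundary closer to 0.2, which is why the final accuracy varies with these settings.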

Summary

In this tutorial, you learned how to build a logistic regression classifier in PyTorch. Specifically, you explored:

  • How to use binary cross-entropy loss to train a logistic regression model more effectively than with MSE loss.
  • How to create a custom dataset and load it in mini-batches with DataLoader.
  • How to evaluate the trained classifier’s accuracy by thresholding its predicted probabilities.

By integrating these techniques, you can create effective models for a variety of applications in deep learning and machine learning using PyTorch.
