Building a Binary Classification Model in PyTorch

The PyTorch library is a powerful tool for deep learning, often used to tackle regression and classification problems. In this article, you will learn how to use PyTorch to create and evaluate neural network models specifically for binary classification tasks.

By the end of this post, you will understand:

How to load training data and prepare it for use in PyTorch.
How to design and train a neural network for binary classification.
How to evaluate model performance using k-fold cross-validation.
How to run a model in inference mode.
How to create a receiver operating characteristic (ROC) curve to assess model performance.

Let’s Get Started

Description of the Dataset

In this tutorial, we will be utilizing the Sonar dataset, which consists of sonar chirp returns reflecting off various surfaces. The dataset has 60 input variables representing the strength of returns at different angles. The task is to classify these returns into two categories: rocks or metal cylinders.

You can download the dataset from the UCI Machine Learning repository and save it as sonar.csv in your working directory.

This dataset is a good choice for benchmarking because it is well understood. All input features are continuous, typically ranging between 0 and 1, while the output variable is labeled “M” for mine (a metal cylinder) and “R” for rock. These labels must be converted to numerical values; LabelEncoder assigns integers in sorted class order, so “M” becomes 0 and “R” becomes 1.

To convert the labels, we will utilize the LabelEncoder from Scikit-learn after loading the dataset with pandas:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the dataset
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Convert string labels to integers
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

You can check the labels using:

print(encoder.classes_)

This will output:

['M' 'R']

When you print y, you’ll see:

[1 1 1 ... 0 0 0]

This indicates that the labels have been successfully converted to 0s and 1s.
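
The numbering follows from how LabelEncoder orders classes: it sorts the distinct labels and numbers them from zero. A minimal pure-Python sketch of that idea, using a made-up five-item label list rather than the real dataset:

```python
# Illustrative only: mimics the sorted-class numbering that LabelEncoder uses.
# The label list is made up; it is not taken from sonar.csv.
labels = ["R", "R", "M", "M", "R"]

classes = sorted(set(labels))                                # distinct labels, sorted
mapping = {label: idx for idx, label in enumerate(classes)}  # label -> integer
encoded = [mapping[label] for label in labels]

print(classes)   # ['M', 'R']
print(mapping)   # {'M': 0, 'R': 1}
print(encoded)   # [1, 1, 0, 0, 1]
```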

Next, it’s time to convert the dataset into PyTorch tensors:

import torch

X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
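
The reshape(-1, 1) call matters because the models defined later in this post end in a layer with a single output, so they return one column per sample, and the targets need the same shape when the loss is computed. A small sketch with made-up values (not the Sonar data):

```python
import torch
import torch.nn as nn

# Made-up labels and predictions, purely to illustrate the shapes involved.
y_demo = torch.tensor([1, 0, 1], dtype=torch.float32)
print(y_demo.shape)              # torch.Size([3])

y_col = y_demo.reshape(-1, 1)    # one row per sample, one column
print(y_col.shape)               # torch.Size([3, 1])

# A model whose last layer has one output returns shape (batch, 1),
# so the column-shaped target lines up with it element by element.
pred = torch.tensor([[0.9], [0.2], [0.6]])
loss = nn.BCELoss()(pred, y_col)
print(loss.ndim)                 # 0, i.e. a scalar loss
```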

Creating the Model

Now it’s time to define the neural network model for our binary classification task.

For this example, we will create two models: a “wide” model with a single, larger hidden layer and a “deep” model with several smaller hidden layers. Comparing the two shows how the shape of the network affects performance on the classification task.

Wide Model

import torch.nn as nn

class Wide(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(60, 180)
        self.relu = nn.ReLU()
        self.output = nn.Linear(180, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

Deep Model

class Deep(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.sigmoid(self.output(x))
        return x
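
For comparison, the same two architectures can also be written with nn.Sequential. This is only an alternative sketch, not the form used in the rest of this post, and the names wide_seq and deep_seq are hypothetical:

```python
import torch
import torch.nn as nn

# Same layer sizes as Wide: 60 -> 180 -> 1
wide_seq = nn.Sequential(
    nn.Linear(60, 180), nn.ReLU(),
    nn.Linear(180, 1), nn.Sigmoid(),
)

# Same layer sizes as Deep: 60 -> 60 -> 60 -> 60 -> 1
deep_seq = nn.Sequential(
    nn.Linear(60, 60), nn.ReLU(),
    nn.Linear(60, 60), nn.ReLU(),
    nn.Linear(60, 60), nn.ReLU(),
    nn.Linear(60, 1), nn.Sigmoid(),
)

# Random input standing in for a batch of 4 sonar samples
batch = torch.rand(4, 60)
with torch.no_grad():
    out_wide = wide_seq(batch)
    out_deep = deep_seq(batch)

print(out_wide.shape, out_deep.shape)  # both torch.Size([4, 1])
```

Because the last layer is a sigmoid, every output value lies between 0 and 1 and can be read as a probability for class 1.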

These models will have a similar number of parameters, which can be checked with the following code:

model1 = Wide()
model2 = Deep()
print(sum([x.reshape(-1).shape[0] for x in model1.parameters()]))  # Count parameters
print(sum([x.reshape(-1).shape[0] for x in model2.parameters()]))  # Count parameters
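
The counts can also be worked out by hand: a fully connected layer with n inputs and m outputs has n*m weights plus m biases. A small sketch of that arithmetic (count_params is a hypothetical helper, not part of PyTorch):

```python
def count_params(layer_sizes):
    """Total weights and biases for a chain of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus bias vector
    return total

wide_params = count_params([60, 180, 1])         # layer sizes of Wide
deep_params = count_params([60, 60, 60, 60, 1])  # layer sizes of Deep
print(wide_params)  # 11161
print(deep_params)  # 11041
```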

Comparing Models with Cross-Validation

To determine whether to use a wide or deep model, we can apply cross-validation. This method allows us to train each model on different segments of the data, providing a more comprehensive understanding of their performance.

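We’ll use k-fold cross-validation, where the dataset is divided into k portions. Each portion serves once as a test set while the others are used for training. The loops rely on a model_train() helper, defined in the complete code at the end of this post, that trains a model and returns its best validation accuracy. Here’s how to set up the cross-validation process: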

import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Create train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# Initialize k-fold cross-validation
kfold = StratifiedKFold(n_splits=5, shuffle=True)
cv_scores_wide = []
for train, test in kfold.split(X_train, y_train):
    model = Wide()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])  # model_train() is defined in the complete code below
    print("Accuracy (wide): %.2f" % acc)
    cv_scores_wide.append(acc)

cv_scores_deep = []
for train, test in kfold.split(X_train, y_train):
    model = Deep()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (deep): %.2f" % acc)
    cv_scores_deep.append(acc)

# Calculate and print average accuracies
wide_acc = np.mean(cv_scores_wide)
wide_std = np.std(cv_scores_wide)
deep_acc = np.mean(cv_scores_deep)
deep_std = np.std(cv_scores_deep)
print("Wide: %.2f%% (+/- %.2f%%)" % (wide_acc*100, wide_std*100))
print("Deep: %.2f%% (+/- %.2f%%)" % (deep_acc*100, deep_std*100))
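
To make the splitting step concrete, here is a from-scratch sketch of how plain k-fold partitions sample indices. It is an illustration only: StratifiedKFold additionally keeps the class proportions similar in every fold and can shuffle, which this sketch does not do. The function name kfold_indices is hypothetical.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) lists for plain, unshuffled k-fold."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(kfold_indices(12, 5))
for train_idx, test_idx in folds:
    # Every sample lands in exactly one test fold across the 5 iterations
    print(len(train_idx), test_idx)
```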

Retraining the Final Model

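After identifying which architecture performs better, you can rebuild the model and retrain it on the entire training set, so the final model learns from more data than any single cross-validation fold provided. Note that model_train() keeps the weights from the epoch with the highest accuracy on the data passed as the validation set; since the test set plays that role here, the reported final accuracy is likely somewhat optimistic.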

# Choose and retrain the best model
if wide_acc > deep_acc:
    print("Retraining a wide model")
    model = Wide()
else:
    print("Retraining a deep model")
    model = Deep()

final_accuracy = model_train(model, X_train, y_train, X_test, y_test)  # Reuse training function
print(f"Final model accuracy: {final_accuracy*100:.2f}%")
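
The model_train() helper scores predictions with binary cross-entropy (nn.BCELoss) and with accuracy after rounding the sigmoid output. The following pure-Python sketch spells out both calculations on made-up numbers; it assumes the default mean reduction and ignores the clamping PyTorch applies to keep the logarithm finite.

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    losses = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
              for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)

def accuracy(probs, targets):
    """Fraction of samples whose rounded probability equals the target."""
    hits = [round(p) == t for p, t in zip(probs, targets)]
    return sum(hits) / len(hits)

probs = [0.9, 0.2, 0.6, 0.4]     # made-up sigmoid outputs
targets = [1, 0, 0, 1]           # made-up labels

bce = binary_cross_entropy(probs, targets)
acc = accuracy(probs, targets)
print(round(bce, 4))
print(acc)                       # 0.5
```

Confident correct predictions (0.9 for a 1, 0.2 for a 0) contribute little to the loss, while the two wrong predictions dominate it.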

Running Inference

After training the final model, you may want to test its output. The following code can be used to perform inference on a few samples from the test set:

model.eval()
with torch.no_grad():
    for i in range(5):
        y_pred = model(X_test[i:i+1])
        print(f"{X_test[i].numpy()} -> {y_pred[0].numpy()} (expected {y_test[i].numpy()})")
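
The raw model output is a probability, so turning it into a class name takes two steps: threshold it, then map the integer back to the original label. A minimal sketch on made-up probabilities, assuming a 0.5 threshold and the encoding printed earlier by encoder.classes_ (index 0 is “M”, index 1 is “R”):

```python
# Made-up sigmoid outputs standing in for model predictions
probabilities = [0.93, 0.08, 0.51, 0.27]

classes = ["M", "R"]          # index 0 -> mine, index 1 -> rock
threshold = 0.5               # assumed cut-off; other values are possible

predicted_ids = [1 if p > threshold else 0 for p in probabilities]
predicted_labels = [classes[i] for i in predicted_ids]

print(predicted_ids)     # [1, 0, 1, 0]
print(predicted_labels)  # ['R', 'M', 'R', 'M']
```

With the fitted encoder from earlier, encoder.inverse_transform() performs the same lookup on an array of integer class ids.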

Receiver Operating Characteristic Curve

To further assess the model’s performance, you can create a Receiver Operating Characteristic (ROC) curve to visualize true positive rates against false positive rates for various thresholds.

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

with torch.no_grad():
    y_pred = model(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, y_pred)
    plt.plot(fpr, tpr)  # Plot ROC curve
    plt.title("Receiver Operating Characteristics")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.show()
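
To see what the curve is made of, the points can be computed by hand: sort the samples by predicted score, lower the threshold one score at a time, and record the false positive and true positive rates at each step. The sketch below is a from-scratch illustration on made-up data, not scikit-learn's implementation; roc_points and auc are hypothetical helper names.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) points obtained by sweeping the threshold downward."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Highest score first, so each step admits one more sample as "positive"
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples sharing this score are counted
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [0, 0, 1, 1]               # made-up labels
y_score = [0.1, 0.4, 0.35, 0.8]     # made-up predicted probabilities

points = roc_points(y_true, y_score)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

A curve that hugs the top-left corner gives an area close to 1, while a model that guesses at random stays near the diagonal with an area around 0.5.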

Complete Code

Here is the complete code outline that incorporates all steps:

import copy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder

# Load data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to 2D PyTorch tensors
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# Define two models
class Wide(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(60, 180)
        self.relu = nn.ReLU()
        self.output = nn.Linear(180, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

class Deep(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.sigmoid(self.output(x))
        return x

# Helper function to train one model
def model_train(model, X_train, y_train, X_val, y_val):
    loss_fn = nn.BCELoss()  # Binary cross-entropy
    optimizer = optim.Adam(model.parameters(), lr=0.0001)

    n_epochs = 300   # Number of epochs to run
    batch_size = 10  # Size of each batch
    batch_start = torch.arange(0, len(X_train), batch_size)

    best_acc = -np.inf  # Initialize to negative infinity
    best_weights = None

    for epoch in range(n_epochs):
        model.train()
        with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
            bar.set_description(f"Epoch {epoch}")
            for start in bar:
                X_batch = X_train[start:start + batch_size]
                y_batch = y_train[start:start + batch_size]

                # Forward pass
                y_pred = model(X_batch)
                loss = loss_fn(y_pred, y_batch)

                # Backward pass
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                acc = (y_pred.round() == y_batch).float().mean()
                bar.set_postfix(loss=float(loss), acc=float(acc))

        # Evaluate on the validation set without tracking gradients
        model.eval()
        with torch.no_grad():
            y_pred = model(X_val)
        acc = float((y_pred.round() == y_val).float().mean())
        if acc > best_acc:
            best_acc = acc
            best_weights = copy.deepcopy(model.state_dict())

    model.load_state_dict(best_weights)
    return best_acc

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# k-fold cross validation to compare models
kfold = StratifiedKFold(n_splits=5, shuffle=True)
cv_scores_wide = []
for train, test in kfold.split(X_train, y_train):
    model = Wide()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (wide): %.2f" % acc)
    cv_scores_wide.append(acc)

cv_scores_deep = []
for train, test in kfold.split(X_train, y_train):
    model = Deep()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (deep): %.2f" % acc)
    cv_scores_deep.append(acc)

wide_acc = np.mean(cv_scores_wide)
wide_std = np.std(cv_scores_wide)
deep_acc = np.mean(cv_scores_deep)
deep_std = np.std(cv_scores_deep)
print("Wide: %.2f%% (+/- %.2f%%)" % (wide_acc * 100, wide_std * 100))
print("Deep: %.2f%% (+/- %.2f%%)" % (deep_acc * 100, deep_std * 100))

# Final model training
if wide_acc > deep_acc:
    print("Retraining a wide model")
    model = Wide()
else:
    print("Retraining a deep model")
    model = Deep()

final_accuracy = model_train(model, X_train, y_train, X_test, y_test)
print(f"Final model accuracy: {final_accuracy * 100:.2f}%")

# Model inference
model.eval()
with torch.no_grad():
    for i in range(5):
        y_pred = model(X_test[i:i + 1])
        print(f"{X_test[i].numpy()} -> {y_pred[0].numpy()} (expected {y_test[i].numpy()})")

# Plot ROC curve
from sklearn.metrics import roc_curve

with torch.no_grad():
    y_pred = model(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, y_pred)
    plt.plot(fpr, tpr)  # Plot ROC curve
    plt.title("Receiver Operating Characteristics")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.show()

Summary

In this article, you learned how to build a binary classification model in PyTorch. You loaded and prepared the dataset, defined wide and deep neural network models, compared them with k-fold cross-validation, retrained the better one, ran it in inference mode, and plotted an ROC curve to assess its performance.
