Understanding Model Behavior During Training by Visualizing Metrics

Visualizing metrics during training offers valuable insight into how a neural network is learning. For example, if a metric such as the training loss worsens over time, it can indicate an optimization problem such as a learning rate that is too high. This guide explains how to track and plot performance metrics in PyTorch, helping you understand your model’s behavior.

What You’ll Learn:

  1. How to collect metrics such as loss and error while a model trains
  2. How to plot the training history with matplotlib
  3. How to interpret the plots to diagnose training problems

1. Collecting Metrics During Training

Training a model with gradient descent involves three steps:

  1. Forward pass: Compute the loss.
  2. Backward pass: Calculate gradients.
  3. Update: Adjust parameters using gradients.

Key Metrics to Track:

  * Training loss (here, MSE) computed on each mini-batch or averaged per epoch
  * Test or validation loss computed at the end of each epoch
  * Additional error measures such as MAE for regression, or accuracy for classification

Example: Tracking MSE for Regression

Here’s an example using PyTorch and the California housing dataset:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load and preprocess data
data = fetch_california_housing()
X, y = data.data, data.target
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)
scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Convert data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define model, loss, and optimizer
model = nn.Sequential(
    nn.Linear(8, 24), nn.ReLU(),
    nn.Linear(24, 12), nn.ReLU(),
    nn.Linear(12, 6), nn.ReLU(),
    nn.Linear(6, 1)
)
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
n_epochs = 100
batch_size = 32
batch_start = torch.arange(0, len(X_train), batch_size)  # start index of each mini-batch
mse_history = []

for epoch in range(n_epochs):
    for start in batch_start:
        # Take one mini-batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # Forward pass: compute the loss
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        mse_history.append(float(loss))  # record training MSE for this step
        # Backward pass: calculate gradients
        optimizer.zero_grad()
        loss.backward()
        # Update: adjust parameters using the gradients
        optimizer.step()
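
Because each entry in mse_history comes from a single mini-batch, the resulting curve is noisy. Here is a minimal sketch of one way to smooth it with a moving average before plotting; the window size of 50 steps is an arbitrary choice, not something prescribed by the example above:

import numpy as np

window = 50  # arbitrary smoothing window, in training steps
kernel = np.ones(window) / window
smoothed = np.convolve(mse_history, kernel, mode="valid")  # moving average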

Tracking Additional Metrics

Extend the loop to average the training MSE over each epoch, evaluate on the test set, and track additional metrics such as MAE:

mae_fn = nn.L1Loss()
train_mse_history, test_mse_history, test_mae_history = [], [], []

for epoch in range(n_epochs):
    model.train()
    epoch_mse = []
    for start in batch_start:
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        epoch_mse.append(float(loss))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    # Record the average training MSE for this epoch
    train_mse_history.append(sum(epoch_mse) / len(epoch_mse))

    # Validation step: switch to eval mode and disable gradient tracking
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)
        mse = loss_fn(y_pred, y_test)
        mae = mae_fn(y_pred, y_test)
        test_mse_history.append(float(mse))
        test_mae_history.append(float(mae))
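
A common extension (not part of the loop above, just a sketch of one option) is to use the per-epoch test MSE to remember the best weights seen so far, so you can restore them after training:

import copy

best_mse = float("inf")
best_weights = None

# Inside the epoch loop, after computing the test MSE:
if float(mse) < best_mse:
    best_mse = float(mse)
    best_weights = copy.deepcopy(model.state_dict())

# After the loop, restore the best-performing weights
model.load_state_dict(best_weights)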

2. Plotting the Training History

Visualize collected metrics using matplotlib:

import matplotlib.pyplot as plt
import numpy as np

# Take the square root of MSE to get RMSE, which is on the same scale as MAE
plt.plot(np.sqrt(train_mse_history), label="Train RMSE")
plt.plot(np.sqrt(test_mse_history), label="Test RMSE")
plt.plot(test_mae_history, label="Test MAE")
plt.xlabel("Epochs")
plt.legend()
plt.show()

Interpreting the Plots

For regression, metrics like MSE and MAE should decrease over time; for classification, accuracy should increase and loss should decrease. If the training metric keeps improving while the test metric stalls or worsens, the model is overfitting; if neither improves, revisit the learning rate or the model capacity.
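
For a classification task, accuracy can be tracked the same way the test MSE was tracked above. Here is a minimal, self-contained sketch; the classifier and the validation tensors are hypothetical stand-ins, not part of the housing example:

import torch
import torch.nn as nn

# Hypothetical stand-ins: a small 3-class classifier and random validation data
clf = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))
X_val = torch.randn(100, 20)
y_val = torch.randint(0, 3, (100,))

clf.eval()
with torch.no_grad():
    logits = clf(X_val)              # shape (100, 3): one logit per class
    preds = logits.argmax(dim=1)     # predicted class index per sample
    accuracy = (preds == y_val).float().mean().item()
print(f"Validation accuracy: {accuracy:.3f}")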


Summary

In this guide, you learned how to:

  1. Collect training and test metrics inside the training loop
  2. Plot the training history with matplotlib
  3. Interpret the plots to spot problems such as overfitting

With these tools, you can monitor your training process and improve model performance.
