I should eventually work through an example with personal data that I create myself. First, though, I will use their examples, because what I learn there will help me create proper personal data.

Some people, like The Dom, consider a dataloader bloat, and they might have a point. Implementing it via NumPy arrays of training data and training labels is totally fine (maybe better).
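To make the "plain arrays" alternative concrete, here is a minimal sketch of what such a hand-rolled loader could look like. The function name iterate_minibatches and the toy arrays are my own placeholders, not something from a library.

```python
import numpy as np

def iterate_minibatches(data, labels, batch_size, shuffle=True, seed=None):
    """Yield (data_batch, label_batch) pairs that cover the arrays exactly once."""
    assert len(data) == len(labels)
    indices = np.arange(len(data))
    if shuffle:
        rng = np.random.default_rng(seed)
        rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        # The same index array selects matching rows from data and labels
        yield data[batch_idx], labels[batch_idx]

# Hypothetical toy data: 10 samples with 3 features each
train_data = np.arange(30, dtype=np.float32).reshape(10, 3)
train_labels = np.arange(10)

batches = list(iterate_minibatches(train_data, train_labels, batch_size=4, seed=0))
```

The last batch is simply smaller when the dataset size is not a multiple of the batch size. To feed such batches into a PyTorch model, torch.from_numpy(batch) converts them to tensors.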

Simple code to get a dataloader for Fashion-MNIST, if you want to try out the code snippets below quickly:

import torch
import torchvision
import torchvision.transforms as transforms

# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime


transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))])

# Create datasets for training & validation, download if necessary
training_set = torchvision.datasets.FashionMNIST('./data', train=True, transform=transform, download=True)
validation_set = torchvision.datasets.FashionMNIST('./data', train=False, transform=transform, download=True)

# Create data loaders for our datasets; shuffle for training, not for validation
train_dataloader = torch.utils.data.DataLoader(training_set, batch_size=4, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(validation_set, batch_size=4, shuffle=False)

Ignore the above code for now; I will go into more detail at a later date, with my own example training data.

Required parameters to train a model.

Each one requires careful consideration and testing.

Per Epoch Training

Going through the entire Dataset once while training constitutes an epoch.

Per epoch we want to achieve the following things:

Go through all the data
Run a training loop
Save the model (to be able to use it later)
Do a validation check, to be able to stop the training if the loss does not decrease anymore.

No worries, I will go over each important function call one by one below.

Epoch Loop

from torch import nn

# defines how "correct" the output is
loss_fn = nn.CrossEntropyLoss()
# defines how to adapt the model parameters depending on the (input, loss)
# here we use stochastic gradient descent
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    # testing for each epoch to track the model's performance during training.
    avg_test_loss, avg_accuracy = test_loop(test_dataloader, model, loss_fn)
print("Done!")
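The list of per-epoch goals mentions saving the model, which the loop above does not do yet. Here is a minimal sketch of one way to do it, keeping only the checkpoint with the lowest test loss. The stand-in model, the temporary folder, the file name best_model.pt and the loss values are all placeholders of mine; in the real loop, avg_test_loss would come from test_loop.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in model; replace with the real one
model = nn.Linear(4, 2)

checkpoint_dir = tempfile.mkdtemp()  # assumed location, any writable folder works
best_path = os.path.join(checkpoint_dir, "best_model.pt")
best_test_loss = float("inf")

# Made-up per-epoch losses standing in for the value returned by test_loop
for epoch, avg_test_loss in enumerate([0.9, 0.7, 0.8]):
    if avg_test_loss < best_test_loss:
        best_test_loss = avg_test_loss
        # state_dict() holds only the parameters and buffers, not the class itself
        torch.save(model.state_dict(), best_path)

# Loading later: build the same architecture, then restore the weights
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(best_path))
```

Saving the state_dict instead of the whole model object means the architecture has to be rebuilt in code before loading, but the file stays independent of how the class is defined or imported.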

For early stopping, refer to Model Training Acceleration#Early stopping
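As a quick reminder of the idea, here is a minimal patience-based sketch. It is not necessarily the variant described in the linked note; the helper should_stop, the patience value and the loss values are illustrative assumptions.

```python
def should_stop(loss_history, patience=3):
    """Return True if the best loss is at least `patience` epochs old."""
    if len(loss_history) <= patience:
        return False
    best_epoch = loss_history.index(min(loss_history))
    return len(loss_history) - 1 - best_epoch >= patience

# Made-up validation losses, one per epoch
losses = [0.9, 0.7, 0.71, 0.72, 0.73, 0.6]

history = []
stopped_at = None
for epoch, loss in enumerate(losses):
    history.append(loss)
    if should_stop(history, patience=3):
        stopped_at = epoch
        break
```

In the epoch loop, history would collect the avg_test_loss returned by test_loop. Note that this example stops before the final, better loss of 0.6 is ever seen, which is the usual trade-off of a small patience.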

For hyperparameter tuning, refer to Hyperparameter Search Optuna

Training Loop

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch_number, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch_number % 100 == 0:
            loss, current = loss.item(), (batch_number + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

Test loop

The accuracy calculation assumes a multiclass problem, most likely trained with torch.nn.CrossEntropyLoss().

def test_loop(dataloader, model, loss_fn):
    model.eval()
    avg_test_loss, accuracy = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            avg_test_loss += loss_fn(pred, y).item()
            accuracy += (pred.argmax(1) == y).sum().item()

    avg_test_loss /= len(dataloader)
    accuracy /= len(dataloader.dataset)
    print(f"Test Error: \n Accuracy: {(100*accuracy):>0.1f}%, Avg loss: {avg_test_loss:>8f} \n")

    return avg_test_loss, accuracy

If you need to generate a PDF from the printouts, consider carriage-return prints: python, printing tricks

If the problem is binary

Since the problem is binary, the model only returns a single value per sample. We are assuming that this value is the logit and has not gone through a sigmoid layer. Therefore {python}.argmax(1) will not work: on an output of shape (N, 1) it always returns 0.

Here is the adapted implementation

pred_labels = (torch.sigmoid(pred) >= 0.5)
# accumulate into the same counter that test_loop uses; long() is just there for type compatibility
accuracy += (pred_labels.long().squeeze() == y.long()).sum().item()

Notice that this implementation will be able to handle batches no matter the size.
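For trying this out in isolation, here is a self-contained sketch of the same thresholding idea. The helper count_correct_binary and the example tensors are my own. It uses reshape(-1) instead of squeeze(), an assumption on my part, so that targets shaped (N, 1) do not accidentally broadcast against predictions shaped (N,).

```python
import torch

def count_correct_binary(logits, targets, threshold=0.5):
    """Count correct predictions for single-logit binary classification."""
    # reshape(-1) flattens (N, 1) and (N,) alike, so both sides end up 1-D
    pred_labels = (torch.sigmoid(logits) >= threshold).long().reshape(-1)
    return (pred_labels == targets.long().reshape(-1)).sum().item()

# Hypothetical logits of shape (N, 1) and 0/1 targets of shape (N,)
logits = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])
targets = torch.tensor([1, 0, 0, 0])

correct = count_correct_binary(logits, targets)  # third sample is misclassified
```

Since sigmoid(0) = 0.5, thresholding the sigmoid at 0.5 is the same as checking whether the logit is non-negative.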

Comments on above code:

model.train()

Some layers behave differently during training. Examples are Dropout or BatchNorm. They usually perform operations that help robustness during training but worsen performance in the short term. Once training is completed we do not want this to happen, which is why the test loop calls {python} model.eval() instead.
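A small sketch to see the difference for Dropout directly. The layer is used on its own here rather than inside a model, and the input of ones is just for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)  # about half the entries zeroed, survivors scaled by 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)   # identity: the input passes through unchanged
```

Calling model.train() or model.eval() on a whole model switches this flag for every submodule at once.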

pred = model(X)
loss = loss_fn(pred, y)

# Backpropagation
loss.backward()
optimizer.step()
optimizer.zero_grad()

{python} pred = model(X) does more than just calculate the output of the model. It also builds the computational graph for this forward pass, meaning that it records which operations on which parameters produced the prediction. The loss computed from the prediction extends the same graph. Through this graph, autograd can reach all parameters with {python} parameter.requires_grad == True.

{python} loss.backward() computes the gradient of the current loss with respect to each parameter where {python} requires_grad == True and stores it in that parameter's .grad attribute.
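The graph and the gradients can be inspected directly. This is a minimal sketch with a hypothetical tiny model and random inputs, only meant to show where the information ends up.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 2)  # hypothetical tiny model
X = torch.randn(4, 3)
y = torch.tensor([0, 1, 1, 0])

pred = model(X)
graph_recorded = pred.grad_fn is not None            # the forward pass recorded a graph
grads_before = [p.grad for p in model.parameters()]  # nothing has been computed yet

loss = nn.CrossEntropyLoss()(pred, y)
loss.backward()
grads_after = [p.grad for p in model.parameters()]   # one gradient tensor per parameter

with torch.no_grad():
    pred_no_graph = model(X)                         # same numbers, but no graph attached
```

The last two lines are what the test loop relies on: inside torch.no_grad() the forward pass still produces the same prediction, it just skips the bookkeeping needed for backward().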

{python} optimizer.step(): Updates the parameters using the gradients of each parameter.

{python} optimizer.zero_grad(): Resets the gradients of all parameters managed by the optimizer. This is necessary because gradients otherwise accumulate across {python} loss.backward() calls. Accumulating them on purpose can be useful if you wish to combine multiple batches into one update, for example when the desired batch size does not fit into memory at once.
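A sketch of deliberate accumulation, under my own assumptions (a tiny linear model, MSE loss with mean reduction, random data). Two micro-batches of 4, each with its loss divided by the number of accumulation steps, should give the same gradient as one batch of 8.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
X, y = torch.randn(8, 3), torch.randn(8, 1)

# Reference: gradient of the mean loss over the full batch of 8
model.zero_grad()
loss_fn(model(X), y).backward()
full_batch_grad = model.weight.grad.clone()

# Accumulation: two micro-batches of 4, each loss divided by the number of steps
accumulation_steps = 2
model.zero_grad()
for X_part, y_part in zip(X.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    loss = loss_fn(model(X_part), y_part) / accumulation_steps
    loss.backward()  # adds to .grad instead of overwriting it
accumulated_grad = model.weight.grad.clone()
```

In a training loop this means calling optimizer.step() and optimizer.zero_grad() only every accumulation_steps batches instead of after every batch.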

Validation

For a trustworthy final score I recommend using K-Fold cross validation or Nested cross validation.
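To illustrate what K-Fold does with the data, here is a hand-rolled sketch of the index splitting using plain NumPy. The function k_fold_indices is my own helper, not a library call; libraries such as scikit-learn provide ready-made splitters.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample lands in exactly one validation fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

# Hypothetical dataset of 10 samples, split into 5 folds
splits = list(k_fold_indices(n_samples=10, k=5))
```

For each pair, a fresh model would be trained on train_idx and scored on val_idx, for example via torch.utils.data.Subset(dataset, train_idx.tolist()), and the k scores averaged into the final number.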