I should do an example with my own data, which I create myself. However, I will first work through their examples, because I can then use what I learn to create proper data of my own.
Simple code to get dataloaders for FashionMNIST, if you want to quickly try out the code snippets below:
import torch
import torchvision
import torchvision.transforms as transforms
# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])
# Create datasets for training & validation, download if necessary
training_set = torchvision.datasets.FashionMNIST('./data', train=True, transform=transform, download=True)
validation_set = torchvision.datasets.FashionMNIST('./data', train=False, transform=transform, download=True)
# Create data loaders for our datasets; shuffle for training, not for validation
train_dataloader = torch.utils.data.DataLoader(training_set, batch_size=4, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(validation_set, batch_size=4, shuffle=False)
Don't dwell on the code above; I will go into more detail at a later date, with my own example training data.
Required components to train a model.
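Since I plan to use my own data later, here is a minimal sketch of wrapping self-made tensors in a DataLoader with `torch.utils.data.TensorDataset`. The features, the labels and the rule that generates the labels are hypothetical placeholders, chosen only so the example runs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical "own" data: 100 samples with 3 features and a binary label.
# The labelling rule (label = 1 if the feature sum is positive) is invented for illustration.
torch.manual_seed(0)
features = torch.randn(100, 3)
labels = (features.sum(dim=1) > 0).long()

# TensorDataset pairs row i of `features` with entry i of `labels`.
dataset = TensorDataset(features, labels)
train_dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# One batch: 4 feature rows and their 4 labels.
X, y = next(iter(train_dataloader))
print(X.shape, y.shape)  # torch.Size([4, 3]) torch.Size([4])
```

The resulting `train_dataloader` can be passed to the training loop in the same way as the FashionMNIST one.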
Each one requires careful consideration and testing.
- Loss functions
- Optimizer
- Model (chosen via Model Selection)
Per Epoch Training
In each epoch we want to:
- Go through all the data
- Run a training loop
- Save the model (to be able to use it later)
- Run a validation check, so that training can be stopped once the validation loss no longer decreases.
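The epoch loop below does not yet save the model or stop early, so here is a minimal sketch of all four steps together, with checkpointing and early stopping. Everything concrete in it is an assumption for illustration: the synthetic regression data, the `nn.Linear` model with `nn.MSELoss`, a `patience` of 3 epochs, and the checkpoint file `best_model.pt` in the system temp directory.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical synthetic regression data, only so the sketch runs on its own.
torch.manual_seed(0)
X_all = torch.randn(200, 3)
y_all = X_all @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.1 * torch.randn(200, 1)
train_dl = DataLoader(TensorDataset(X_all[:160], y_all[:160]), batch_size=16, shuffle=True)
val_dl = DataLoader(TensorDataset(X_all[160:], y_all[160:]), batch_size=16)

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

checkpoint_path = os.path.join(tempfile.gettempdir(), "best_model.pt")  # hypothetical path
patience = 3  # epochs without improvement before stopping (assumed value)
best_val_loss = float("inf")
epochs_without_improvement = 0

for epoch in range(50):
    # 1 + 2: go through all training data once (training loop)
    model.train()
    for X, y in train_dl:
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # 4: validation check (mean of the per-batch validation losses)
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(X), y).item() for X, y in val_dl) / len(val_dl)

    # 3: save the model whenever the validation loss improves
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        torch.save(model.state_dict(), checkpoint_path)
    else:
        epochs_without_improvement += 1
        # stop once the validation loss has not decreased for `patience` epochs
        if epochs_without_improvement >= patience:
            print(f"Early stopping after epoch {epoch + 1}")
            break

# Restore the best weights for later use.
model.load_state_dict(torch.load(checkpoint_path))
```

Saving only when the validation loss improves means the file on disk always holds the best model seen so far, not simply the last one.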
Epoch Loop
import torch
from torch import nn

# assumes `model` (see Model Selection) and `learning_rate` are defined beforehand,
# and that train_loop / test_loop are defined as in the sections below

# measures how far the output is from the "correct" target
loss_fn = nn.CrossEntropyLoss()
# defines how the model parameters are updated from their gradients
# here we use stochastic gradient descent
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    # test after each epoch to track the model's performance during training
    avg_test_loss, avg_accuracy = test_loop(test_dataloader, model, loss_fn)
print("Done!")
Training Loop
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch_number, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if batch_number % 100 == 0:
            loss, current = loss.item(), (batch_number + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
Test Loop
def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode (disables dropout, BatchNorm uses its running statistics)
    model.eval()
    avg_test_loss, accuracy = 0, 0
    # torch.no_grad() ensures that no gradients are computed during evaluation;
    # this also avoids unnecessary computation and memory use for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            avg_test_loss += loss_fn(pred, y).item()
            accuracy += (pred.argmax(1) == y).sum().item()
    avg_test_loss /= len(dataloader)
    accuracy /= len(dataloader.dataset)
    print(f"Test Error: \n Accuracy: {(100*accuracy):>0.1f}%, Avg loss: {avg_test_loss:>8f} \n")
    return avg_test_loss, accuracy
If the problem is binary
Since the problem is binary, the model returns only a single value per sample. We assume that this value is a logit, i.e. it has not been passed through a sigmoid. With only one output column, `.argmax(1)` always returns 0, so it does not work here.
Here is the adapted implementation of the accuracy line in the test loop:
# pred has shape (batch, 1) and contains logits; sigmoid(logit) >= 0.5 is equivalent to logit >= 0
pred_labels = (torch.sigmoid(pred) >= 0.5)
accuracy += (pred_labels.long().squeeze() == y.long()).sum().item()  # long() only makes the dtypes compatible (bool predictions vs. labels)
This implementation handles batches of any size, including a batch of one.
Comments on the above code:
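A minimal, self-contained sketch of a complete binary variant of the test loop. It assumes the loss is `nn.BCEWithLogitsLoss`, which expects float targets with the same shape as the logits. The name `binary_test_loop`, the toy data and the hand-set weights are hypothetical, chosen so that the expected accuracy is known. `squeeze(1)` is used instead of `squeeze()` so that only the output dimension is removed and the batch dimension always survives.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def binary_test_loop(dataloader, model, loss_fn):
    """Hypothetical variant of test_loop for a model that outputs one logit per sample."""
    model.eval()
    avg_test_loss, accuracy = 0.0, 0
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)  # shape (batch, 1), raw logits
            avg_test_loss += loss_fn(pred.squeeze(1), y.float()).item()
            pred_labels = torch.sigmoid(pred) >= 0.5  # same as pred >= 0
            accuracy += (pred_labels.long().squeeze(1) == y.long()).sum().item()
    avg_test_loss /= len(dataloader)
    accuracy /= len(dataloader.dataset)
    return avg_test_loss, accuracy

# Toy usage: the label is 1 exactly when the first feature is positive,
# and the model is set by hand to logit = 5 * x0, so every prediction is correct.
torch.manual_seed(0)
X = torch.randn(10, 2)
y = (X[:, 0] > 0).long()
model = nn.Linear(2, 1)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[5.0, 0.0]]))
    model.bias.zero_()
loader = DataLoader(TensorDataset(X, y), batch_size=3)  # batches of 3, 3, 3 and 1
loss, acc = binary_test_loop(loader, model, nn.BCEWithLogitsLoss())
print(acc)  # 1.0
```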
model.train()
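Some layers behave differently during training than during evaluation, for example Dropout and BatchNorm. Dropout randomly zeroes activations, and BatchNorm normalizes with the statistics of the current batch (while updating its running averages). These operations help robustness during training, but they make a prediction depend on randomness or on the other samples in the batch. So for validation, and after training is completed, we switch them off with `model.eval()`.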
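A small sketch of this difference using `nn.Dropout` with `p=0.5`. The two-layer `nn.Sequential` is a made-up toy model. It shows that `model.train()` and `model.eval()` switch the mode of every submodule, and that in training mode the surviving values are scaled by `1 / (1 - p)`.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Identity(), nn.Dropout(p=0.5))  # toy model
x = torch.ones(8)

model.train()          # sets .training = True on every submodule
print(model(x))        # random entries are 0, the rest are scaled to 1 / (1 - 0.5) = 2

model.eval()           # sets .training = False on every submodule
print(model(x))        # dropout is the identity now: all ones
print(model[1].training)  # False
```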
pred = model(X)
loss = loss_fn(pred, y)
# Backpropagation
loss.backward()
optimizer.step()
optimizer.zero_grad()
`pred = model(X)`
: Does more than just compute the output of the model. Autograd also records the computational graph, i.e. every operation that produced `pred` (and, one line later, `loss`). Through this graph, the prediction and the loss are connected to every parameter with `parameter.requires_grad == True`.
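A small sketch to make the recorded graph visible, with a toy `nn.Linear` model and random input (both made up). The attribute `grad_fn` of the output points to the last recorded operation; under `torch.no_grad()` nothing is recorded, so it is `None`.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # toy model
X = torch.randn(4, 3)

pred = model(X)
print(pred.grad_fn)        # e.g. <AddmmBackward0 ...>: the last recorded operation
print(pred.requires_grad)  # True, because the weights require gradients

# The parameters the graph leads back to:
print([name for name, p in model.named_parameters() if p.requires_grad])  # ['weight', 'bias']

with torch.no_grad():
    pred_no_graph = model(X)
print(pred_no_graph.grad_fn)  # None: no graph was recorded
```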
`loss.backward()`
: Computes the gradient of the loss with respect to every parameter with `requires_grad == True` and stores it in that parameter's `.grad` attribute.
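A worked example with a single scalar "parameter" `w`, chosen so that the gradient can be checked by hand: for $L = (wx - 4)^2$ we have $\partial L / \partial w = 2(wx - 4)\,x$, which is $8$ at $w = 3$, $x = 2$. The numbers are arbitrary.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # plays the role of a model parameter
x = torch.tensor(2.0)                      # input: no gradient needed
loss = (w * x - 4.0) ** 2

loss.backward()
# dloss/dw = 2 * (w*x - 4) * x = 2 * (6 - 4) * 2 = 8
print(w.grad)  # tensor(8.)
print(x.grad)  # None: x does not require a gradient
```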
`optimizer.step()`
: Updates the parameters using their stored gradients. For plain SGD this is `p = p - lr * p.grad` for each parameter `p`.
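A minimal sketch checking that, for `torch.optim.SGD` without momentum (the default), `optimizer.step()` performs exactly `p - lr * p.grad`. The parameter values and the loss are made up; other optimizers such as Adam, or SGD with momentum, apply a different update rule.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # gradient is 2 * w = [2.0, -4.0]
loss.backward()
optimizer.step()       # w <- w - 0.1 * [2.0, -4.0]

print(w.detach())      # approximately [0.8, -1.6]
```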
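`optimizer.zero_grad()`
: Resets the gradients of all parameters managed by the optimizer (in recent PyTorch versions it sets them to `None` by default instead of filling them with zeros). This is necessary because `loss.backward()` adds to the existing gradients instead of overwriting them. Accumulating gradients on purpose can be useful, for example to combine several small batches into one update when a large batch does not fit into memory.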
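A sketch of both effects with a toy `nn.Linear` model and random data (all hypothetical). Part 1 shows that a second `backward()` without `zero_grad()` doubles the stored gradient. Part 2 shows deliberate gradient accumulation over 4 mini-batches of 2 samples; `accumulation_steps = 4` is an assumed value. Dividing each mini-batch loss by `accumulation_steps` makes the accumulated gradient equal to the gradient of the mean loss over all 8 samples.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
X, y = torch.randn(8, 2), torch.randn(8, 1)  # hypothetical data

# 1. Gradients accumulate: a second backward() adds to the stored .grad
loss_fn(model(X), y).backward()
full_batch_grad = model.weight.grad.clone()
loss_fn(model(X), y).backward()
print(torch.allclose(model.weight.grad, 2 * full_batch_grad))  # True
optimizer.zero_grad()  # start from a clean state again

# 2. Deliberate accumulation: 4 mini-batches of 2 samples, one optimizer step
accumulation_steps = 4  # assumed value
for i, (X_batch, y_batch) in enumerate(zip(X.split(2), y.split(2))):
    # scale so that the summed gradient equals the full-batch mean
    loss = loss_fn(model(X_batch), y_batch) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        accumulated_grad = model.weight.grad.clone()
        optimizer.step()
        optimizer.zero_grad()

print(torch.allclose(accumulated_grad, full_batch_grad, atol=1e-6))  # True
```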
Validation
For a trustworthy final score, I recommend using K-fold cross-validation or nested cross-validation.
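A minimal sketch of plain K-fold cross-validation (not nested) using only `torch.utils.data.Subset`. The helper `k_fold_indices`, the random dataset and `k = 5` are assumptions for illustration. `tensor_split` is used because it always returns exactly `k` folds whose sizes differ by at most one.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

def k_fold_indices(n_samples, k, seed=0):
    """Hypothetical helper: shuffle 0..n_samples-1 once and split into k folds."""
    generator = torch.Generator().manual_seed(seed)
    permutation = torch.randperm(n_samples, generator=generator)
    return [fold.tolist() for fold in permutation.tensor_split(k)]

# Hypothetical data standing in for a real dataset.
dataset = TensorDataset(torch.randn(103, 3), torch.randint(0, 2, (103,)))
k = 5
folds = k_fold_indices(len(dataset), k)

for i, val_idx in enumerate(folds):
    # every fold except fold i is used for training
    train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    train_dataloader = DataLoader(Subset(dataset, train_idx), batch_size=4, shuffle=True)
    val_dataloader = DataLoader(Subset(dataset, val_idx), batch_size=4)
    # Here: create a fresh model and optimizer, run the epoch loop on
    # train_dataloader, and record the validation score on val_dataloader.
    print(f"fold {i + 1}: {len(train_idx)} train / {len(val_idx)} validation samples")
# The final score is the mean (and spread) of the k validation scores.
```

Nested cross-validation wraps this in an outer loop: the inner K-fold loop selects hyperparameters, and an outer split that the inner loop never sees provides the final score.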