Select the device on which training runs:
import torch

# define which device is used for training
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

model.to(device)  # do this later if the model is only defined later
torch.set_default_device(device)
print(f"Using {device} device. Every tensor created will be on {device} by default.")
MPS device: enables high-performance training on the GPU for macOS devices via the Metal programming framework. In practice this means Apple Silicon (M1 or newer) or another Metal-capable GPU.
Create tensors directly on the device (or set the default device for all torch tensors, as above):
torch.tensor(array, device=device)
instead of
torch.tensor(array).to(device)
If possible, move the entire dataset onto the GPU. GPUs have a lot of VRAM, and most datasets fit there comfortably.
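A minimal sketch of keeping the whole dataset in VRAM and slicing batches from it; X and y are assumed to be NumPy arrays (or lists) that fit into GPU memory:

import torch

features = torch.as_tensor(X, dtype=torch.float32, device=device)  # one-time host-to-GPU copy
labels = torch.as_tensor(y, dtype=torch.long, device=device)

batch_size = 64
for start in range(0, len(features), batch_size):
    xb = features[start:start + batch_size]  # already on the GPU, no per-step transfer
    yb = labels[start:start + batch_size]
    # ... forward/backward pass here

Compared to a DataLoader that copies each batch from CPU to GPU, this avoids the per-step transfer entirely.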
Where to place the .to(device)?
It is common practice that the model only takes care of the math and nothing else: the input is a tensor on the right device, and the output is a tensor as well.
The .to(device) calls therefore belong in the training and test loops, respectively (see the sketch below). Is it worth writing your own helper function for this, like Dominik did? No.
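A minimal sketch of a training loop with the device transfer inside the loop; model, train_loader, loss_fn, and optimizer are assumed to already exist:

model.to(device)
model.train()
for xb, yb in train_loader:
    xb, yb = xb.to(device), yb.to(device)  # the loop, not the model, handles the transfer
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()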
You are coding models; there is no reason to create or run torch tensors anywhere other than on the device. Put
{python} torch.set_default_device(device)
right at the start of the file/main function.
What if I need to work with the model outputs?
This is debugging or showcase code, where interpretability and development speed matter more than performance, so any potential speedup from using the GPU is no longer worth it.
This should happen after the model has finished training, so the code has a first, "normal" part and then an analysis part, possibly even in the same file.
model.cpu()
torch.set_default_device("cpu")
# move any data you might need to the CPU as well,
# and don't forget to detach it from the computation graph
with torch.no_grad():
    model_output = model(some_input)
model_output = model_output.detach().cpu()  # detach() returns a new tensor, so reassign it
Is it pretty? No. But it is not something you should ever do outside of research or debugging functions, so I consider it okay.
Early stopping
Consider implementing early stopping (with patience) early on to reduce overall training time. Be aware that your performance statistics will then be less reliable, and you should consider using a separate test set.
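A minimal sketch of early stopping with patience; max_epochs, train_one_epoch, and evaluate (returning a validation loss) are assumed to exist in your own code:

best_val_loss = float("inf")
patience = 5
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(model)
    val_loss = evaluate(model)  # validation loss, not the final test metric
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early after epoch {epoch}")
            break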
Utilize all GPUs:
One option is Ray (e.g. Ray Train on a Ray cluster). It has the advantage that it mostly only adds code and shouldn't require modifying the existing code.
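A rough sketch of wrapping an existing training loop with Ray Train, assuming the Ray 2.x API (TorchTrainer, ScalingConfig, prepare_model, prepare_data_loader); build_model and build_dataloader are hypothetical stand-ins for your own code, and the details should be checked against the Ray docs:

import torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model, prepare_data_loader

def train_loop_per_worker(config):
    model = prepare_model(build_model())              # moves the model to its device and wraps it for DDP
    loader = prepare_data_loader(build_dataloader())  # adds a distributed sampler, moves batches to the device
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(config["epochs"]):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"epochs": 10},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # one worker per GPU
)
trainer.fit()

The existing training loop stays essentially unchanged inside train_loop_per_worker; Ray handles process launching, device placement, and data sharding around it.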