Dataloader and Dataset

Let's say you have a dataloader already. How do you iterate over it?
?

How to get the data in the training/test methods

Let's start from the end.

Iterate over the data:

for batch_number, (X, y) in enumerate(dataloader):
    ...
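
Inside a training method this typically looks like the following minimal sketch; model, loss_fn, optimizer and device are assumed to already exist:

def train(dataloader, model, loss_fn, optimizer, device):
    model.train()
    for batch_number, (X, y) in enumerate(dataloader):
        # move the batch to the same device as the model
        X, y = X.to(device), y.to(device)

        # forward pass, loss, backward pass, parameter update
        prediction = model(X)
        loss = loss_fn(prediction, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()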

Let's say you can generate data via {python} generate_data(). Please write a custom dataset/dataloader.
?

If we can generate the data

Use the function {python} generate_data(), which returns a tuple (input_data, label):

from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, size):
        # size controls how many samples one "epoch" contains
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        # the index is ignored: every access generates a fresh sample
        input_data, label = generate_data()
        return input_data, label

my_dataloader = DataLoader(CustomDataset(size=1000), batch_size=32, shuffle=True)
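
To sanity-check the pipeline you can pull a single batch. This assumes generate_data() returns arrays or tensors of a fixed shape, so that the default collate function can stack them:

X, y = next(iter(my_dataloader))
print(X.shape, y.shape)  # first dimension should be the batch size, 32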

However, I still recommend generating X and y yourself. It is simply easier to handle, and not every library is able to utilise Datasets.

While the above statement is true, I think I am now experienced enough to utilise Datasets without being bothered by the added complexity.

Let's say you have a list of nib file paths:

# these files have both image and label data.
filepaths = ["file_1.nib", "file_2.nib", ...]

Please write a custom dataset/dataloader including a training, validation and testing split.
?

If we have a list of file paths

# these files have both image and label data.
filepaths = ["file_1.nib", "file_2.nib", ...]

You will want to divide the files into training, validation and testing sets. You then initialise one dataset and one dataloader for each split.

Dividing:

# note: train_test_split shuffles by default
# e.g. VAL_SPLIT_PERCENTAGE = 0.1, TEST_SPLIT_PERCENTAGE = 0.1
from sklearn.model_selection import train_test_split

filepaths_train, filepaths_non_train = train_test_split(
    filepaths, test_size=VAL_SPLIT_PERCENTAGE + TEST_SPLIT_PERCENTAGE
)
filepaths_validation, filepaths_test = train_test_split(
    filepaths_non_train,
    test_size=TEST_SPLIT_PERCENTAGE / (TEST_SPLIT_PERCENTAGE + VAL_SPLIT_PERCENTAGE),
)
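
If the split should be reproducible across runs, train_test_split accepts a random_state; a small sketch with an arbitrary seed:

filepaths_train, filepaths_non_train = train_test_split(
    filepaths,
    test_size=VAL_SPLIT_PERCENTAGE + TEST_SPLIT_PERCENTAGE,
    random_state=42,  # any fixed seed yields the same split every run
)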

import torch
import nibabel as nib
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, filepaths):
        self.filepaths = filepaths

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        # load the volume for this file path; nibabel returns an image
        # object, get_fdata() gives the underlying array
        current_filepath = self.filepaths[idx]
        data = nib.load(current_filepath).get_fdata()

        # assumption: image and label are stored together in one file,
        # e.g. stacked along the last axis; adjust to your data layout
        image = data[..., 0]
        label = data[..., 1]

        # any on-the-fly transformations go here
        ...

        # convert to torch tensors and put them into the right shape
        image = torch.as_tensor(image, dtype=torch.float32)
        label = torch.as_tensor(label)

        return image, label

training_dataloader = DataLoader(CustomDataset(filepaths_train), batch_size=32, shuffle=True)
validation_dataloader = DataLoader(CustomDataset(filepaths_validation), batch_size=8, shuffle=False)
test_dataloader = DataLoader(CustomDataset(filepaths_test), batch_size=8, shuffle=False)
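
The validation and test loaders are used the same way, just without gradient tracking. A minimal sketch; model, loss_fn and device are assumptions:

def validate(dataloader, model, loss_fn, device):
    model.eval()
    total_loss = 0.0
    with torch.no_grad():  # no gradients needed during evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            total_loss += loss_fn(model(X), y).item()
    return total_loss / len(dataloader)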