Layer normalization functions in a very similar way to BatchNorm, except that we do not calculate the mean and std across the batch; instead we calculate the mean and std within each sample. Not per channel, but over all values in the sample, as if the sample were flattened into a vector and we normalized that vector.
N: samples
C: Channels
H, W: height, width
How does it work?
mean and std within each sample (not aggregated over the batch)
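A minimal sketch of this idea (shapes and names here are illustrative, not from the note): compute the mean and std per sample over all of its values and compare against `torch.nn.LayerNorm`.

```python
import torch
import torch.nn as nn

N, C, H, W = 4, 3, 8, 8
x = torch.randn(N, C, H, W)

# Manual version: one mean/std per sample, computed over (C, H, W).
mean = x.mean(dim=(1, 2, 3), keepdim=True)
var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
x_manual = (x - mean) / torch.sqrt(var + 1e-5)

# Built-in version (elementwise_affine=False skips the learnable scale/shift).
ln = nn.LayerNorm(normalized_shape=(C, H, W), elementwise_affine=False)
x_ln = ln(x)

print(torch.allclose(x_manual, x_ln, atol=1e-5))  # True
```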
When to use
Use it if normalizing across features (as opposed to normalizing each feature individually) makes sense. This is the case for small batches where comparing feature 1 to feature 2 is meaningful, for example because they describe the same thing or quantity.
However, if this does not make sense, for example because the features do not describe the same type of data, then I would recommend using Instance Normalization instead.
Implementation
{python}torch.nn.LayerNorm(normalized_shape)
The normalized shape specifies the trailing dimensions of the input tensor to normalize over. The point is to tell the layer which dimensions make up a single datapoint, so that it doesn't normalize across multiple datapoints. This matters especially when batches are small (see the sketch after the list below).
How to set the normalized shape:
- input shape: (batch_size, features) -> normalized shape: (features,)
- input shape: (batch_size, channels, height, width) -> normalized shape: (channels, height, width)
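A short usage sketch for both cases; the sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# Case 1: (batch_size, features)
x_flat = torch.randn(32, 128)
ln_flat = nn.LayerNorm(normalized_shape=(128,))
out_flat = ln_flat(x_flat)        # normalized over the 128 features of each sample

# Case 2: (batch_size, channels, height, width)
x_img = torch.randn(32, 3, 28, 28)
ln_img = nn.LayerNorm(normalized_shape=(3, 28, 28))
out_img = ln_img(x_img)           # normalized over all C*H*W values of each sample
```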