A way to normalize and center activations between layers. See Normalization between layers

For very small batches, the batch mean and std vary strongly from batch to batch, which can make training unstable. Consider LayerNorm instead.
If the mean and std of the input carry important information, the model loses that information when BatchNorm is applied.
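
For example, a minimal sketch (with placeholder sizes) of the LayerNorm alternative, which computes its statistics per sample and is therefore independent of the batch size:

import torch
import torch.nn as nn

# LayerNorm normalizes each sample over its features, so tiny batches are fine
x = torch.randn(2, 64)    # batch of only 2 samples, 64 features (placeholder sizes)
ln = nn.LayerNorm(64)     # normalizes over the last dimension, per sample
out = ln(x)               # mean/std computed within each sample, not across the batch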

How does it work?

1. Normalize and center each sample within the batch, using the mean and std of the batch.
$\hat{x}_i^{(k)} = \dfrac{x_i^{(k)} - \mu_B^{(k)}}{\sqrt{(\sigma_B^{(k)})^2 + \epsilon}}$

B is the batch, i is the index within the batch. The input is $x = (x^{(1)}, \dots, x^{(d)})$; k is the dimension (feature) index within a sample.
2. Normalized data is usually good for stability, but zero mean and unit variance are not necessarily the best distribution for the next layer, so each normalized value gets a learnable scaling factor and bias.

$y_i^{(k)} = \gamma^{(k)} \hat{x}_i^{(k)} + \beta^{(k)}$

γ and β are both learnt via gradient descent
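
A minimal sketch checking these two steps against PyTorch's nn.BatchNorm1d (batch size and feature count are placeholders; a fresh layer starts with γ = 1 and β = 0, so in training mode both paths should match):

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4)                    # batch of 8 samples, d = 4 features

# step 1: normalize per feature k with the batch mean and (biased) batch variance
mu = x.mean(dim=0)                       # mu_B^(k)
var = x.var(dim=0, unbiased=False)       # (sigma_B^(k))^2
eps = 1e-5
x_hat = (x - mu) / torch.sqrt(var + eps)

# step 2: scale and shift; gamma = 1 and beta = 0 for an untrained layer,
# so its training-mode output equals x_hat
bn = nn.BatchNorm1d(4, eps=eps)
y = bn(x)
print(torch.allclose(x_hat, y, atol=1e-6))   # expected: True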

When to use it

Nearly always a good idea to use them. They help training speed and stabilize the model.

Use it directly after a convolutional layer or a fully connected layer, before the ReLU activation.
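
For instance, a minimal sketch of that ordering for a fully connected block (layer sizes are placeholders):

import torch.nn as nn

fc_block = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),     # BatchNorm between the linear layer and the activation
    nn.ReLU(inplace=True),
)

The convolutional case is shown in the Implementation section below.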

When not to use it

See the notes at the top: with very small batches the batch statistics are unreliable, and BatchNorm discards the mean and std of the input if those carry important information.

Implementation

replace "3d" with "2d" if necessary
import torch.nn as nn

class DoubleConv3D(nn.Module):
    """Two Conv3d -> BatchNorm3d -> ReLU blocks applied in sequence."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv_op = nn.Sequential(
            # BatchNorm sits between the convolution and the ReLU activation
            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv_op(x)
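
Hypothetical usage, e.g. a batch of two single-channel 32³ volumes (all sizes are placeholders):

import torch

block = DoubleConv3D(in_channels=1, out_channels=16)
x = torch.randn(2, 1, 32, 32, 32)    # (batch, channels, depth, height, width)
out = block(x)                       # shape (2, 16, 32, 32, 32); padding=1 preserves spatial size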