How do they work?
Global pooling layers aggregate spatial information across the entire feature map, reducing each channel to a single value.
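A minimal sketch of what this computes, assuming a PyTorch tensor in the usual (N, C, H, W) layout; the shapes are just illustrative:

```python
import torch

# A batch of 8 feature maps with 64 channels, each 7x7 spatially.
x = torch.randn(8, 64, 7, 7)

# Global average pooling: take the mean over the spatial dimensions
# (H, W), so each channel collapses to a single value.
pooled = x.mean(dim=(2, 3))  # shape: (8, 64)
```

This is exactly what `torch.nn.AdaptiveAvgPool2d(1)` does, apart from keeping the two trailing singleton dimensions.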
When to use them
Extreme dimensionality reduction: use it when you only want to detect whether something is present (e.g. binary classification) and do not care where it is. Because all spatial positions are collapsed, the output is invariant to translations of the feature within the map.
It avoids massive fully connected layers: after global pooling, the final FC layer only needs c input neurons, one per channel, instead of c × H × W.
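To make the saving concrete, here is a hypothetical classifier head with 512 channels, a 7×7 feature map, and 10 classes (all numbers are assumptions for illustration):

```python
import torch.nn as nn

c, num_classes = 512, 10

# Flattening the 512x7x7 feature map directly into an FC layer:
flat_head = nn.Linear(c * 7 * 7, num_classes)  # 512*7*7*10 + 10 = 250,890 parameters

# Global average pooling first, then an FC layer with only c inputs:
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(c, num_classes),  # 512*10 + 10 = 5,130 parameters
)
```

The pooled head also works for any input resolution, since the spatial size is always reduced to 1×1.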
The same trade-off applies as for the (local) Max Pooling and Average Pooling layers: choose max when you care about strong, localized activations (spikes), e.g. detection-style binary classification. For average pooling the use cases are less clear to me; I am not yet convinced by the common claim that averaging mainly reduces noise, and would have to research this further.
Implementation
import torch

# Global average pooling: (N, C, H, W) -> (N, C, 1, 1)
torch.nn.AdaptiveAvgPool2d(output_size=1)
# Global max pooling: (N, C, H, W) -> (N, C, 1, 1)
torch.nn.AdaptiveMaxPool2d(output_size=1)
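A quick usage check with an arbitrary input size (the shapes here are assumptions), showing that the adaptive layers with `output_size=1` are just per-channel mean and max reductions:

```python
import torch

x = torch.randn(4, 16, 9, 9)

avg = torch.nn.AdaptiveAvgPool2d(output_size=1)(x)  # shape (4, 16, 1, 1)
mx = torch.nn.AdaptiveMaxPool2d(output_size=1)(x)   # shape (4, 16, 1, 1)

# Equivalent reductions without the layer objects:
assert torch.allclose(avg, x.mean(dim=(2, 3), keepdim=True), atol=1e-6)
assert torch.allclose(mx, x.amax(dim=(2, 3), keepdim=True))
```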