The most used activation function. Sometimes people use "leaky ReLU" instead, to avoid dead neurons.
This activation function is not differentiable at x=0. We fix this by simply setting the derivative at x=0 to either 0 or 1.
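A quick sketch of what that means in practice (assuming PyTorch's convention, which as far as I can tell uses 0 for the gradient at exactly x=0):

```python
import torch

x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
torch.relu(x).sum().backward()
# gradient is 0 for negative inputs, 1 for positive inputs,
# and (in PyTorch's implementation) 0 at exactly x=0
print(x.grad)  # tensor([0., 0., 1.])
```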
{python} torch.nn.ReLU()
PyTorch lets us set an `inplace` parameter. If set to `True`, it doesn't create a copy of the input, which saves memory and can help with computation. However, if the original input is needed later (skip connections or other), that is obviously an issue.
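A minimal sketch of what in-place means here (the values are just an example):

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.5, -2.0])
relu = nn.ReLU(inplace=True)
out = relu(x)
print(x)  # negatives are now zeroed in x itself -- no copy of the input was kept
```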
Advantages
- Very cheap to compute: it's just an elementwise max with zero (see the sketch below)
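A tiny illustration of that claim (values are just an example):

```python
import torch

x = torch.tensor([-1.5, 0.0, 2.0, -0.3])
print(torch.relu(x))          # tensor([0., 0., 2., 0.])
print(torch.clamp(x, min=0))  # same result: ReLU is only max(x, 0) per element
```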
Leaky ReLU
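Leaky ReLU keeps a small slope for negative inputs instead of zeroing them, so neurons can't get permanently stuck with a zero gradient. A minimal sketch using PyTorch's built-in module (0.01 is just the default slope, not a recommendation):

```python
import torch
import torch.nn as nn

leaky = nn.LeakyReLU(negative_slope=0.01)  # small non-zero slope for x < 0
x = torch.tensor([-2.0, 0.0, 3.0])
print(leaky(x))  # tensor([-0.0200, 0.0000, 3.0000])
```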
Trivia
In one course, a model using the ReLU activation function after each/most linear layers was called a "ReLU neural network". I wouldn't use this term outside of that one course though; I don't think it's an established term.