!lecture-18-LRP.pdf

Idea: Propagate the prediction backwards through the network to find out which neurons/inputs are most relevant for the output (or for a single output class). The propagation runs from the output layer to the input layer, i.e. backwards. We select one output neuron (usually a class) and try to find the inputs that are responsible for that output.
We do not propagate the gradient but relevance. How relevance is calculated is explained here: Layerwise Relevance Propagation#Relevance Definition

Formula: $R_j = \sum_k \frac{z_{jk}}{\sum_j z_{jk}} R_k$

$R_j$: Relevance of neuron $a_j$ in layer $t-1$ for the activation of neuron $a_k$ in layer $t$.
$R_k$: Relevance of neuron $a_k$ in layer $t$ for the selected output neuron.
$\frac{z_{jk}}{\sum_j z_{jk}}$: Normalised share of relevance redistributed from neuron $a_k$ back to neuron $a_j$.
$\sum_k \frac{z_{jk}}{\sum_j z_{jk}} R_k$: Sum over all neurons in layer $t$ of their relevances times their normalised contributions, collected at neuron $a_j$.
$j$: neuron index further towards the input
$k$: neuron index further towards the output
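The propagation step above can be sketched in plain Python for a single fully connected layer (a minimal sketch; the function name `lrp_linear` and the toy numbers are made up, and the tiny `eps` only guards against division by zero):

```python
def lrp_linear(a, w, R_k, eps=1e-9):
    """One LRP backward step through a linear layer (LRP-0 rule).

    a   : activations of the lower layer (length J)
    w   : weights, w[j][k] connects a_j to a_k (J x K)
    R_k : relevances of the upper layer (length K)
    """
    J, K = len(a), len(R_k)
    # z_jk = a_j * w_jk : contribution of neuron j to neuron k
    z = [[a[j] * w[j][k] for k in range(K)] for j in range(J)]
    # sum_j z_jk per upper-layer neuron k
    denom = [sum(z[j][k] for j in range(J)) + eps for k in range(K)]
    # R_j = sum_k (z_jk / sum_j z_jk) * R_k
    return [sum(z[j][k] / denom[k] * R_k[k] for k in range(K)) for j in range(J)]

a   = [1.0, 2.0]
w   = [[1.0, 0.0],
       [1.0, 1.0]]
R_k = [3.0, 2.0]
R_j = lrp_linear(a, w, R_k)
# Relevance is conserved: sum(R_j) equals sum(R_k) (up to eps)
```

Note that the rule conserves relevance layer by layer: the total relevance entering a layer equals the total relevance leaving it.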

Relevance Definition

Relevance does not mean weight! The two are closely linked, though. There are multiple possible rules; these can be seen as "regularisations", mostly to reduce the noise/complexity of the produced explanations.
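Two of the common rules (LRP-ε and LRP-γ) can be written as small modifications of the same propagation step (a sketch; the helper `_propagate` and the default `eps`/`gamma` values are illustrative, not from the lecture):

```python
def _propagate(a, w, R_k, transform, stab=1e-9):
    """Generic LRP step: z_jk = a_j * transform(w_jk), stabilised denominator."""
    J, K = len(a), len(R_k)
    z = [[a[j] * transform(w[j][k]) for k in range(K)] for j in range(J)]
    denom = []
    for k in range(K):
        d = sum(z[j][k] for j in range(J))
        # push the denominator away from zero in its own sign direction
        denom.append(d + stab * (1.0 if d >= 0 else -1.0))
    return [sum(z[j][k] / denom[k] * R_k[k] for k in range(K)) for j in range(J)]

def lrp_0(a, w, R_k):
    # basic rule: plain weighted contributions
    return _propagate(a, w, R_k, lambda wjk: wjk)

def lrp_eps(a, w, R_k, eps=0.25):
    # LRP-eps: a larger stabiliser absorbs weak, contradictory
    # contributions -> less noisy explanations (some relevance is lost)
    return _propagate(a, w, R_k, lambda wjk: wjk, stab=eps)

def lrp_gamma(a, w, R_k, gamma=0.25):
    # LRP-gamma: boost positive weights, favouring positive evidence
    return _propagate(a, w, R_k, lambda wjk: wjk + gamma * max(wjk, 0.0))
```

With LRP-ε the total relevance shrinks slightly at each layer, which is exactly the "regularising" effect mentioned above.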

The explanation itself can be a heatmap of the pixels relevant for the activation of one output neuron (class).

How to choose the right rule

LRP rules are set differently at different layers.

Pasted image 20240816150706.png

This was mostly found out via heuristic methods, but the results are actually grounded in deep Taylor decomposition. Taylor decomposition approximates each layer by a sum of simple functions.
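The connection can be sketched as follows (my own summary of the deep Taylor view, not from the slides): the relevance $R_k$, seen as a function of the lower-layer activations $a=(a_j)_j$, is Taylor-expanded around a root point $\tilde a$ at which $R_k(\tilde a)=0$:

$$R_k(a) \approx \underbrace{R_k(\tilde a)}_{=0} + \sum_j (a_j - \tilde a_j)\,\left.\frac{\partial R_k}{\partial a_j}\right|_{\tilde a} = \sum_j R_{j \leftarrow k}$$

The first-order terms $R_{j \leftarrow k}$ are the relevance messages sent from $a_k$ back to $a_j$, and different choices of the root point $\tilde a$ recover different LRP rules, which is why the heuristically found rule assignments have a theoretical backing.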

Example: why use LRP-0 in the upper layers:

  1. Simplicity: These layers are very simple (close to linear) and not subject to the instability and interpretability issues of the lower layers. Therefore LRP-0 is sufficient.