The main point of using masks is to vectorize operations that would otherwise be done with explicit Python loops.

The idea of masking is to "hide" certain data and apply an operation only to a selected subset of the elements in a matrix (NumPy array, pandas DataFrame, ...).
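To make the loop-versus-vectorized point concrete, here is a minimal sketch on a small hypothetical integer array: both versions clip every value above 5 down to 5, first with nested loops and then with a single masked assignment, and the results are compared.

```python
import numpy as np

# Hypothetical example data: a small integer array with values 0..9
rng = np.random.default_rng(0)
values = rng.integers(0, 10, size=(4, 5))

# Loop version: visit every element and clip values above 5 to 5
looped = values.copy()
for i in range(looped.shape[0]):
    for j in range(looped.shape[1]):
        if looped[i, j] > 5:
            looped[i, j] = 5

# Masked version: build a boolean mask once, then assign through it
masked = values.copy()
masked[masked > 5] = 5

print(np.array_equal(looped, masked))  # True
```

The masked version does the comparison and the assignment in compiled NumPy code, which is why it scales much better than the Python-level loop on large arrays.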

Basic examples

Conditional replacement

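import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
from PIL import Image

# 128x128 array of integers 0..9 (uniform floats truncated by the uint8 cast)
matrix = np.random.uniform(0, 10, (128, 128)).astype(np.uint8)

# Boolean mask: True wherever the value is greater than 5
above_5_mask = matrix > 5

# Set only the masked positions to 255 (white), in place
matrix[above_5_mask] = 255

img = Image.fromarray(matrix)
display(img)

plt.hist(matrix.flatten())
plt.title("data distribution after masked operation")
plt.show()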

[Figures: the masked image (values above 5 set to 255, i.e. white) and the histogram "data distribution after masked operation"]
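The assignment matrix[above_5_mask] = 255 modifies the array in place. When the original values should be kept, np.where can build a new array from the same mask instead. A minimal sketch on a small hypothetical array:

```python
import numpy as np

# Hypothetical small uint8 array
data = np.array([[1, 7, 3],
                 [9, 5, 6]], dtype=np.uint8)

mask = data > 5

# np.where(condition, x, y) takes x where the mask is True and y elsewhere
replaced = np.where(mask, 255, data).astype(np.uint8)

print(replaced.tolist())  # [[1, 255, 3], [255, 5, 255]]
print(data.tolist())      # [[1, 7, 3], [9, 5, 6]]  (original left unchanged)
```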

NaN imputation

import numpy as np

data = np.array([1.2, np.nan, 3.7, np.nan, 5.1])
mask = ~np.isnan(data)  # True where data is NOT NaN
clean_mean = np.mean(data[mask])  # (1.2 + 3.7 + 5.1) / 3 ≈ 3.33
data[~mask] = clean_mean  # impute: fill each NaN with the mean of the valid values
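The same idea extends to a 2-D table where each column is a feature and each missing value should be filled with its own column's mean. A minimal sketch, assuming a small hypothetical feature matrix; np.nanmean computes the per-column mean while ignoring NaNs, and np.where picks the mean only where the NaN mask is True:

```python
import numpy as np

# Hypothetical feature matrix: rows are samples, columns are features
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [np.nan, 8.0, 9.0]])

nan_mask = np.isnan(X)             # True where a value is missing
col_means = np.nanmean(X, axis=0)  # per-column mean, ignoring NaNs

# The (3,) vector of column means broadcasts across the rows of the (3, 3) mask
X_filled = np.where(nan_mask, col_means, X)

print(col_means.tolist())  # [2.5, 6.5, 6.0]
print(X_filled.tolist())   # [[1.0, 6.5, 3.0], [4.0, 5.0, 6.0], [2.5, 8.0, 9.0]]
```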

What is a mask?

It is just a boolean (True/False) array of the same shape as the array it is created from, typically produced by an element-wise comparison such as matrix > 5 or ~np.isnan(data).
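A few properties of masks, shown on a small hypothetical array: the mask's dtype and shape, what indexing with it returns, how masks are combined, and how they can be counted.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)  # hypothetical example array, values 0..11

mask = arr % 3 == 0                # element-wise comparison -> boolean array
print(mask.dtype, mask.shape)      # bool (3, 4)

# Indexing with a mask returns a 1-D array of the selected elements (row-major order)
print(arr[mask].tolist())          # [0, 3, 6, 9]

# Masks combine with &, |, ~; the parentheses are required because & binds tighter than >
both = (arr > 2) & (arr < 8)
print(arr[both].tolist())          # [3, 4, 5, 6, 7]

# True counts as 1, so sum() gives the number of selected elements
print(mask.sum())                  # 4
```

Note that reading arr[mask] produces a copy, while assigning through arr[mask] = ... writes into the original array, which is what the conditional replacement example relies on.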