masking

The main point of using masks is to vectorize operations that would otherwise be written as loops.

The idea of masking is to "hide" certain data and apply an operation only to a selected subset of the elements in an array (NumPy arrays, pandas objects, ...).
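To make the loop-versus-mask contrast concrete, here is a minimal sketch. The sample values and the threshold of 5 are made up for illustration; both versions should produce the same result.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
values = np.array([3, 8, 1, 9, 5, 7])

# Loop version: visit every element and test it individually.
looped = values.copy()
for i in range(len(looped)):
    if looped[i] > 5:
        looped[i] = 0

# Masked version: build a boolean mask, then assign through it in one step.
masked = values.copy()
masked[masked > 5] = 0

print(looped)  # [3 0 1 0 5 0]
print(masked)  # [3 0 1 0 5 0]
```

The masked version pushes the per-element work into NumPy, which is usually both shorter and faster on large arrays.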

Basic examples

Conditional replacement

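import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
from PIL import Image

# 128x128 matrix of integers 0..9 (uniform floats truncated to uint8)
matrix = np.random.uniform(0, 10, (128, 128)).astype(np.uint8)
above_5_mask = matrix > 5    # boolean mask, True where the value exceeds 5
matrix[above_5_mask] = 255   # overwrite only the selected elements
img = Image.fromarray(matrix)
display(img)

plt.hist(matrix.flatten())
plt.title("data distribution after masked operation")
plt.show()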

Pasted image 20250611142327.png|300

Pasted image 20250611142320.png|300

NaN imputation

import numpy as np

data = np.array([1.2, np.nan, 3.7, np.nan, 5.1])
mask = ~np.isnan(data)  # True where data is NOT NaN
clean_mean = np.mean(data[mask])  # (1.2 + 3.7 + 5.1) / 3 ≈ 3.33
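The snippet above only computes the mean of the valid entries. One possible way to finish the imputation, assuming mean imputation is what is wanted, is to write that mean back through the inverse mask. The names `nan_mask`, `fill_value` and `imputed` are introduced here for illustration.

```python
import numpy as np

data = np.array([1.2, np.nan, 3.7, np.nan, 5.1])

nan_mask = np.isnan(data)               # True where data IS NaN
fill_value = np.mean(data[~nan_mask])   # mean of the valid entries only

imputed = data.copy()                   # leave the original array untouched
imputed[nan_mask] = fill_value          # overwrite only the NaN positions

print(imputed)
```

If only the mean is needed, `np.nanmean(data)` gives the same value without building the mask by hand.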

What is a mask?

A mask is simply a boolean (True/False) array with the same shape as the array it was created from. Indexing the original array with it selects the elements where the mask is True.
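A small sketch to inspect a mask directly. The 2x3 matrix is an arbitrary example.

```python
import numpy as np

# Arbitrary example values.
matrix = np.array([[1, 7, 3],
                   [9, 2, 6]])

mask = matrix > 5          # element-wise comparison yields a boolean array

print(mask.dtype)          # bool
print(mask.shape)          # (2, 3), same shape as matrix
print(mask)

selected = matrix[mask]    # 1-D array of the elements where mask is True
print(selected)            # [7 9 6]
```

Note that indexing with a mask returns a flat, one-dimensional array of the selected elements, while assigning through a mask (`matrix[mask] = value`) modifies the original array in place.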