The main point of using masks is to vectorize operations that would otherwise be written as loops.
The idea of masking is to "hide" part of the data and apply an operation only to the selected elements of an array or table (NumPy, pandas, ...).
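As a minimal sketch of this idea, the snippet below caps values in an array once with an explicit loop and once with a mask. The array `values` and the threshold of 5 are made up for illustration.

```python
import numpy as np

values = np.array([3, 8, 1, 9, 4, 7])

# Loop version: visit each element and cap it if it exceeds the threshold.
looped = values.copy()
for i in range(len(looped)):
    if looped[i] > 5:
        looped[i] = 5

# Masked version: a boolean array selects all matching elements at once.
masked = values.copy()
masked[masked > 5] = 5

print(looped)  # [3 5 1 5 4 5]
print(masked)  # [3 5 1 5 4 5]
```

Both versions give the same result, but the masked version pushes the iteration into NumPy's compiled code instead of the Python interpreter.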
Basic examples
Conditional replacement
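import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display
from PIL import Image
# Random 128x128 image with integer pixel values from 0 to 9
matrix = np.random.uniform(0, 10, (128, 128)).astype(np.uint8)
# Boolean mask: True wherever the pixel value is greater than 5
above_5_mask = matrix > 5
# Set only the selected pixels to white (255); the rest stay unchanged
matrix[above_5_mask] = 255
img = Image.fromarray(matrix)
display(img)
# Histogram of the pixel values after the masked assignment
plt.hist(matrix.flatten())
plt.title("data distribution after masked operation")
plt.show()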
NaN imputation
import numpy as np
data = np.array([1.2, np.nan, 3.7, np.nan, 5.1])
mask = ~np.isnan(data) # True where data is NOT NaN
clean_mean = np.mean(data[mask]) # (1.2 + 3.7 + 5.1) / 3 ≈ 3.33
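The snippet above only computes the mean of the valid entries. To actually impute, the inverted mask can be used to write that mean back into the NaN positions. The following is a sketch under that assumption; the names `nan_mask`, `fill_value` and `imputed` are introduced here for illustration.

```python
import numpy as np

data = np.array([1.2, np.nan, 3.7, np.nan, 5.1])

nan_mask = np.isnan(data)              # True where data IS NaN
fill_value = np.mean(data[~nan_mask])  # mean of the valid entries only

imputed = data.copy()                  # leave the original array untouched
imputed[nan_mask] = fill_value         # overwrite only the NaN positions

print(imputed)
```

Mean imputation is only one possible choice. NumPy also provides `np.nanmean`, which computes the same NaN-ignoring mean without an explicit mask.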
What is a mask?
A mask is simply a boolean (True/False) array with the same shape as the array it is created from.
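To make this concrete, the small sketch below builds a mask from a comparison and inspects it. The 2x3 array is an arbitrary example.

```python
import numpy as np

matrix = np.array([[1, 7, 3],
                   [9, 2, 8]])

mask = matrix > 5     # element-wise comparison produces the mask

print(mask.dtype)     # bool
print(mask.shape)     # (2, 3), same as matrix.shape
print(mask)
print(matrix[mask])   # [7 9 8], the selected elements as a 1-D array
print(mask.sum())     # 3, since True counts as 1
```

Indexing with the mask returns the selected elements flattened in row-major order, and summing the mask counts how many elements satisfy the condition.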