In machine learning, the curse of dimensionality refers to a phenomenon where common approaches start to break down once the number of dimensions becomes large. In particular, intuition fails in high dimensions.
Similarity approaches
The following algorithms (and many more) start to break down in high dimensions:
- Nearest-neighbour classifiers
- Euclidean distance-based algorithms
- Dot product-based similarity
- Manhattan distance
- Correlation-based similarity measures
The origin of the issue:
As dimensionality increases, the volume of the space grows exponentially, so data points become sparse and meaningful distance or similarity measures break down.
With extremely sparse points in high dimensions, the distances between any two points all tend toward the same value.
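To see this concentration effect in action, here is a minimal sketch (using NumPy and SciPy; the sample counts and dimensionalities are arbitrary choices, not from the original post) that draws random points in the unit hypercube and compares the nearest and farthest pairwise Euclidean distances as the dimension grows. The max/min ratio collapses toward 1, which is exactly why "nearest" stops being meaningful.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Draw 500 random points in the unit hypercube and compare the nearest
# and farthest pairwise Euclidean distances as the dimensionality grows.
for d in (2, 10, 100, 1_000, 10_000):
    X = rng.random((500, d))
    dists = pdist(X)  # all pairwise Euclidean distances
    print(f"d={d:>6}: min={dists.min():8.3f}  max={dists.max():8.3f}  "
          f"max/min ratio={dists.max() / dists.min():6.2f}")
```

In low dimensions the farthest pair is orders of magnitude farther away than the nearest pair; at ten thousand dimensions the ratio is close to 1, so every point is roughly equidistant from every other point.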
Example
I was trying to embed images of real and forged signatures using VGG16. The embedding vectors were far too large (roughly 26,000 dimensions) for Euclidean-distance-based algorithms to work properly. I am not saying that they did not work at all, but common distance measures like Euclidean distance, means, etc. could no longer be trusted.
Of the roughly 26,000 dimensions, most probably carry little signal and add far too much noise.
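For context, a minimal sketch of how such an embedding pipeline might look, assuming the standard tf.keras VGG16 with ImageNet weights and the top classifier removed (flattening its 7x7x512 convolutional output for a 224x224 input gives a 25,088-dimensional vector, roughly the size described above). The file names are placeholders, not files from the original experiment.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Convolutional feature extractor; flattening the 7x7x512 output of a
# 224x224 input yields 25,088-dimensional embedding vectors.
model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

def embed(path):
    """Load one signature image and return its flattened VGG16 feature vector."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x, verbose=0).flatten()

# Placeholder file names -- substitute real signature images.
a = embed("genuine_signature.png")
b = embed("forged_signature.png")
print("Euclidean distance:", np.linalg.norm(a - b))
```

At this dimensionality, the distances printed for genuine/genuine and genuine/forged pairs end up in a narrow band, which is the distance-concentration problem described above.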