Image segmentation divides an image into multiple parts or regions, where the pixels in each region belong to the same class.
Basics
Each input pixel is assigned to a class. The horse example above is a binary classification: every pixel is either a horse pixel or not a horse pixel. The example below has more classes (trees, grass, road, sidewalk, people, cars, road signs).
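As a rough illustration of per-pixel classification, the sketch below assumes a model has already produced one score per class for every pixel, and assigns each pixel the class with the highest score. The class names, the scores, and the helper `classify_pixels` are made up for this example; plain Python lists are used to keep it dependency-free.

```python
# Hypothetical class names; the order defines the class index.
CLASSES = ["road", "grass", "car"]

# Hypothetical scores for a 2x3 image: scores[row][col] holds one
# score per class for that pixel.
scores = [
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.6, 0.3, 0.1], [0.3, 0.3, 0.4], [0.1, 0.8, 0.1]],
]

def classify_pixels(scores):
    """Return a grid of class indices, picking the highest score per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]

label_map = classify_pixels(scores)

# Print the class name chosen for each pixel, row by row.
for row in label_map:
    print([CLASSES[index] for index in row])
```

The result has the same height and width as the input, with one class index per pixel.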
Input: An image
Output/Label: A mask with the same height and width as the input image. It is called a mask because it is meant to be an overlay that "masks" (isolates) part of the image. In the horse example, it would hide the background.
Technically, the horse mask would be 0 everywhere except on the horse, where it would be 1. The 1 stands for the class "horse"; the 0 stands for "not horse", which in this case is the background.
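A minimal sketch of how such a binary mask could be applied, assuming a tiny 4x4 grayscale image stored as nested Python lists. The pixel values, the mask, and the helper `apply_mask` are invented for illustration; real pipelines would typically use an array library instead.

```python
# Hypothetical 4x4 grayscale image (values 0-255); the numbers are made up.
image = [
    [12, 15, 200, 210],
    [10, 190, 205, 14],
    [11, 180, 195, 13],
    [9, 12, 16, 10],
]

# Binary mask with the same height and width: 1 = horse, 0 = background.
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

def apply_mask(image, mask):
    """Keep pixels where the mask is 1 and set all other pixels to 0."""
    same_shape = len(image) == len(mask) and all(
        len(img_row) == len(mask_row) for img_row, mask_row in zip(image, mask)
    )
    if not same_shape:
        raise ValueError("image and mask must have the same dimensions")
    return [
        [pixel * m for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

masked = apply_mask(image, mask)
for row in masked:
    print(row)
```

Multiplying by the mask zeroes out every background pixel and leaves the "horse" pixels unchanged, which is the overlay behaviour described above.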
State of the art
U-Net: Originally developed for medical image segmentation, it is now widely used for image segmentation in general.