A convolution is a mathematical operation that combines two functions to produce a third function, representing the way one function is modified by the other.

In convolutional neural network, convolution is used in convolutional layers to extract features.

We have a kernel of learnable weights that slide across the input data, computing a dot product between the weights and the local region of the input, and generating an output map.

Intuitively, each output feature (green) only “look” at input features (blue) coming from roughly the same location.

The typical flow: passing the input data through multiple layers, where each layer, especially convolutional layers, uses filters (each a different set of kernels) to extract features. These features are represented in separate channels, and the final output may be the result of classification, regression, or other tasks based on the learned features.

Link to original