From my time researching this topic, I have come to realise that a lot of people don’t actually understand it, or only understand it partly. Many tutorials also do a poor job of explaining in layman’s terms what exactly it is doing, or leave out steps that would otherwise clear up some confusion. So I’m going to explain it from start to finish in the simplest way possible.
The approach we are going to look at is a mix of feature-based and template-based methods. One of the easiest and fastest ways of implementing face detection is the Viola-Jones algorithm.
Before learning about Viola-Jones, we need to take a quick look at Haar-like features (which I’ll just call Haar features from now on) and their inspiration: Haar wavelets. Haar wavelets were proposed by the mathematician Alfred Haar in 1909 and are used in applications such as signal and image compression in electrical and computer engineering. Put simply, Haar features are essentially collections of pixels grouped into adjacent rectangular regions. They are conceptually similar to kernels in convolutional neural nets. The difference is that these features are created programmatically; they aren’t learned from the raw image data as they are in deep learning.
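To make "collections of pixels in rectangular shapes" a bit more concrete, here is a minimal sketch of one Haar-like feature in plain Python. It assumes a horizontal two-rectangle feature whose value is the pixel sum of the left half minus the pixel sum of the right half. The function names and the sign convention are my own choices for illustration, and real implementations normally use an integral image for speed rather than summing pixels directly.

```python
def rect_sum(image, top, left, height, width):
    """Sum the pixel values inside one rectangle of a 2D list of intensities."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def two_rect_haar_feature(image, top, left, height, width):
    """Value of a horizontal two-rectangle feature: left half minus right half.

    `width` is the full feature width and must be even so both halves match.
    (Hypothetical helper; the sign convention is an assumption.)
    """
    if width % 2 != 0:
        raise ValueError("width must be even")
    half = width // 2
    left_sum = rect_sum(image, top, left, height, half)
    right_sum = rect_sum(image, top, left + half, height, half)
    return left_sum - right_sum


# Toy 4x4 grayscale patch: bright on the left, dark on the right (a vertical edge).
patch = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]

# A uniform patch with no edge at all, for comparison.
flat = [[50] * 4 for _ in range(4)]

edge_response = two_rect_haar_feature(patch, top=0, left=0, height=4, width=4)
flat_response = two_rect_haar_feature(flat, top=0, left=0, height=4, width=4)
print(edge_response, flat_response)
```

The feature responds strongly where the image brightness changes between the two rectangles and gives zero on a uniform patch, which is why features like this act as simple edge detectors.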
But don’t worry, you don’t need to sit there and write thousands of fancy functions to generate these features, as they are widely available online in the form of XML files. There are thousands of possible features you can use, because all they really are is rectangles divided into regions whose pixel sums are compared to calculate a difference (delta) value.
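To get a feel for how quickly the number of possible features grows, the sketch below counts every position and size of a few rectangle layouts inside a detection window. The 24x24 window and the five base shapes are assumptions taken from common descriptions of Viola-Jones, not something fixed by the text above, and `count_features` is a hypothetical helper.

```python
def count_features(window, base_w, base_h):
    """Count every position and scale of one feature shape in a square window.

    The shape is scaled in whole multiples of its base width and height so
    that its sub-rectangles always stay equal in size.
    """
    total = 0
    for w in range(base_w, window + 1, base_w):
        for h in range(base_h, window + 1, base_h):
            # Number of top-left corners where a w x h feature still fits.
            total += (window - w + 1) * (window - h + 1)
    return total


WINDOW = 24  # assumed detection window size in pixels

# Assumed base shapes as (width, height): two-rectangle horizontal and
# vertical, three-rectangle horizontal and vertical, and four-rectangle.
BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]

counts = {shape: count_features(WINDOW, *shape) for shape in BASE_SHAPES}
total_features = sum(counts.values())
print(counts)
print(total_features)
```

Under these assumptions the total comes out well above a hundred thousand, so "thousands" is, if anything, an understatement.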