AI in Biometrics Recognition

The Merriam-Webster dictionary describes biometrics as follows: “the measurement and analysis of unique physical or behavioral characteristics (such as fingerprint or voice patterns) especially as a means of verifying personal identity.”

There are many types of biometrics in use today, for example DNA matching, the shape of the ear, eye matching (iris, retina), facial features, fingerprints, hand geometry, voice and signature. Verifying personal identity is important in many applications in law enforcement, security and access control, and even in smart offices and homes, where person-dependent services may improve processes and everyday life.

Most biometric identification systems work in a very similar manner and involve two main steps: feature extraction and feature (or pattern) matching. Feature extraction means that we analyze the chosen biometric (a human face in this case) and extract a collection of features that are necessary to distinguish between different people. The aim is, of course, to limit the extracted information to the minimum necessary in order to optimize the machine learning training and prediction phases. Too much information would not only slow everything down but would also confuse the machine learning model, which should focus on the features that really matter for distinguishing between people. Feature matching is the process in which we use the extracted features to determine the identity of a person, usually by comparing the features stored in a reference database to the features extracted from the input presented for recognition.
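The feature matching step can be sketched as a nearest-neighbor search in feature space. The following is a minimal illustration (the function name, the 4-dimensional toy vectors and the distance threshold are all assumptions for the example; real systems typically compare much higher-dimensional feature vectors):

```python
import numpy as np

def match_identity(probe, reference_db, threshold=0.6):
    """Return the best-matching identity from the reference database,
    or None if no reference feature vector is closer than the threshold."""
    best_name, best_dist = None, float("inf")
    for name, ref_vec in reference_db.items():
        dist = np.linalg.norm(probe - ref_vec)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Toy 4-dimensional feature vectors (real embeddings are e.g. 128-D)
db = {"alice": np.array([0.1, 0.9, 0.2, 0.4]),
      "bob":   np.array([0.8, 0.1, 0.7, 0.3])}
print(match_identity(np.array([0.12, 0.88, 0.21, 0.41]), db))  # alice
```

The threshold turns identification into open-set recognition: a probe far from every reference is reported as unknown rather than forced onto the nearest identity.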

The main steps of building and using a face recognition machine learning system can be divided into two major tasks:
  • Training a machine learning model for feature extraction, and
  • Performing face recognition with the help of the trained machine learning model.
The two major tasks above are further divided into several sub-tasks. First we need to train a machine learning model for feature extraction, based on a huge number of input images (an image database). The training of such a model may take several days or even weeks and may involve millions of images. The aim is for the ML model, a large-scale convolutional neural network (CNN), to learn how to distinguish between the faces of different people. Deep inside the system the CNN learns which face patterns are important for distinguishing between different people.
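One common way such a network is trained to distinguish faces is with a metric-learning objective such as the triplet loss, which pulls embeddings of the same person together and pushes embeddings of different people apart. A minimal numpy sketch of the loss itself (the 2-D toy vectors and the margin value are assumptions for illustration):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """L(a, p, n) = max(0, ||a - p|| - ||a - n|| + margin).
    anchor/positive are embeddings of the same person,
    negative is an embedding of a different person."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 1.0])   # anchor embedding
p = np.array([0.1, 0.9])   # same person, nearby
n = np.array([1.0, 0.0])   # different person, far away
print(triplet_loss(a, p, n))  # well-separated triplet -> loss 0.0
```

During training the loss is minimized over millions of such triplets, which is what shapes the Euclidean embedding space used later for matching.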

As usual, both training and testing are important in order to arrive at a good final ML model.

The face recognition branch of the whole process involves the detection of face(s) in the input image, normalization of the extracted face image (we will see later how and why), feature extraction using the previously trained ML model and, finally, the actual face recognition based on the extracted features.

After we have trained our CNN model we are ready to assemble a professional face recognition system.

As a first step we need to automatically find all of the faces in the input image, together with their exact locations, in order to extract the face images. Face detection is a complex problem because of the many possible face poses, rotations, scales, facial expressions, occlusions, etc.

Before we can perform face recognition we need to build a reference database with high-quality frontal face images of the people we want to recognize. The size of the face images should be similar to the size of the images used during the training of the ML model for feature extraction. Face recognition systems (such as the AI-TOOLKIT) usually extract and scale the face images automatically from selected input images.
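The extract-and-scale step amounts to cropping the detected bounding box and resampling it to the fixed input size the feature extractor expects. A minimal dependency-free sketch with nearest-neighbor resampling (the function name and the 150-pixel target size are assumptions for the example):

```python
import numpy as np

def crop_and_scale(image, box, size=150):
    """Crop a detected face box (x, y, w, h) out of the image and rescale
    it to a fixed size x size patch using nearest-neighbor sampling."""
    x, y, w, h = box
    face = image[y:y + h, x:x + w]
    rows = np.arange(size) * h // size  # source row for each target row
    cols = np.arange(size) * w // size  # source column for each target column
    return face[rows][:, cols]

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
patch = crop_and_scale(img, (100, 50, 120, 160))
print(patch.shape)  # (150, 150, 3)
```

Production systems typically add smoother interpolation and geometric alignment (e.g. rotating so the eyes are level) on top of this basic crop-and-resize.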

The trained convolutional neural network (here a residual CNN) can now be used to extract a feature vector from each detected and normalized face in an input image presented for recognition. Next we extract the feature vector from each image in the reference database. When we have all of these feature vectors we can simply use a clustering algorithm to group (cluster) them. If a detected face corresponds to one of the reference images, then the two images will be grouped into the same cluster, because the feature vectors of two face images of the same person lie close to each other in the Euclidean space learned by the ML model. If a detected face (represented by its feature vector) is assigned to a cluster without any other face, then the face is unknown (it does not exist in the reference database).
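The grouping step can be sketched with a simple greedy distance-threshold clustering (an assumption for illustration; real systems may use more sophisticated algorithms such as graph-based clustering):

```python
import numpy as np

def cluster_faces(vectors, threshold=0.6):
    """Greedy clustering: assign each feature vector to the first cluster
    whose representative lies within `threshold` Euclidean distance,
    otherwise start a new cluster."""
    clusters = []  # list of (representative vector, member indices)
    for i, v in enumerate(vectors):
        for rep, members in clusters:
            if np.linalg.norm(v - rep) < threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return [members for _, members in clusters]

refs = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]  # two reference faces
probe = np.array([0.05, 0.95])                       # a detected face
groups = cluster_faces(refs + [probe])
print(groups)  # [[0, 2], [1]] -> the probe matches reference face 0
```

The probe ends up in the same cluster as the first reference face; a probe far from every reference would form a singleton cluster and be reported as unknown.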

Speaker recognition is similar to face recognition (feature extraction followed by identification) and is based on acoustic patterns (features) in human speech that are unique to each individual. The uniqueness of these acoustic patterns is due to the unique anatomy of humans (the shape and size of the organs in the mouth and throat called the vocal tract) and to learned speech patterns and style.

The AI-TOOLKIT has built-in Apps which can be used for professional automatic face, speaker and fingerprint recognition.

This article is a slightly modified excerpt from the book “The Application of Artificial Intelligence”. If you are interested in the subject, it is strongly recommended to read the book, which contains many more details and real-world case studies from several sectors and disciplines. The book explains several examples step by step using the AI-TOOLKIT. The book is going through the publishing process at the time of writing this article. You may use the contact form for information about pre-ordering the book.

