Review of the current state of solving computer vision problems
Today, computer vision is experiencing a period of significant progress on problems that researchers have been working on for decades. Meaningful advances have been made in object detection, segmentation, and classification, image retrieval by example, image generation, and other tasks.
For many years, most image analysis approaches have been based on a transition from the image to a feature space that captures the essential information about the whole image or its characteristics and provides a more compact representation than the image itself. After this transition, the features are compared by an appropriate method, and a decision is made according to the problem being solved. In addition to being informative and of reduced dimension, the obtained features should be as insensitive as possible, i.e., invariant, to geometric transformations of the image, changes in lighting, noise, and local occlusions, for example, when objects partially overlap or part of an object extends beyond the image.
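As a minimal illustration of this pipeline (and not of any particular method discussed later), the sketch below computes a simple global feature, a normalized grayscale histogram, for two images and compares the resulting feature vectors; the file names are placeholders, and the histogram is chosen only for brevity, so it does not possess the invariance properties discussed above.

import cv2

def histogram_feature(path):
    # Compact global feature: a normalized 64-bin grayscale histogram.
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([image], [0], None, [64], [0, 256])
    return cv2.normalize(hist, None).flatten()

# Placeholder file names; any pair of images can be used here.
feature_a = histogram_feature("pattern.png")
feature_b = histogram_feature("processed.png")

# Compare the feature vectors; a correlation close to 1 means similar images.
similarity = cv2.compareHist(feature_a, feature_b, cv2.HISTCMP_CORREL)
print("similarity:", similarity)

Any other feature and comparison rule could be substituted here; the essential point is the transition from pixels to a compact vector that is then compared.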
There is no universal way to describe images with a feature set; the choice of feature space and of the method for processing it depends on the specific problem. The weakness of classical feature-based approaches is the need for complex configuration: numerous parameters must be set on the basis of heuristic information, and their values significantly affect the final result.
In recent years, neural networks have provided a significant breakthrough in computer vision. In 2012, the AlexNet neural network took part in the annual ImageNet competition and showed the best result in the object classification task, with a top-5 error rate of 15.3% against 26.2% for the runner-up. By 2019, the classification quality achieved with neural networks had matched human capabilities. Combining the achievements of the classical analytical feature approach with the neural network approach is a promising direction.
Even now, however, there are many tasks for which the classical feature approach is necessary. Tasks such as cartography and the stitching of panoramic images depend on solving the problem of normalization (compensation) of the geometric transformations present in the images.
Image normalization problem and approaches to solve it
In this article, image normalization means the process of compensating the geometric transformations that distinguish one image from another. This problem has been studied for a long time. The fundamental works on normalization propose two main approaches, tracking and parametric, and consider several methods for each of them.
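To make the term concrete, the short sketch below shows what compensation means when the geometric transformation is already known: applying its inverse maps the processed image back onto the pattern. The homography values and the file name are arbitrary and chosen purely for illustration.

import cv2
import numpy as np

# An illustrative projective transformation (homography); values chosen arbitrarily.
H = np.array([[1.05, 0.10, 15.0],
              [-0.08, 0.98, -7.0],
              [1e-4, 2e-4, 1.0]], dtype=np.float64)

pattern = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
h, w = pattern.shape
processed = cv2.warpPerspective(pattern, H, (w, h))          # the "distorted" image

# Normalization: compensate the transformation by warping with the inverse homography.
normalized = cv2.warpPerspective(processed, np.linalg.inv(H), (w, h))

In practice the transformation is unknown, and the two approaches below differ precisely in how its parameters are found.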
The tracking approach implies gradual compensation of the geometric transformation over many steps. At each step the processed image is compared with the pattern and then undergoes a small geometric transformation that compensates only part of the whole transformation, bringing the processed image closer to the pattern. After all the steps, the processed image coincides with the pattern, and the parameters of the overall geometric transformation are determined. This approach is applied in tracking and targeting tasks.
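One widely available iterative scheme in this spirit is OpenCV's ECC alignment, which refines the warp parameters step by step until the processed image matches the template; it is shown here only to illustrate the idea of gradual compensation, not as the specific method of the works cited above. The sketch assumes OpenCV 4.x and placeholder file names.

import cv2
import numpy as np

pattern = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)     # placeholder file names
processed = cv2.imread("processed.png", cv2.IMREAD_GRAYSCALE)

# Start from the identity warp and refine it iteratively.
warp = np.eye(2, 3, dtype=np.float32)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)

# Each iteration applies a small correction to the warp, moving the processed
# image closer to the pattern, until the convergence criteria are met.
_, warp = cv2.findTransformECC(pattern, processed, warp,
                               cv2.MOTION_EUCLIDEAN, criteria, None, 5)

h, w = pattern.shape
aligned = cv2.warpAffine(processed, warp, (w, h),
                         flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)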
The parametric approach aims to determine the parameters of the entire geometric transformation at once. The found transformation is then compensated, and the processed image turns into the pattern. This approach is the more widely used of the two.
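In the parametric spirit, for example, a single homography can be estimated from point correspondences in one step and immediately compensated. The sketch below assumes the correspondences have already been obtained (in practice from matched key points, as discussed later); the coordinate values and file name are placeholders.

import cv2
import numpy as np

# Corresponding points in the processed image and in the pattern
# (placeholder values; in practice they come from matched key points).
pts_processed = np.float32([[12, 30], [200, 42], [190, 210], [25, 220]])
pts_pattern = np.float32([[0, 0], [180, 0], [180, 180], [0, 180]])

# Estimate the whole projective transformation at once (RANSAC rejects outliers).
H, _ = cv2.findHomography(pts_processed, pts_pattern, cv2.RANSAC, 3.0)

processed = cv2.imread("processed.png", cv2.IMREAD_GRAYSCALE)  # placeholder
normalized = cv2.warpPerspective(processed, H, (180, 180))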
For both approaches, some researchers propose an integral method of constructing functionals based on moments of different orders, but only cases with simple geometric transformations are considered. Moreover, for all integral methods, a significant problem is the background, which can be partially or completely changed. In other research papers, to solve the normalization problem under complex geometric transformations and local occlusions, it is proposed to use one-dimensional normalizations and to decompose complex transformation groups into compositions of simple ones. However, such methods solve the problem only partially. This article is devoted to analyzing normalization based on the descriptors of image key points.
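As a schematic illustration of moment-based normalization, restricted to two of the simple transformations mentioned above (translation and scale), the sketch below centers an object by its centroid and rescales it using the zero-order moment. It is not the integral method of the cited works, and the file name and reference value are assumptions made for the example.

import cv2
import numpy as np

image = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
m = cv2.moments(image)

# Centroid from the first-order moments: compensates translation.
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]

# The zero-order moment (total "mass") serves as a scale estimate
# relative to a reference value assumed for the pattern.
reference_mass = 1.0e6
scale = np.sqrt(reference_mass / m["m00"])

h, w = image.shape
# Affine map: rescale, then move the centroid to the image center.
T = np.float32([[scale, 0, w / 2 - scale * cx],
                [0, scale, h / 2 - scale * cy]])
normalized = cv2.warpAffine(image, T, (w, h))

A changed background directly alters the moments, which is exactly the weakness of integral methods noted above.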
Construction of image features based on descriptors
In the classical approach, the solution of a large number of tasks is based on detecting key points and describing their neighborhoods with feature vectors, which are then processed further. Thus, there is a transition from an image to a space of key-point feature vectors.
An algorithm that finds key points is called a detector, and an algorithm that computes a description of the found points is called a descriptor. The term descriptor also refers to the feature vector of a key point itself.
Over the long history of computer vision, a significant number of algorithms have been developed to detect and describe key points; they differ in their degree of invariance to geometric transformations, changes in lighting, and viewing angles, as well as in computational cost. Implementations of most of these algorithms are available in popular software libraries. For instance, the open-source library OpenCV contains SURF, SIFT, ORB, BRISK, KAZE, AKAZE, LATCH, VGG, LUCID, DAISY, FREAK, and other descriptors.
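As an illustration of the transition from images to key-point feature vectors, the sketch below detects key points, computes their descriptors with ORB, and matches them between two images; any of the listed algorithms could be substituted (e.g., cv2.AKAZE_create() or cv2.BRISK_create()), and the file names are placeholders.

import cv2

pattern = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)     # placeholder file names
processed = cv2.imread("processed.png", cv2.IMREAD_GRAYSCALE)

# ORB acts as both detector and descriptor.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(pattern, None)
kp2, des2 = orb.detectAndCompute(processed, None)

# Match the binary descriptors by Hamming distance and keep the best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print("matched key points:", len(matches))

The matched pairs of key points are exactly the correspondences from which a geometric transformation between the two images can subsequently be estimated.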
Research of descriptor-based image normalization
1. Normalization of geometric transformations based on descriptors.
The normalization method based on descriptors uses the basic property of projective transformation