Figure 15: Dependence of the metrics median tpr, median ppv, median f1, mean f1, tpr IQR, and f1 IQR on the threshold value
To evaluate the quality of the developed system independently of the threshold value, we calculated the AUC-ROC – the area under the ROC curve (Receiver Operating Characteristic curve); its value was 0.999997. To sum up, the developed system, which uses the RetinaFace-MobileNet0.25 detection model at the first stage, meets the requirements for the enterprise security system specified in Section 2.
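As an illustration, a minimal sketch of how the threshold-dependent metrics from Figure 15 and the threshold-independent AUC-ROC can be computed is given below. It assumes a simplified binary setting; the `labels` and `scores` arrays are random stand-ins for the system's actual ground truth and decision scores, not its real outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_fscore_support

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)                 # 1 = correct identity, 0 = impostor
scores = labels * 0.9 + rng.normal(0.05, 0.1, 1000)    # stand-in decision scores

print(f"AUC-ROC = {roc_auc_score(labels, scores):.6f}")  # threshold-independent quality

# Threshold-dependent metrics, as in Figure 15 (here in a simplified binary form)
for thr in np.linspace(0.1, 0.9, 5):
    pred = (scores >= thr).astype(int)
    ppv, tpr, f1, _ = precision_recall_fscore_support(
        labels, pred, average="binary", zero_division=0)
    print(f"thr={thr:.2f}  tpr={tpr:.3f}  ppv={ppv:.3f}  f1={f1:.3f}")
```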
Discussion
In this paper, we use the AP values for the MTCNN, FaceBoxes, DSFD, RetinaFace-ResNet125, RetinaFace-MobileNet0.25, CenterFace, and SCRFD-500MF models reported in the works [11–16,18], measured on the WIDER FACE (Hard) validation dataset [19]. By this accuracy, the methods rank as follows: RetinaFace-ResNet125 > DSFD > CenterFace > RetinaFace-MobileNet0.25 > SCRFD-500MF > MTCNN > FaceBoxes.
At the beginning of the research, it was assumed that in the experiments on the robustness of the methods to face rotation and resizing, the methods would rank in a similar order. However, the rankings obtained from the experiments differed significantly. This difference can be explained by the use of synthetic datasets in the experiments.
The synthetic datasets were needed to control the face rotation angle and image size parameters. However, in further work, it is desirable to evaluate the accuracy (AP) of the fastest detection models, namely RetinaFace-MobileNet0.25, CenterFace, and SCRFD-500MF, on the real data that the security system will operate with.
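For reference, a minimal sketch of such an AP evaluation is given below: detections are matched to ground-truth boxes by IoU, and AP is taken as the area under the resulting precision-recall curve. The function names, the IoU threshold of 0.5, and the matching scheme are illustrative assumptions rather than the exact WIDER FACE evaluation protocol.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """detections: list of (image_id, box, score); gt_boxes: dict image_id -> list of boxes."""
    detections = sorted(detections, key=lambda d: -d[2])        # highest confidence first
    matched = {img: [False] * len(boxes) for img, boxes in gt_boxes.items()}
    n_gt = sum(len(boxes) for boxes in gt_boxes.values())
    tp = np.zeros(len(detections))
    fp = np.zeros(len(detections))
    for i, (img, box, _) in enumerate(detections):
        ious = [iou(box, g) for g in gt_boxes.get(img, [])]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thr and not matched[img][j]:
            tp[i] = 1                    # first detection matching this ground-truth face
            matched[img][j] = True
        else:
            fp[i] = 1                    # duplicate match or false detection
    recall = np.cumsum(tp) / max(n_gt, 1)
    precision = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1e-9)
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):  # area under the precision-recall curve
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```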
Conclusions
The scientific novelty of this work is the further development of face recognition methods, where the first stage is object detection.
The practical significance of the obtained results lies in a thorough study of the MTCNN, FaceBoxes, DSFD, RetinaFace-ResNet125, RetinaFace-MobileNet0.25, CenterFace, and SCRFD-500MF detection models, with the best one to be used in the enterprise security system based on face recognition from the video stream of surveillance cameras. Particular attention was paid to the trade-off between the speed and accuracy of the studied methods.
Experiments were conducted on pre-prepared datasets to evaluate the robustness of the models to face rotation in different planes and to face resizing, as well as their time costs. As a result of the research, the following can be noted:
- The models most robust to rotation were RetinaFace-ResNet125, DSFD, and RetinaFace-MobileNet0.25, which confidently (confidence 0.9) detect faces rotated within the range [-45°; 45°], which is sufficient for use in a security system. The MTCNN, FaceBoxes, CenterFace, and SCRFD-500MF models also handle rotated faces but with lower confidence and over a smaller range of angles.
- In the experiments with different face sizes, the best results were shown by the MTCNN, DSFD, RetinaFace-ResNet125, FaceBoxes, and RetinaFace-MobileNet0.25 models, which detect faces starting at 75×75 px with a confidence of 0.9. The CenterFace and SCRFD-500MF models have significantly lower confidence but detect very small faces in some cases.
- Detection time measurements showed that the fastest models are RetinaFace-MobileNet0.25 and FaceBoxes, which spend less than 47 ms to process one frame for VGA images and less than 100 ms for HD images. The SCRFD-500MF and CenterFace models come next. For the MTCNN and RetinaFace-ResNet125 models, detection time exceeds 100 ms even for VGA images, and the DSFD model cannot be applied in real time, even for VGA images (a timing sketch follows this list).
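A minimal sketch of this kind of timing measurement is given below; the `detector.detect` call is a hypothetical wrapper around any of the studied models, and the warm-up/averaging scheme is an assumption rather than the exact procedure used in the experiments.

```python
import time
import numpy as np

def mean_latency_ms(detector, frame, n_runs=50, n_warmup=5):
    """Average single-frame detection time in milliseconds."""
    for _ in range(n_warmup):                 # warm-up runs exclude model initialisation
        detector.detect(frame)
    start = time.perf_counter()
    for _ in range(n_runs):
        detector.detect(frame)
    return (time.perf_counter() - start) / n_runs * 1000.0

vga_frame = np.zeros((480, 640, 3), dtype=np.uint8)    # VGA resolution
hd_frame = np.zeros((720, 1280, 3), dtype=np.uint8)    # HD resolution
# print(mean_latency_ms(detector, vga_frame), mean_latency_ms(detector, hd_frame))
```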
To finally select the best model for further use in the security system, the quantitative values of their properties were converted to a 7-point scale. The following properties were considered: the AP accuracy claimed in the primary publications [11–16,18], the presence of landmarks, the maximum range of rotation angles, the minimum face size detected with a confidence above 0.9, and the average processing time of one VGA frame (Table 4). After analyzing the ratings, the RetinaFace-MobileNet0.25 model was chosen as the best for use in the security system, as it is one of the fastest, provides landmarks, and is robust to rotation and changes in face size.
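The conversion itself can be illustrated with a short sketch: for each property, the seven models are ranked and receive from 7 points (best) down to 1 point (worst), and the per-property points are summed. The property values in the sketch are placeholders, not the measured numbers behind Table 4.

```python
MODELS = ["MTCNN", "FaceBoxes", "DSFD", "RetinaFace-ResNet125",
          "RetinaFace-MobileNet0.25", "CenterFace", "SCRFD-500MF"]

def to_seven_point(values, higher_is_better=True):
    """Convert raw property values to points on a 7-point scale (7 = best)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    points = [0] * len(values)
    for rank, idx in enumerate(order):
        points[idx] = len(values) - rank
    return points

# Placeholder values (NOT the measured numbers from Table 4), one per model above
ap_hard = [0.60, 0.57, 0.90, 0.91, 0.78, 0.87, 0.77]   # claimed AP, higher is better
vga_time_ms = [120, 45, 600, 150, 40, 60, 55]          # mean time per VGA frame, lower is better

totals = [a + t for a, t in zip(to_seven_point(ap_hard),
                                to_seven_point(vga_time_ms, higher_is_better=False))]
# Remaining properties (landmarks, rotation range, minimum face size) are scored
# and summed in the same way before choosing the model with the highest total.
best = MODELS[max(range(len(MODELS)), key=lambda i: totals[i])]
```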
The RetinaFace-MobileNet0.25 model was used as the first-stage detector in the enterprise security system based on face recognition from surveillance cameras. It was followed by the stages of face normalization, embedding extraction, and classification: ResNet was used to obtain embeddings, and SVM was used for classification (recognition). The system reached an AUC-ROC of 0.999997, which confirms the feasibility of using RetinaFace-MobileNet0.25 to solve the face detection problem in the system.
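A minimal sketch of this recognition stage is shown below; the `embed` helper stands in for the ResNet-based embedding model, and the SVM kernel, the 512-dimensional embeddings, and the probability threshold are illustrative assumptions rather than the system's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in training data: 512-D embeddings of 20 enrolled employees, 10 faces each
X_train = np.random.rand(200, 512)
y_train = np.repeat(np.arange(20), 10)

clf = SVC(kernel="linear", probability=True)   # probability=True enables predict_proba
clf.fit(X_train, y_train)

def recognise(face_img, threshold=0.9):
    """Return the predicted identity, or None if the best class probability is too low."""
    emb = embed(face_img)                      # hypothetical ResNet-based embedding helper
    probs = clf.predict_proba([emb])[0]
    best = int(np.argmax(probs))
    return clf.classes_[best] if probs[best] >= threshold else None
```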
In the future, it is advisable to investigate the other stages more thoroughly, especially the maximum number of classes for which SVM can be used as a classifier, the use of other classification methods, and the specifics of retraining the security system.
At SYTOSS, one of our specializations is developing advanced software solutions for face detection in video surveillance systems. We implement cutting-edge technology in our software systems designed for our clients to ensure precise results and enhance efficiency. Trust us to deliver robust, scalable solutions tailored to your specific processes and modernize your operation. Contact us today to discuss your particular conditions and unlock the potential of facial recognition technology!
Citation
This paper's results were presented at the Computational Linguistics and Intelligent Systems 2023 conference and were published in the CEUR Workshop Proceedings collection (CEUR-WS.org) "Computational Linguistics and Intelligent Systems 2023 (Volume III)". Please refer to the published version [28] when citing this work.
References
[1] O. Yakovleva, K. Nikolaieva, Research of Descriptor Based Image Normalization and Comparative Analysis of Surf, SIFT, Brisk, Orb, Kaze, Akaze Descriptors, Advanced Information Systems 4 (4) (2020) 89–101. doi:10.20998/2522-9052.2020.4.13.
[2] A. Kovtunenko, O. Yakovleva, Doslidzhennia sumisnoho vykorystannia matematychnoi morfolohii ta zhortkovikh neironnykh merezh dla virishennia zadachi rozpiznavannia tsikavnykiv, Visnyk Natsionalnoho tekhnichnoho universytetu “KhPI”. Seriia: Systemnyi analis, upravlinnia ta informatsiini tekhnolohii, 1 (3) (2020) 24–31. doi:10.20998/2079-0023.2020.01.05.
[3] V. Gorokhovatskyi, I. Tvoroshenko, Image Classification Based on the Kohonen Network and the Data Space Modification, in: CEUR Workshop Proceedings: Computer Modeling and Intelligent Systems (CMIS-2020), 2020, pp. 1013–1026. doi:10.32782/cmis/2608-76.
[4] Y. Daradkeh, V. Gorokhovatskyi, I. Tvoroshenko, S. Gadetska, M. Al-Dhaifallah, Methods of classification of images on the basis of the values of statistical distributions for the composition of structural description components, IEEE Access 9 (2021) 92964–92973. doi:10.1109/ACCESS.2021.3093457.
[5] P. Viola, M. Jones, Rapid Object Detection Using a Boosted Cascade of Simple Features, in: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001. doi:10.1109/cvpr.2001.990517.
[6] N. Dalal, B. Triggs, Histograms of Oriented Gradients for Human Detection, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 2005. doi:10.1109/cvpr.2005.177.
[7] H. Li, Z. Lin, X. Shen, J. Brandt, G. Hua, A convolutional neural network Cascade for face detection, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5325–5334. doi:10.1109/cvpr.2015.7299170.
[8] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587. doi:10.1109/cvpr.2014.81.
[9] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6) (2017) 1137–1149. doi:10.1109/tpami.2016.2577031.
[10] C. Liu, H. Wechsler, Gabor feature based classification using the Enhanced Fisher linear discriminant model for face recognition, IEEE Transactions on Image Processing (2002) 467–476. doi:10.1109/tip.2002.999679.
[11] K. Zhang, Z. Zhang, Z. Li, Y. Qiao, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Processing Letters 23(10) (2016) 1499–1503. doi:10.1109/lsp.2016.2603342.
[12] S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, S. Z. Li, FaceBoxes: A CPU real-time face detector with high accuracy, in: 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017, pp. 1–9. doi:10.1109/btas.2017.8272675.
[13] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6) (2016) 1137–1149. doi:10.1109/tpami.2016.2577031.
[14] J. Deng, J. Guo, E. Ververas, I. Kotsia, S. Zafeiriou, Retinaface: Single-shot multi-level face localisation in the wild, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5203–5212. doi:10.1109/cvpr42600.2020.00525.
[15] Y. Xu, W. Yan, G. Yang, J. Luo, T. Li, J. He, Centerface: Joint face detection and alignment using face as point, Scientific Programming (2020) 1–8. doi:10.1155/2020/7845384.
[16] J. Guo, J. Deng, A. Lattas, S. Zafeiriou, Sample and computation redistribution for efficient face detection, 2021. URL: https://arxiv.org/abs/2105.04714.
[17] E. Zhang, Y. Zhang, Average precision, Encyclopedia of Database Systems (2009) 192–193. doi:10.1007/978-0-387-39940-9_482.
[18] InsightFace Model Zoo, 2021. URL: https://github.com/deepinsight/insightface/tree/master/model_zoo.
[19] S. Yang, P. Luo, C. C. Loy, X. Tang, Wider face: A face detection benchmark, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5525–5533. doi:10.1109/cvpr.2016.596.
[20] Pretrained Pytorch Face Detection (MTCNN) and facial recognition (InceptionResnet) models, 2019. URL: https://github.com/timesler/facenet-pytorch.
[21] A pytorch implementation of faceboxes, 2017. URL: https://github.com/zisianw/FaceBoxes.PyTorch.
[22] A high-performance pytorch implementation of face detection models, including RetinaFace and DSFD, 2019. URL: https://github.com/hukkelas/DSFD-Pytorch-Inference.
[23] Star-Clouds/Centerface: Face detection, 2019. URL: https://github.com/Star-Clouds/CenterFace.
[24] InsightFace Python Library, 2022. URL: https://github.com/deepinsight/insightface/tree/master/python-package.
[25] Unique, worry-free model photos. Generated Photos, 2022. URL: https://generated.photos.
[26] Home of the blender project – free and open 3D creation software, 2022. URL: https://www.blender.org.
[27] A. González-Ramírez, J. Lopez, D. Torres, I. Yañez-Vargas, Analysis of multi-class classification performance metrics for remote sensing imagery imbalanced datasets, Journal of Quantitative and Statistical Analysis (2021) 11–17. doi:10.35429/jqsa.2021.22.8.11.17.
[28] O. Yakovleva, A. Kovtunenko, V. Liubchenko, V. Honcharenko, O. Kobylin, Face Detection for Video Surveillance-based Security System, in: CEUR Workshop Proceedings: Computational Linguistics and Intelligent Systems (COLINS-2023), volume 3403, 2023, pp. 69–86.
Authors