Identifying Faces in Video Images is Major Challenge, Report Shows

In movies and television, computers can quickly identify a person in a crowded arena from tiny, grainy video images. But that is often not the reality when it comes to identifying bank robbery perpetrators from security camera video, detecting terrorism suspects in a crowded railway station, or finding desired individuals when searching video archives.

To advance video facial identification for these and other applications, the National Institute of Standards and Technology (NIST) conducted a large public test known as the Face in Video Evaluation (FIVE). The FIVE project has now released an interagency report detailing its results and aiming to provide guidance to developers of the technology.

The report shows that video facial recognition is a difficult challenge. Getting the best, most accurate results for each intended application requires good algorithms, a dedicated design effort, a multidisciplinary team of experts, limited-size image databases, and field tests to properly calibrate and optimize the technology.

FIVE ran 36 prototype algorithms from 16 commercial suppliers on 109 hours of video imagery taken at a variety of settings. The video images included hard-to-match pictures of people looking at smartphones, wearing hats, or just looking away from the camera. Lighting was sometimes a problem, and some faces never appeared on the video because they were blocked, for example, by a tall person in front of them.

NIST used the algorithms to match faces from the video to databases populated with photographs of up to 48,000 individuals. People in the videos were not required to look in the direction of the camera. Without this requirement, the technology must compensate for large changes in the appearance of a face and is often less successful. The report notes that even for the more accurate algorithms, subjects may be identified anywhere from around 60 percent of the time to more than 99 percent, depending on video or image quality and the algorithm’s ability to deal with the given scenario.

“Our research revealed that the video images’ quality and other properties can highly influence the accuracy of facial identification,” said lead author Patrick Grother, who heads several of NIST’s biometrics standards and evaluation activities. In video, many faces are small, or unevenly lit, or not forward-facing—three critical points for accurately identifying individuals because the algorithms are not very effective at compensating for these factors.

In traditional face-matching evaluations that NIST has performed since the 1990s, algorithms compare a photograph of a person’s face against a database, or gallery, of millions of portrait photographs. Today’s match rates for portrait photographs can exceed 99 percent in some applications. But in the new study, NIST limited galleries to just 48,000 individuals because the lower face quality in video undermines recognition accuracy.

NIST also measured “false positive” outcomes in which an algorithm incorrectly matches a face from the video with an image in the gallery. The report notes that deployers of face identification technologies must consider this problem, particularly in crowded settings in which the vast majority of individuals in the video may be absent from the gallery.
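
A back-of-the-envelope calculation illustrates why crowded settings make false positives so important. The sketch below is not taken from the FIVE report: the function names, the crowd size, the number of enrolled people present, and both error rates are hypothetical values chosen only for illustration.

```python
def expected_alerts(faces_seen, enrolled_present, hit_rate, false_positive_rate):
    """Return (expected correct alerts, expected false alerts) for one video feed.

    All inputs are hypothetical illustration values, not FIVE results.
    """
    # Everyone in the video who is not in the gallery can only produce a false alert.
    not_enrolled = faces_seen - enrolled_present
    true_alerts = enrolled_present * hit_rate
    false_alerts = not_enrolled * false_positive_rate
    return true_alerts, false_alerts


def alert_precision(true_alerts, false_alerts):
    """Fraction of all alerts that point at a person who really is in the gallery."""
    total = true_alerts + false_alerts
    return true_alerts / total if total else 0.0


if __name__ == "__main__":
    # Assumed scenario: 10,000 faces pass the camera, 5 of them are enrolled,
    # 90 percent of enrolled faces are found, 1 percent of strangers are mismatched.
    hits, misfires = expected_alerts(10_000, 5, 0.9, 0.01)
    print(f"correct alerts: {hits:.2f}, false alerts: {misfires:.2f}")
    print(f"share of alerts that are correct: {alert_precision(hits, misfires):.1%}")
```

Under these assumed numbers, the system would raise roughly 100 false alerts against fewer than 5 correct ones, even though only 1 percent of strangers are mismatched. The point of the sketch is the ratio, not the specific values.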

The report states that accuracy in these video-based applications may approach that of still-photo face recognition, but only if image collection can be improved. To this end, the report provides guidance to a wide group of individuals involved with the technology, from algorithm developers to system designers. In addition, the report can inform policymakers’ decisions regarding the use of these systems.

Algorithm designs can be improved by requiring high levels of accuracy to avoid false matches, according to the guidance. Limiting the gallery size and using only high-quality images are other suggestions. For example, when using video algorithms for access control to a secure building or transportation, Grother recommends keeping only the necessary individuals in the gallery. Using only good still photos for matching is another key point.
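
One way to see why a smaller gallery helps is a simplified model in which every comparison between a stranger's face and a gallery entry has the same small, independent chance of a false match. This model and the function name `false_positive_chance` are assumptions made for illustration; the report does not state this formula, and real algorithms need not behave this way. The rate of one in a million is likewise hypothetical.

```python
def false_positive_chance(gallery_size, false_match_rate):
    """Chance that a face absent from the gallery matches at least one entry.

    Simplifying assumption: comparisons are independent and share one
    per-comparison false match rate.
    """
    return 1.0 - (1.0 - false_match_rate) ** gallery_size


if __name__ == "__main__":
    # Hypothetical per-comparison false match rate of one in a million.
    for size in (500, 48_000):
        chance = false_positive_chance(size, 1e-6)
        print(f"gallery of {size:>6}: {chance:.4%} chance of a false match")
```

When the per-comparison rate is small, the chance of a false match grows almost in proportion to the number of people enrolled, which is consistent with the advice to keep only the necessary individuals in the gallery.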

The report also endorses using a multidisciplinary team of experts to design systems that capture high-quality video images. Experts in videography can determine optimal lighting and optics, camera positioning, and mounting.

The NIST document provides guidance for researchers to consider when assessing the deployment of video face identification systems. Accuracy, as important as it is, is not the only factor to analyze when considering the deployment of video face recognition, according to Grother. Other concerns include the costs of computer processing time and having trained facial recognition experts on hand to ensure that the matches are accurate. Implementers also need to study network infrastructure and scalability, that is, the ability of a system's software to work easily on small datasets as well as large ones.

“Whether video is appropriate for a particular facial identification application requires quantitative analysis and design—and the FIVE report aims to inform those processes,” Grother said.