Lab Manager | Run Your Lab Like a Business

How Well Do Facial Recognition Algorithms Cope with a Million Strangers?

All of the algorithms suffered in accuracy when confronted with more distractions, but some fared much better than others

by University of Washington

The MegaFace dataset contains 1 million images representing more than 690,000 unique people. It is the first benchmark that tests facial recognition algorithms at a million scale. Collage credit: University of Washington

In the last few years, several groups have announced that their facial recognition systems have achieved near-perfect accuracy rates, performing better than humans at picking the same face out of the crowd.

But those tests were performed on a dataset with only 13,000 images—fewer people than attend an average professional U.S. soccer game. What happens to their performance as those crowds grow to the size of a major U.S. city?

University of Washington researchers answered that question with the MegaFace Challenge, the world’s first competition aimed at evaluating and improving the performance of face recognition algorithms at the million-person scale. All of the algorithms suffered in accuracy when confronted with more distractions, but some fared much better than others.

“We need to test facial recognition on a planetary scale to enable practical applications—testing on a larger scale lets you discover the flaws and successes of recognition algorithms,” said Ira Kemelmacher-Shlizerman, a UW assistant professor of computer science and the project’s principal investigator. “We can’t just test it on a very small scale and say it works perfectly.”

The UW team first developed a dataset with one million Flickr images from around the world that are publicly available under a Creative Commons license, representing 690,572 unique individuals. Then they challenged facial recognition teams to download the database and see how their algorithms performed when they had to distinguish between a million possible matches.

Google’s FaceNet showed the strongest performance on one test, dropping from near-perfect accuracy when confronted with a smaller number of images to 75 percent on the million-person test. A team from Russia’s N-TechLab came out on top on another test set, dropping to 73 percent.

Facial recognition algorithms that fared well with 10,000 distracting images all experienced a drop in accuracy when confronted with 1 million images. But some performed much better than others. Image credit: University of Washington

By contrast, other algorithms that had performed well at a small scale fell much further, to as low as 33 percent accuracy, when confronted with the harder task.

Initial results are detailed in a paper to be presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016) on June 30, and ongoing results are updated on the project website. More than 300 research groups are working with MegaFace.

The MegaFace challenge tested the algorithms on verification, or how well they could correctly identify whether two photos were of the same person. That’s how an iPhone security feature, for instance, could recognize your face and decide whether to unlock your phone instead of asking you to type in a password.
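As a rough illustration of what verification involves, the sketch below compares two photos that have already been reduced to lists of numbers (feature vectors) and answers "same person or not" with a similarity threshold. This is a minimal sketch under stated assumptions, not the method used by any MegaFace entrant: the three-number vectors, the `same_person` helper, and the 0.8 threshold are all hypothetical and chosen only for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(features_a, features_b, threshold=0.8):
    """Verification: decide whether two photos show the same person.

    The threshold is a made-up value for this example; a real system
    would tune it on labeled pairs of photos.
    """
    return cosine_similarity(features_a, features_b) >= threshold


# Hypothetical feature vectors standing in for real face descriptors.
enrolled = [0.9, 0.1, 0.3]    # photo stored when the phone was set up
new_photo = [0.8, 0.2, 0.35]  # the owner, photographed again
stranger = [0.1, 0.9, 0.2]    # someone else

print(same_person(enrolled, new_photo))
print(same_person(enrolled, stranger))
```

Raising the threshold makes such a check stricter, so it rejects more strangers but also rejects the owner more often; lowering it does the opposite.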

“What happens if you lose your phone in a train station in Amsterdam and someone tries to steal it?” said Kemelmacher-Shlizerman, who co-leads the UW Graphics and Imaging Laboratory (GRAIL). “I’d want certainty that my phone can correctly identify me out of a million people, or 7 billion, not just 10,000 or so.”

The MegaFace challenge highlights problems in facial recognition that have yet to be fully solved, such as identifying the same person at different ages and recognizing someone in different poses. Images courtesy of the University of Washington

The researchers also tested the algorithms on identification, or how accurately they could match a photo of a single individual to a different photo of the same person buried among a million “distractors.” That’s what happens, for instance, when law enforcement has a single photograph of a criminal suspect and is combing through images taken on a subway platform or at an airport to see if the person is trying to escape.
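The identification task can be sketched as a toy experiment: put one true photo of a person into a gallery alongside a growing pile of distractors, then check whether the most similar gallery entry to a new photo of that person is still the true match. This is only an illustrative sketch. The vectors are random numbers standing in for face features, every name and parameter (`rank1_match`, `rank1_accuracy`, `NOISE`, the pool sizes) is hypothetical, and the printed figures say nothing about the actual MegaFace results.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_match(probe, gallery):
    """Identification: return the label of the gallery entry most similar to the probe."""
    best_label, _ = max(gallery, key=lambda item: cosine_similarity(probe, item[1]))
    return best_label


rng = random.Random(0)  # fixed seed so repeated runs behave the same
DIM = 8                 # length of each made-up feature vector
NOISE = 0.6             # how much two photos of one person differ (assumed)


def random_vector():
    return [rng.gauss(0, 1) for _ in range(DIM)]


def noisy_copy(vec):
    """A second 'photo' of the same person: the same vector plus noise."""
    return [x + rng.gauss(0, NOISE) for x in vec]


people = [random_vector() for _ in range(50)]
gallery_photos = [noisy_copy(p) for p in people]  # one known photo per person
probes = [noisy_copy(p) for p in people]          # a different photo of each person
distractor_pool = [("distractor", random_vector()) for _ in range(1000)]


def rank1_accuracy(num_distractors):
    """Fraction of probes whose top-ranked gallery entry is the true match."""
    distractors = distractor_pool[:num_distractors]
    hits = 0
    for person_id, probe in enumerate(probes):
        gallery = [(person_id, gallery_photos[person_id])] + distractors
        if rank1_match(probe, gallery) == person_id:
            hits += 1
    return hits / len(probes)


for n in (10, 100, 1000):
    print(n, "distractors -> rank-1 accuracy", rank1_accuracy(n))
```

Because each larger gallery here contains every distractor from the smaller ones, accuracy in this toy setup can only stay level or fall as distractors are added, which is the effect the challenge measures at a much larger scale.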

“You can see where the hard problems are: recognizing people across different ages is an unsolved problem. So is identifying people from their doppelgängers and matching people who are in varying poses like side views to frontal views,” said Kemelmacher-Shlizerman. The paper also analyzes age and pose invariance in face recognition when evaluated at scale.

In general, algorithms that “learned” how to find correct matches out of larger image datasets outperformed those that only had access to smaller training datasets. But the SIAT MMLab algorithm developed by a research team from China, which learned on a smaller number of images, bucked that trend by outperforming many others.

The MegaFace challenge is ongoing and still accepting results.

The team’s next steps include assembling half a million identities, each with a number of photographs, for a dataset that will be used to train facial recognition algorithms. This will help level the playing field and test which algorithms outperform others given the same amount of large-scale training data, as most researchers don’t have access to image collections as large as Google’s or Facebook’s. The training set will be released towards the end of the summer.

“State-of-the-art deep neural network algorithms have millions of parameters to learn and require a plethora of examples to accurately tune them,” said Aaron Nech, a UW computer science and engineering master’s student working on the training dataset. “Unlike people, these models are initially a blank slate. Having diversity in the data, such as the intricate identity cues found across more than 500,000 unique individuals, can increase algorithm performance by providing examples of situations not yet seen.”

The research was funded by the National Science Foundation, Intel, Samsung, Google, and the University of Washington Animation Research Labs.

Co-authors include UW computer science and engineering professor Steve Seitz, undergraduate student and web developer Evan Brossard and former student Daniel Miller.