Hardware Considerations when Architecting a Face Recognition System

As the capabilities of automated face recognition algorithms continue to skyrocket, so does the number of face recognition (FR) applications being deployed. Whether it is using FR to unlock a phone, create an investigative lead to help identify a violent criminal, enable low-income persons to open a bank account online, or perform visitor management at a courthouse, more and more face recognition applications continue to be developed by dozens of different system integrators. However, depending on the application, different system architectures and software requirements will be needed. And, depending on the architecture, different algorithm requirements will emerge. This article will discuss these requirements across different face recognition applications so that readers can select the proper FR algorithm when building FR systems.


Face Recognition Applications

There are three primary use-cases for FR technology:

Identity Verification 1:1

Identity verification (1:1) is the process of validating a person against a claimed identity. For example, the person will claim to the system that they are “John Doe”. The system would take a photo of the person (the face “presentation”), generate an FR template from the photo, and compare it against the template on file for “John Doe”. If the presented identity matches the reference identity, then access is granted. This could mean a door opens, a bank account is accessed, or a phone is unlocked. It is important to note that face recognition can be one of multiple authentication factors used for identity verification (e.g., alongside passwords or tokens).
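The 1:1 comparison step can be sketched in a few lines, assuming templates are fixed-length float vectors and cosine similarity is the scoring function. The template values and threshold below are purely illustrative; a real system would generate templates from images with an FR SDK and tune the threshold to a target false-match rate.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two template vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(presented_template, enrolled_template, threshold=0.8):
    """Return True if the presented face matches the claimed identity."""
    return cosine_similarity(presented_template, enrolled_template) >= threshold

# Hypothetical enrolled template for "John Doe" and a new presentation
enrolled = [0.1, 0.9, 0.3]
presented = [0.12, 0.88, 0.31]
print(verify(presented, enrolled))  # similar vectors -> access granted
```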

USE CASES

Bank Account Access
Secure Facility Access
Phone Unlock
Tax Return Filing

Analyst Driven Search 1:N

Analyst driven search (1:N) is the process of manually searching a face image (a “probe”) against a database of pre-processed FR templates (a “gallery”). For example, in a criminal investigation, an image of a suspect may be obtained from a variety of sources, such as a still frame from a security camera, an online photo, or a picture captured by a witness. This probe photo of the criminal suspect would then be manually uploaded for search. In turn, a template would be created from the probe image, and it would then be compared against all the templates in the gallery database. After comparing the probe to the gallery, the most similar matching images in the gallery would be presented to the analyst for manual adjudication. This process is labor intensive in that the FR system is merely a filtering tool that reduces the size of the database; a significant amount of time and effort is still needed for the manual adjudication process.
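The probe-versus-gallery workflow above amounts to a linear scan that returns the top-ranked candidates for manual adjudication. The sketch below makes illustrative assumptions: templates are short float vectors, similarity is negative Euclidean distance, and the gallery contents are hypothetical; production systems use optimized comparison routines over much larger galleries.

```python
import math

def similarity(a, b):
    # Negative Euclidean distance: higher (closer to zero) means more similar.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(probe, gallery, k=3):
    """Compare a probe template against every gallery template and
    return the top-k (identity, score) candidates for analyst review."""
    scored = [(identity, similarity(probe, template))
              for identity, template in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical pre-processed gallery: identity -> template
gallery = {
    "subject_a": [0.1, 0.9],
    "subject_b": [0.8, 0.2],
    "subject_c": [0.5, 0.5],
}
print(search([0.79, 0.22], gallery, k=2))  # subject_b ranks first
```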

USE CASES

Identification of a Bank Robber from a surveillance video frame.
Identification of an Assault Suspect from their online dating profile.
Identification of a Hit & Run Suspect from a bystander’s cell phone camera.

Automated Search 1:N+1

Automated search (1:N+1) is typically performed in high-throughput applications such as traveler screening or video analytics. For example, a human trafficking investigation may require analyzing terabytes of images and video to identify the different persons present (both victims and perpetrators). This step involves generating templates for every input image processed. Or, in the case of live streaming video, templates are generated at a rate of roughly five (5) video frames per second (FPS). For video, after the templates have been generated they are often clustered into the different identities present. This clustering and tracking step involves cross-comparing all the templates. Without an efficient template comparison speed and clustering algorithm, this process can be very time consuming; because every pair of templates must be compared, the time grows quadratically with the number of templates being clustered. Each clustered identity, or each individual template if no clustering is performed, is then searched against available watch-list galleries. Any probe template that matches a gallery template beyond a predetermined similarity threshold will trigger an identity match alert.

In the case of passenger screening, either a single image is manually captured by an operator, or a live video stream of a passenger is captured and automatically distilled down to a single representative photograph. In the case of the single image, the face image is captured, analyzed for quality conformance (e.g., using an automated quality metric and/or validation of ICAO compliance), and templatized. In the case of live video, five (5) to ten (10) FPS need to be captured and templatized, followed by identity tracking and grouping, and finally cross-comparison of templates from the recent collection sequence, possibly applying spatio-temporal constraints.
The template for each passenger being screened can then be compared against multiple galleries, such as a passenger manifest or No Fly List. Any probe template that matches a gallery template beyond a predetermined similarity threshold will trigger an identity match alert. Conversely, in the case of the passenger manifest, if the presented passenger does not match any person in the manifest, an alert would also be triggered.
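The screening logic described above can be sketched as follows. The templates, gallery contents, threshold value, and alert labels are all illustrative assumptions; a real deployment would tune the threshold against a target false-match rate and use an FR SDK's comparison routine.

```python
import math

def similarity(a, b):
    # Negative Euclidean distance as a stand-in similarity score.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def screen_passenger(probe, watchlist, manifest, threshold=-0.1):
    """Raise an alert on a watch-list hit, or on a failure to match
    anyone in the passenger manifest."""
    alerts = []
    watchlist_hits = [gid for gid, t in watchlist.items()
                      if similarity(probe, t) >= threshold]
    if watchlist_hits:
        alerts.append(("WATCHLIST_MATCH", watchlist_hits))
    manifest_hits = [gid for gid, t in manifest.items()
                     if similarity(probe, t) >= threshold]
    if not manifest_hits:
        alerts.append(("NOT_ON_MANIFEST", []))
    return alerts

watchlist = {"wanted_1": [0.9, 0.1]}
manifest = {"pax_1": [0.2, 0.8]}
print(screen_passenger([0.21, 0.79], watchlist, manifest))  # [] -- clean manifest match
print(screen_passenger([0.89, 0.11], watchlist, manifest))  # watch-list hit, not on manifest
```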

USE CASES

Traveler Screening against a Passenger Manifest or No Fly List
Video Analytics for a Human Trafficking Investigation
Watch-list Alerting from Live Video Streams

Hardware Considerations

Algorithm Efficiency

Previous Rank One articles have provided significant insights into the various efficiency metrics that influence an FR algorithm’s deployability. For new readers we highly recommend reading those articles, particularly our initial article on the topic. To summarize these metrics:

  • Template generation speed is the time needed to initially process a face image or video frame.
  • Template size is the memory required to represent face features of a processed face image.
  • Comparison speed is the time needed to measure the similarity between two face templates.
  • Binary size is the amount of memory needed to load an algorithm’s model files and software libraries.

The performance of an FR algorithm across these metrics will dictate whether or not it can run on a given hardware system. And, across the FR industry, there is tremendous variation in these efficiency metrics from vendor to vendor. The following graphic demonstrates how different metrics can influence the amount of CPU throughput or memory needed for a hardware system:
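As a back-of-envelope illustration of how these metrics translate into hardware requirements: the memory to hold a gallery scales with template size times gallery size, and the latency of a linear 1:N search scales with per-comparison time times gallery size. All numbers below are illustrative, not vendor measurements.

```python
def gallery_memory_mb(template_bytes, num_templates):
    """Approximate RAM (in MB) needed to hold a gallery of templates."""
    return template_bytes * num_templates / 1e6

def search_latency_s(comparison_us, num_templates):
    """Approximate time (in seconds) for one linear probe-vs-gallery search."""
    return comparison_us * num_templates / 1e6

# e.g., 1 KB templates, a 10-million-identity gallery, 1 microsecond per comparison
print(gallery_memory_mb(1000, 10_000_000))  # 10000.0 MB (~10 GB of RAM)
print(search_latency_s(1.0, 10_000_000))    # 10.0 seconds per search
```

A vendor whose templates are 10x larger, or whose comparisons are 10x slower, multiplies these hardware requirements accordingly.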

Hardware Components

Different hardware and network resources may be available or desired for a given application. The common architectural components are:

  • Persistent server / desktop – low quantity, high cost, high processing power and memory. These systems will typically host FR libraries and/or system software. These systems will typically have server grade x64 processors and potentially GPU processors.
  • Embedded device – low-cost, high quantity devices with limited processing power and memory that can either host FR libraries on-edge or operate as a “thin-client” that passes imagery to a server or cloud system for processing. These systems typically have mobile grade ARM processors and potentially Neural Processing Units (NPU’s).
  • Scalable cloud – arrays of server resources abstracted through a cloud resource management system.
  • Network – communication channels between devices. Networks will have varying amounts of bandwidth depending on their properties.

Depending on the application and available hardware resources, different FR system architectures need to be deployed. And, depending on the architecture used, different FR algorithm efficiency requirements will emerge. This is because of the differences in processing and memory resources across these different hardware systems:

Note that this article does not specifically cover GPU acceleration, whether through a traditional NVIDIA CUDA-enabled GPU or an embedded Neural Processing Unit (NPU), but readers can assign such hardware components to the “processor” category. The main distinction is that GPU acceleration generally decreases the cost per unit of throughput for compute-bound applications.

Architecture Options

In the remainder of this article we will walk through the various architectures that are encountered when developing a face recognition system, along with the algorithm efficiency considerations for each.

Persistent Server / Desktop

Server and desktop systems are typically used in analyst driven applications, such as forensic analysis of digital media evidence; systems with predictable workloads, such as an identity document agency (e.g., a DMV); or high-value systems with infrequent use (e.g., a law enforcement search system). These systems will typically stay installed on the same computer for several years at a time.

Advantages

  • Hardware flexibility
  • Predictable cost
  • Predictable throughput
  • High throughput

Disadvantages

  • Hardware cost
  • Lack of redundancy
  • Lack of scalability
  • Lack of portability

Algorithm limitations when using a persistent server:

Identity Verification – 1:1

  • Slow template generation speed will reduce throughput/system response time
  • Large binary size will impact system restart speed
  • High hardware cost
  • Powerful network needed for decentralized sensors

Analyst Driven Search – 1:N

  • Large template size will require significant memory resources
  • Slow template generation speed will delay search results
  • Slow comparison speed will delay search results

Automated Search – 1:N+1

  • Slow template generation speed will reduce throughput (e.g., video processing)
  • Large template and binary sizes will require significant memory resources

Embedded Devices

Embedded devices such as a phone or consumer electronic device are low cost and highly capable when running properly designed software. There are fundamental limits on what can be achieved by an embedded processor (e.g., ARM) and thus template generation speed and template size can play a major role in FR system requirements.

Advantages

  • Low hardware cost
  • Portability

Disadvantages

  • Limited hardware capacity
  • Limited power resources
  • Requires highly efficient algorithms

Algorithm limitations per FR application when using embedded devices

Identity Verification – 1:1

  • Slow template generation speed will cause major latency (> 3 seconds)
  • Large binary size will occupy a high percentage of available memory

Analyst Driven Search – 1:N

  • Template size must be very small due to memory limits
  • Slow template generation speed will significantly delay search results
  • Slow comparison speed will significantly delay search results
  • Large binary size will occupy a high percentage of available memory

Automated Search – 1:N+1

  • Slow template generation speed will render video processing impossible
  • Template size must be very small due to memory limits
  • Large binary sizes will exhaust memory resources

Scalable Cloud

A scalable cloud architecture, such as Kubernetes, running on a scalable cloud hardware provider, can be highly valuable for application workflows that have varied and unpredictable throughputs.

Advantages

  • Highly scalable
  • Pay per usage
  • Redundancy
  • Fault tolerance

Disadvantages

  • Latency to instantiate new nodes
  • Memory limitations
  • Higher cost to initially implement

Algorithm limitations per FR application when using the cloud

Identity Verification – 1:1

  • Large binary size will slow container instantiation time
  • Poor network bandwidth will delay image transmission
  • Slow template generation speed will reduce throughput / system response time

Analyst Driven Search – 1:N

  • Not typically advised: gallery sizes are usually too large to instantiate containers in less than 30 seconds
  • Large template size or a large number of templates will make container instantiation very slow
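A quick calculation shows why gallery size dominates container instantiation time in this scenario: a new node cannot serve searches until the entire gallery has been loaded into memory. The gallery size and read bandwidth below are illustrative assumptions.

```python
def gallery_load_time_s(gallery_bytes, read_bytes_per_s):
    """Lower bound on container readiness: time to load the gallery into memory."""
    return gallery_bytes / read_bytes_per_s

# e.g., a 10 GB gallery pulled from storage at 200 MB/s
print(gallery_load_time_s(10e9, 200e6))  # 50.0 seconds -- well over a 30 s target
```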

Automated Search – 1:N+1

  • Slow template generation speed makes video processing expensive
  • Large template size, large number of templates, and/or large binary size will make container instantiation very slow
  • Poor network bandwidth will prevent video transmission to the cloud

Summary

There is a wide range of considerations when building and deploying a face recognition system. This article walked through the considerations related to the hardware used to deploy such a system, and the algorithm properties needed to run effectively on that hardware. This understanding is critical because, while the majority of marketing focus on face recognition algorithms is on accuracy, the top 100 performers in NIST FRVT are often separated by less than 1% in accuracy. By contrast, the efficiency of an algorithm can vary by 5x to 10x and can be make-or-break when it comes to the successful deployment of a face recognition system. The Rank One algorithm is the only Western-friendly vendor to consistently achieve top performance marks in both FR algorithm accuracy and efficiency. As such, regardless of the FR application or the available hardware resources, the ROC SDK is an ideal backbone for any FR system configuration. Contact our team today to begin your trial of our industry leading face recognition algorithms and software libraries!