C++ Face Recognition demo #3685

Open
wants to merge 10 commits into
base: master
12 changes: 12 additions & 0 deletions demos/face_recognition_demo/cpp/CMakeLists.txt
@@ -0,0 +1,12 @@
# Copyright (C) 2023 KNS Group LLC (YADRO)
# SPDX-License-Identifier: Apache-2.0
#

file(GLOB_RECURSE SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
file(GLOB_RECURSE HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
Collaborator

Python version draws landmarks. Is it intentional not to draw them here?

Author

No, just thought it wasn't needed. Add rendering them?

Collaborator

Yes, if you don't mind :)

Author

added


add_demo(NAME face_recognition_demo
    SOURCES ${SOURCES}
    HEADERS ${HEADERS}
    INCLUDE_DIRECTORIES "${CMAKE_CURRENT_SOURCE_DIR}/include"
    DEPENDENCIES monitors utils)
190 changes: 190 additions & 0 deletions demos/face_recognition_demo/cpp/README.md
@@ -0,0 +1,190 @@
# Face Recognition C++ Demo

![](./face_recognition_demo.gif)

This example demonstrates an approach to creating interactive applications
for video processing. It shows the basic architecture for building model
pipelines that support placing models on different devices and executing them
in parallel or sequentially using the OpenVINO library.
In particular, this demo uses 4 models to build a pipeline that detects
faces in video, predicts their keypoints (also known as "landmarks"),
recognizes persons using the provided face database (the gallery), and estimates
the probability of whether a real or a spoofed face is shown in the video or image.
The following pretrained models can be used:

* `face-detection-retail-0004` or `face-detection-adas-0001`, to detect faces and predict their bounding boxes;
* `landmarks-regression-retail-0009`, to predict face keypoints;
* `face-reidentification-retail-0095`, `Sphereface`, `facenet-20180408-102900` or `face-recognition-resnet100-arcface-onnx`, to recognize persons;
* `anti-spoof-mn3`, which runs on top of the detection results and reports the estimated probability of whether a real or a spoofed face is shown.

## How it works

The application is invoked from the command line. It reads the specified input
video stream frame by frame, be it a camera device or a video file,
and analyzes each frame independently. To make predictions,
the application deploys up to 4 models on the specified devices using the OpenVINO
library and runs them asynchronously.
There are 3 user modes for this demo application:
1. Face detection only. In this mode, an input frame is processed by the face detection model to predict face bounding boxes.
To run face detection only, pass just the `-mfd` flag. Example:
```
./face_recognition_demo \
-i <path_to_video>/input_video.mp4 \
-mfd <path_to_model>/face-detection-retail-0004.xml
```
2. Face recognition mode. In this mode, after face detection, face keypoints
are predicted by the landmarks model. As the final step, the face recognition model uses the keypoints to align faces
and matches the found faces against the faces in the face gallery, which must be provided by the user.
So, in this mode the user should provide 3 flags: `-mfd`, `-mlm`, `-mreid`. Example:
```
./face_recognition_demo \
-i <path_to_video>/input_video.mp4 \
-mfd <path_to_model>/face-detection-retail-0004.xml \
-mlm <path_to_model>/landmarks-regression-retail-0009.xml \
-mreid <path_to_model>/face-reidentification-retail-0095.xml \
-fg "/home/face_gallery"
```
3. Anti-spoof mode. In this mode all 4 models are used: the demo applies the anti-spoof model to all recognized faces
to estimate the probability of whether each face in the video is real or spoofed.
So, in this mode the user should provide 4 flags: `-mfd`, `-mlm`, `-mreid`, `-mas`. Example:
```
./face_recognition_demo \
-i <path_to_video>/input_video.mp4 \
-mfd <path_to_model>/face-detection-retail-0004.xml \
-mlm <path_to_model>/landmarks-regression-retail-0009.xml \
-mreid <path_to_model>/face-reidentification-retail-0095.xml \
-mas <path_to_model>/anti-spoof-mn3.xml \
-fg "/home/face_gallery"
```

After all computations the processing results are
visualized and displayed on the screen or written to the output file.

> **NOTE**: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with the `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases).
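
If you choose to rearrange the channels in application code rather than reconverting the model, the swap itself is small. The following is a minimal sketch on a raw interleaved 8-bit buffer, using only the standard library. `swapRedAndBlue` is a hypothetical helper, not part of the demo; in an OpenCV-based application, `cv::cvtColor` with `cv::COLOR_BGR2RGB` does the same job.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the demo): converts an interleaved
// 3-channel 8-bit image buffer between BGR and RGB in place by swapping
// the first and third channel of every pixel.
// Assumes pixels.size() is a multiple of 3.
void swapRedAndBlue(std::vector<std::uint8_t>& pixels) {
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```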

## Preparing to Run

For demo input image or video files, refer to the section **Media Files Available for Demos** in the [Open Model Zoo Demos Overview](../../README.md).
The list of models supported by the demo is in `<omz_dir>/demos/face_recognition_demo/python/models.lst` file.
This file can be used as a parameter for [Model Downloader](../../../tools/model_tools/README.md) and Converter to download and, if necessary, convert models to OpenVINO IR format (\*.xml + \*.bin).

An example of using the Model Downloader:

```sh
omz_downloader --list models.lst
```

An example of using the Model Converter:

```sh
omz_converter --list models.lst
```

### Supported Models

* face-detection-adas-0001
* face-detection-retail-0004
* face-recognition-resnet100-arcface-onnx
* face-reidentification-retail-0095
* facenet-20180408-102900
* landmarks-regression-retail-0009
* Sphereface
* anti-spoof-mn3

> **NOTE**: Refer to the tables [Intel's Pre-Trained Models Device Support](../../../models/intel/device_support.md) and [Public Pre-Trained Models Device Support](../../../models/public/device_support.md) for the details on models inference support at different devices.

### Creating a gallery for face recognition

To recognize faces, the application uses a face database, or gallery.
The gallery is a folder with images of persons. Each image in the gallery can
be of arbitrary size and should contain one or more frontally oriented faces
of decent quality. Multiple images of the same person are allowed, but
in that case the file names must follow the format `{id}-{num_of_instance}.jpg`.
For example, the images `Paul-0.jpg`, `Paul-1.jpg`, and so on
are all treated as images of the same person. When there
is one image per person, you can use the format `{id}.jpg` (e.g. `Paul.jpg`).
The application can also build the gallery while running; this is
controlled by the `--allow_grow` flag. In that mode the user is
asked whether to add a specific image to the gallery (which
automatically dumps the image to the same folder on disk). To add it,
type the name for the image in the window that opens and press
`Enter`. To skip it, press `Escape`. The user may add multiple images of
the same person by entering the same name. However, the
resulting gallery needs to be checked more thoroughly, since the face detector can
fail and produce poor crops.
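
As an illustration of the `{id}-{num_of_instance}.jpg` and `{id}.jpg` formats, the sketch below derives a person label from a file name. `labelFromFileName` is a hypothetical helper written for this example; how the demo actually parses gallery file names is not shown in this README, so treat the rule (strip the extension, then strip a trailing all-digit `-N` suffix) as an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the demo): derives the person label
// from a gallery file name, e.g. "Paul-0.jpg" or "Paul.jpg".
// Assumption: the part after the last '-' is an instance number only
// when it consists entirely of digits.
std::string labelFromFileName(const std::string& fileName) {
    // Drop the extension, if any (substr with npos keeps the whole name).
    const std::string stem = fileName.substr(0, fileName.find_last_of('.'));

    const std::size_t dash = stem.find_last_of('-');
    if (dash == std::string::npos || dash + 1 == stem.size()) {
        return stem;
    }
    for (std::size_t i = dash + 1; i < stem.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(stem[i]))) {
            return stem;  // suffix is not a number, keep the full name
        }
    }
    return stem.substr(0, dash);
}
```

With this rule, `Paul-0.jpg`, `Paul-1.jpg` and `Paul.jpg` all map to the label `Paul`, while a name such as `Anne-Marie.jpg` is left intact.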

The image file name is used as the person's name during visualization.
Use the following naming convention: `person_N_name.png` or `person_N_name.jpg`.

## Running

Running the application with the `-h` option yields the following usage message:

```
[ -h] show the help message and exit
[--help] print help on all arguments
[ -i <INPUT>] an input to process. The input must be a single image, a folder of images, video file or camera id. Default is 0
--mfd <MODEL FILE> path to the Face Detection model (.xml) file.
[--mlm <MODEL FILE>] path to the Facial Landmarks Regression Retail model (.xml) file
[--mreid <MODEL FILE>] path to the Face Recognition model (.xml) file.
[--mas <MODEL FILE>] path to the Antispoofing Classification model (.xml) file.
[--t_fd <NUMBER>] probability threshold for face detections. Default is 0.5
[--input_shape <STRING>] specify the input shape for detection network in (width x height) format. Input of model will be reshaped according specified shape.Example: 1280x720. Shape of network input used by default.
[--t_reid <NUMBER>] cosine distance threshold between two vectors for face reidentification. Default is 0.7
[--exp <NUMBER>] expand ratio for bbox before face recognition. Default is 1.0
[--greedy_reid_matching] ([--nogreedy_reid_matching])(don't) use faster greedy matching algorithm in face reid.
[--fg <GALLERY PATH>] path to a faces gallery directory.
[--allow_grow] ([--noallow_grow]) (dont't) allow to grow faces gallery and to dump on disk.
[--crop_gallery] ([--nocrop_gallery]) (dont't) crop images during faces gallery creation.
[--dfd <DEVICE>] specify a device Face Detection model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
[--dlm <DEVICE>] specify a device for Landmarks Regression model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
[--dreid <DEVICE>] specify a target device for Face Reidentification model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
[--das <DEVICE>] specify a device for Anti-spoofing model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
[--lim <NUMBER>] number of frames to store in output. If 0 is set, all frames are stored. Default is 1000
[ -o <OUTPUT>] name of the output file(s) to save.
[--loop] enable reading the input in a loop
[--show] ([--noshow]) (don't) show output
[ -u <DEVICE>] resource utilization graphs. Default is cdm. c - average CPU load, d - load distribution over cores, m - memory usage, h - hide
Key bindings:
Q, q, Esc - Quit
P, p, 0, spacebar - Pause
C - average CPU load, D - load distribution over cores, M - memory usage, H - hide
```
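
The `--t_reid` option in the listing above is described as a cosine distance threshold between two vectors. As a rough illustration, assuming the common definition of cosine distance as one minus cosine similarity (the demo's gallery code is not part of this README and may differ), a match test could look like the sketch below. `cosineDistance` and `isSamePerson` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Cosine distance, assumed here to be 1 - cos(angle between a and b):
// 0 for vectors pointing the same way, 1 for orthogonal, 2 for opposite.
float cosineDistance(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.empty() || a.size() != b.size()) {
        throw std::invalid_argument("embeddings must be non-empty and of equal length");
    }
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        normA += static_cast<double>(a[i]) * a[i];
        normB += static_cast<double>(b[i]) * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
        throw std::invalid_argument("embeddings must be non-zero");
    }
    return static_cast<float>(1.0 - dot / (std::sqrt(normA) * std::sqrt(normB)));
}

// Hypothetical decision rule: two embeddings match when their distance is
// below the threshold (0.7 is the documented default of --t_reid).
bool isSamePerson(const std::vector<float>& a, const std::vector<float>& b,
                  float threshold = 0.7f) {
    return cosineDistance(a, b) < threshold;
}
```

Because the distance depends only on direction, scaling an embedding does not change the result. Lowering the threshold makes matching stricter.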

Example of a valid command line to run the application:

``` sh

./face_recognition_demo \
-i <path_to_video>/input_video.mp4 \
-mfd <path_to_model>/face-detection-retail-0004.xml \
-mlm <path_to_model>/landmarks-regression-retail-0009.xml \
-mreid <path_to_model>/face-reidentification-retail-0095.xml \
--verbose \
-fg "/home/face_gallery"
```

>**NOTE**: If you provide a single image as an input, the demo processes and renders it quickly, then exits. To continuously visualize inference results on the screen, apply the `--loop` option, which enforces processing a single image in a loop.

You can save processed results to a Motion JPEG AVI file or separate JPEG or PNG files using the `-o` option:

* To save processed results in an AVI file, specify the name of the output file with `avi` extension, for example: `-o output.avi`.
* To save processed results as images, specify the template name of the output image file with `jpg` or `png` extension, for example: `-o output_%03d.jpg`. The actual file names are constructed from the template at runtime by replacing the `printf`-style format specifier `%03d` with the frame number, resulting in the following: `output_000.jpg`, `output_001.jpg`, and so on.

To avoid disk space overrun in case of a continuous input stream, like a camera, you can limit the amount of data stored in the output file(s) with the `--lim` option. The default value is 1000. To change it, apply the `--lim N` option, where `N` is the number of frames to store.
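
To make the template expansion and the frame limit concrete, here is a minimal sketch built on `std::snprintf`. Both `outputFileName` and `shouldStoreFrame` are hypothetical helpers for illustration; the demo's own writer is not shown here and may work differently. The sketch assumes the template contains exactly one integer conversion such as `%03d`.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: expands a template such as "output_%03d.jpg"
// for a given frame number.
// Assumes the template holds exactly one integer conversion (e.g. %03d).
std::string outputFileName(const std::string& nameTemplate, int frameNumber) {
    // The first call only computes the required length.
    const int length = std::snprintf(nullptr, 0, nameTemplate.c_str(), frameNumber);
    if (length < 0) {
        return {};
    }
    // Reserve one extra character for the terminating null written by snprintf.
    std::string result(static_cast<std::size_t>(length) + 1, '\0');
    std::snprintf(&result[0], result.size(), nameTemplate.c_str(), frameNumber);
    result.resize(static_cast<std::size_t>(length));
    return result;
}

// Hypothetical check for the frame limit: per the help text, 0 means
// that all frames are stored.
bool shouldStoreFrame(unsigned storedSoFar, unsigned limit) {
    return limit == 0 || storedSoFar < limit;
}
```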

>**NOTE**: Windows\* systems may not have the Motion JPEG codec installed by default. If this is the case, you can download OpenCV FFMPEG back end using the PowerShell script provided with the OpenVINO &trade; install package and located at `<INSTALL_DIR>/opencv/ffmpeg-download.ps1`. The script should be run with administrative privileges if OpenVINO &trade; is installed in a system protected folder (this is a typical case). Alternatively, you can save results as images.

## Demo output

The demo uses an OpenCV window to display the resulting video frame and detections.
The demo reports:

* **FPS**: average rate of video frame processing (frames per second).
* **Latency**: average time required to process one frame (from reading the frame to displaying the results).

You can use both of these metrics to measure application-level performance.

## See also

* [Open Model Zoo Demos](../../README.md)
* [Model Optimizer](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
* [Model Downloader](../../../tools/model_tools/README.md)
155 changes: 155 additions & 0 deletions demos/face_recognition_demo/cpp/include/api.hpp
@@ -0,0 +1,155 @@
// Copyright (C) 2023 KNS Group LLC (YADRO)
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "models.hpp"
#include "reid_gallery.hpp"

#include <opencv2/imgproc.hpp>
#include <openvino/openvino.hpp>

#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

#include "utils/ocv_common.hpp"
#include "utils/image_utils.h"

// Classes used in main
struct Result {
    cv::Rect face;
    size_t id;
    float distance;
    std::string label;
    bool real;
    Result(cv::Rect face, size_t id,
           float distance, const std::string& label, bool real = true) :
        face(face), id(id), distance(distance), label(label), real(real) {}
};

class FaceRecognizer {
public:
    virtual ~FaceRecognizer() = default;

    virtual std::vector<Result> recognize(const cv::Mat& frame, const std::vector<FaceBox>& faces) = 0;
};

class FaceRecognizerDefault : public FaceRecognizer {
public:
    static constexpr int maxNumRequests = 16;
    FaceRecognizerDefault(
        const BaseConfig& landmarksDetectorConfig,
        const BaseConfig& reidConfig,
        const DetectorConfig& faceRegistrationDetConfig,
        const std::string& faceGalleryPath,
        const double reidThreshold,
        const bool cropGallery,
        const bool allowGrow,
        const bool greedyReidMatching) :
        allowGrow(allowGrow),
        landmarksDetector(landmarksDetectorConfig),
        faceReid(reidConfig),
        faceGallery(faceGalleryPath, reidThreshold, cropGallery,
                    faceRegistrationDetConfig, landmarksDetector, faceReid,
                    greedyReidMatching)
    {}

    std::vector<Result> recognize(const cv::Mat& frame, const std::vector<FaceBox>& faces) override {
        cv::Mat origImg = frame.clone();

        std::vector<cv::Mat> landmarks;
        std::vector<cv::Mat> embeddings;
        std::vector<cv::Mat> faceRois;

        auto faceRoi = [&](const FaceBox& face) {
            return frame(face.face);
        };
        int numFaces = faces.size();
        if (numFaces < maxNumRequests) {
            std::transform(faces.begin(), faces.end(), std::back_inserter(faceRois), faceRoi);
            landmarks = landmarksDetector.infer(faceRois);
            alignFaces(faceRois, landmarks);
            embeddings = faceReid.infer(faceRois);
        } else {
            // Process the faces in chunks of at most maxNumRequests.
            for (int n = numFaces; n > 0; n -= maxNumRequests) {
                faceRois.clear();
                size_t startIdx = size_t(numFaces) - n;
                // n is the number of faces still to process. Clamp by n, not by numFaces,
                // so that the last chunk does not run past the end of `faces`.
                size_t endIdx = startIdx + std::min(n, maxNumRequests);
                std::transform(faces.begin() + startIdx, faces.begin() + endIdx, std::back_inserter(faceRois), faceRoi);

                landmarks = landmarksDetector.infer(faceRois);
                alignFaces(faceRois, landmarks);

                std::vector<cv::Mat> tmpEmbeddings = faceReid.infer(faceRois);
                embeddings.insert(embeddings.end(), tmpEmbeddings.begin(), tmpEmbeddings.end());
            }
        }
        std::vector<std::pair<int, float>> matches = faceGallery.getIDsByEmbeddings(embeddings);
        std::vector<Result> results;
        for (size_t faceIndex = 0; faceIndex < faces.size(); ++faceIndex) {
            if (matches[faceIndex].first == EmbeddingsGallery::unknownId && allowGrow) {
                std::string personName = faceGallery.tryToSave(origImg(faces[faceIndex].face));
                if (!personName.empty())
                    faceGallery.addFace(origImg(faces[faceIndex].face), embeddings[faceIndex], personName);
            }
            results.emplace_back(faces[faceIndex].face, matches[faceIndex].first,
                                 matches[faceIndex].second, faceGallery.getLabelByID(matches[faceIndex].first));
        }
        return results;
    }
protected:
    bool allowGrow;
    AsyncModel landmarksDetector;
    AsyncModel faceReid;
    EmbeddingsGallery faceGallery;
};

class AntiSpoofer {
public:
    static constexpr int maxNumRequests = 16;
    AntiSpoofer(const BaseConfig& antiSpoofConfig, const float spoofThreshold = 40.0) :
        antiSpoof(antiSpoofConfig), spoofThreshold(spoofThreshold)
    {}

    void process(const cv::Mat& frame, const std::vector<FaceBox>& faces, std::vector<Result>& results) {
        if (!antiSpoof.enabled()) {
            return;
        }

        std::vector<cv::Mat> faceRois;
        std::vector<cv::Mat> spoofs;

        auto faceRoi = [&](const FaceBox& face) {
            return frame(face.face);
        };
        int numFaces = faces.size();
        if (numFaces < maxNumRequests) {
            std::transform(faces.begin(), faces.end(), std::back_inserter(faceRois), faceRoi);
            spoofs = antiSpoof.infer(faceRois);
        } else {
            // Process the faces in chunks of at most maxNumRequests.
            for (int n = numFaces; n > 0; n -= maxNumRequests) {
                faceRois.clear();
                size_t startIdx = size_t(numFaces) - n;
                // n is the number of faces still to process. Clamp by n, not by numFaces,
                // so that the last chunk does not run past the end of `faces`.
                size_t endIdx = startIdx + std::min(n, maxNumRequests);
                std::transform(faces.begin() + startIdx, faces.begin() + endIdx, std::back_inserter(faceRois), faceRoi);
                std::vector<cv::Mat> tmpSpoofs = antiSpoof.infer(faceRois);
                spoofs.insert(spoofs.end(), tmpSpoofs.begin(), tmpSpoofs.end());
            }
        }
        for (size_t faceIndex = 0; faceIndex < faces.size(); ++faceIndex) {
            results[faceIndex].real = isReal(spoofs[faceIndex]);
        }
    }
private:
    AsyncModel antiSpoof;
    float spoofThreshold;

    // Scales the first model output to a percentage and compares it with spoofThreshold.
    bool isReal(const cv::Mat& spoof) const {
        float probability = spoof.at<float>(0) * 100;
        return probability > spoofThreshold;
    }
};
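
Both `recognize` and `process` above split the detected faces into groups of at most `maxNumRequests` before running inference. The index arithmetic is easy to get wrong on the last group, whose end index has to be clamped by the number of faces still remaining. The sketch below isolates that logic using only the standard library; `chunkRanges` is a hypothetical helper for illustration and is not part of this header.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits `total` items into consecutive [begin, end)
// index ranges holding at most `chunkSize` items each.
// Assumes chunkSize > 0.
std::vector<std::pair<std::size_t, std::size_t>> chunkRanges(std::size_t total,
                                                             std::size_t chunkSize) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t begin = 0; begin < total; begin += chunkSize) {
        // Clamp by the number of remaining items so that the last range
        // never runs past `total`.
        const std::size_t end = begin + std::min(chunkSize, total - begin);
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

For 20 faces and a chunk size of 16, this yields the ranges `[0, 16)` and `[16, 20)`.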