diff --git a/demos/face_recognition_demo/cpp/CMakeLists.txt b/demos/face_recognition_demo/cpp/CMakeLists.txt
new file mode 100644
index 00000000000..ed46e54ebc8
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/CMakeLists.txt
@@ -0,0 +1,12 @@
+# Copyright (C) 2023 KNS Group LLC (YADRO)
+# SPDX-License-Identifier: Apache-2.0
+#
+
+file(GLOB_RECURSE SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
+file(GLOB_RECURSE HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
+
+add_demo(NAME face_recognition_demo
+    SOURCES ${SOURCES}
+    HEADERS ${HEADERS}
+    INCLUDE_DIRECTORIES "${CMAKE_CURRENT_SOURCE_DIR}/include"
+    DEPENDENCIES monitors utils)
diff --git a/demos/face_recognition_demo/cpp/README.md b/demos/face_recognition_demo/cpp/README.md
new file mode 100644
index 00000000000..3452a86ce07
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/README.md
@@ -0,0 +1,190 @@
+# Face Recognition C++ Demo
+
+![](./face_recognition_demo.gif)
+
+This example demonstrates an approach to creating interactive applications
+for video processing. It shows the basic architecture for building model
+pipelines that support model placement on different devices and simultaneous
+parallel or sequential execution using the OpenVINO library.
+In particular, this demo uses 4 models to build a pipeline able to detect
+faces on videos and their keypoints (aka "landmarks"),
+recognize persons using the provided faces database (the gallery), and estimate
+the probability that a face on the video or image is real rather than spoofed.
+The following pretrained models can be used:
+
+* `face-detection-retail-0004` and `face-detection-adas-0001`, to detect faces and predict their bounding boxes;
+* `landmarks-regression-retail-0009`, to predict face keypoints;
+* `face-reidentification-retail-0095`, `Sphereface`, `facenet-20180408-102900` or `face-recognition-resnet100-arcface-onnx`, to recognize persons;
+* `anti-spoof-mn3`, which is executed on top of the results of the detection model and reports the estimated probability that the shown face is real or spoofed.
+
+## How it works
+
+The application is invoked from the command line. It reads the specified input
+video stream frame by frame, be it a camera device or a video file,
+and performs independent analysis of each frame. In order to make predictions,
+the application deploys the 4 models on the specified devices using the OpenVINO
+library and runs them in an asynchronous manner.
+There are 3 user modes for this demo application:
+1. Face detection only. In this mode, an input frame is processed by the face detection model to predict face bounding boxes.
+   To run face detection only, use just the `-mfd` flag. Example:
+   ```
+   ./face_recognition_demo \
+     -i <path_to_video>/input_video.mp4 \
+     -mfd <path_to_model>/face-detection-retail-0004.xml
+   ```
+2. Face recognition mode. In this case, after face detection, face keypoints
+   are predicted by the landmarks model; as the final step, the face recognition model uses the keypoints
+   to align faces and matches the found faces against the user-provided face gallery.
+   So, in this mode, the user should provide 3 flags: `-mfd`, `-mlm`, `-mreid`. Example:
+   ```
+   ./face_recognition_demo \
+     -i <path_to_video>/input_video.mp4 \
+     -mfd <path_to_model>/face-detection-retail-0004.xml \
+     -mlm <path_to_model>/landmarks-regression-retail-0009.xml \
+     -mreid <path_to_model>/face-reidentification-retail-0095.xml \
+     -fg "/home/face_gallery"
+   ```
+3. With the anti-spoof model. In this case all 4 models are working: for every recognized face, the demo applies the anti-spoof model,
+   which estimates the probability that the face on the video is real rather than spoofed.
+   So, in this mode, the user should provide 4 flags: `-mfd`, `-mlm`, `-mreid`, `-mas`.
+   Example:
+   ```
+   ./face_recognition_demo \
+     -i <path_to_video>/input_video.mp4 \
+     -mfd <path_to_model>/face-detection-retail-0004.xml \
+     -mlm <path_to_model>/landmarks-regression-retail-0009.xml \
+     -mreid <path_to_model>/face-reidentification-retail-0095.xml \
+     -mas <path_to_model>/anti-spoof-mn3.xml \
+     -fg "/home/face_gallery"
+   ```
+
+After all computations, the processing results are
+visualized and displayed on the screen or written to the output file.
+
+> **NOTE**: By default, Open Model Zoo demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the demo application or reconvert your model using the Model Optimizer tool with the `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases).
+
+## Preparing to Run
+
+For demo input image or video files, refer to the section **Media Files Available for Demos** in the [Open Model Zoo Demos Overview](../../README.md).
+The list of models supported by the demo is in the `/demos/face_recognition_demo/cpp/models.lst` file.
+This file can be used as a parameter for [Model Downloader](../../../tools/model_tools/README.md) and Converter to download and, if necessary, convert models to OpenVINO IR format (\*.xml + \*.bin).
+
+An example of using the Model Downloader:
+
+```sh
+omz_downloader --list models.lst
+```
+
+An example of using the Model Converter:
+
+```sh
+omz_converter --list models.lst
+```
+
+### Supported Models
+
+* face-detection-adas-0001
+* face-detection-retail-0004
+* face-recognition-resnet100-arcface-onnx
+* face-reidentification-retail-0095
+* facenet-20180408-102900
+* landmarks-regression-retail-0009
+* Sphereface
+* anti-spoof-mn3
+
+> **NOTE**: Refer to the tables [Intel's Pre-Trained Models Device Support](../../../models/intel/device_support.md) and [Public Pre-Trained Models Device Support](../../../models/public/device_support.md) for the details on models inference support at different devices.
+
+### Creating a gallery for face recognition
+
+To recognize faces, the application uses a face database, or a gallery.
+The gallery is a folder with images of persons. Each image in the gallery can
+be of arbitrary size and should contain one or more frontally-oriented faces
+of decent quality. Multiple images of the same person are allowed, but
+the naming format in that case should be specific: `{id}-{num_of_instance}.jpg`.
+For example, there could be images `Paul-0.jpg`, `Paul-1.jpg`, etc.,
+and they all will be treated as images of the same person. In case there
+is one image per person, you can use the format `{id}.jpg` (e.g. `Paul.jpg`).
+The application can also build the gallery while working; this is
+controlled by the `--allow_grow` flag. In that mode the user will
+be asked whether to add a specific image to the images gallery (which
+leads to automatic dumping of the images to the same folder on disk). If so,
+the user should specify the name for the image in the opened window and press
+`Enter`. If not, press `Escape`. The user may add multiple images of
+the same person by entering the same name in the opened window. However, the
+resulting gallery needs to be checked more thoroughly, since a face detector can
+fail and produce poor crops.
+
+The image file name is used as the person's name during the visualization.
+Use the following name convention: `person_N_name.png` or `person_N_name.jpg`.
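+
+For example, a gallery with two known persons might look like this (the file names are illustrative):
+
+```
+face_gallery/
+├── Paul-0.jpg
+├── Paul-1.jpg
+└── Anna.jpg
+```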
+
+## Running
+
+Running the application with the `-h` option:
+
+```
+    [ -h] show the help message and exit
+    [--help] print help on all arguments
+    [ -i ] an input to process. The input must be a single image, a folder of images, video file or camera id. Default is 0
+     --mfd path to the Face Detection model (.xml) file.
+    [--mlm ] path to the Facial Landmarks Regression Retail model (.xml) file
+    [--mreid ] path to the Face Recognition model (.xml) file.
+    [--mas ] path to the Antispoofing Classification model (.xml) file.
+    [--t_fd ] probability threshold for face detections. Default is 0.5
+    [--input_shape ] specify the input shape for the detection network in (width x height) format. The model input will be reshaped according to the specified shape. Example: 1280x720. By default, the network's own input shape is used.
+    [--t_reid ] cosine distance threshold between two vectors for face reidentification. Default is 0.7
+    [--exp ] expand ratio for bbox before face recognition. Default is 1.0
+    [--greedy_reid_matching] ([--nogreedy_reid_matching]) (don't) use the faster greedy matching algorithm in face reid.
+    [--fg ] path to a faces gallery directory.
+    [--allow_grow] ([--noallow_grow]) (don't) allow growing the faces gallery and dumping it on disk.
+    [--crop_gallery] ([--nocrop_gallery]) (don't) crop images during faces gallery creation.
+    [--dfd ] specify a device for the Face Detection model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
+    [--dlm ] specify a device for the Landmarks Regression model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
+    [--dreid ] specify a target device for the Face Reidentification model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
+    [--das ] specify a device for the Anti-spoofing model to infer on (the list of available devices is shown below). Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. Default is CPU
+    [--lim ] number of frames to store in output. If 0 is set, all frames are stored. Default is 1000
+    [ -o ] name of the output file(s) to save.
+    [--loop] enable reading the input in a loop
+    [--show] ([--noshow]) (don't) show output
+    [ -u ] resource utilization graphs. Default is cdm. c - average CPU load, d - load distribution over cores, m - memory usage, h - hide
+    Key bindings:
+        Q, q, Esc - Quit
+        P, p, 0, spacebar - Pause
+        C - average CPU load, D - load distribution over cores, M - memory usage, H - hide
+```
+
+Example of a valid command line to run the application:
+
+```sh
+./face_recognition_demo \
+  -i <path_to_video>/input_video.mp4 \
+  -mfd <path_to_model>/face-detection-retail-0004.xml \
+  -mlm <path_to_model>/landmarks-regression-retail-0009.xml \
+  -mreid <path_to_model>/face-reidentification-retail-0095.xml \
+  -fg "/home/face_gallery"
+```
+
+>**NOTE**: If you provide a single image as an input, the demo processes and renders it quickly, then exits. To continuously visualize inference results on the screen, apply the `--loop` option, which enforces processing a single image in a loop.
+
+You can save processed results to a Motion JPEG AVI file or separate JPEG or PNG files using the `-o` option:
+
+* To save processed results in an AVI file, specify the name of the output file with `avi` extension, for example: `-o output.avi`.
+* To save processed results as images, specify the template name of the output image file with `jpg` or `png` extension, for example: `-o output_%03d.jpg`. The actual file names are constructed from the template at runtime by replacing regular expression `%03d` with the frame number, resulting in the following: `output_000.jpg`, `output_001.jpg`, and so on.
+To avoid disk space overrun in case of continuous input stream, like camera, you can limit the amount of data stored in the output file(s) with the `--lim` option. The default value is 1000. To change it, apply the `--lim N` option, where `N` is the number of frames to store.
+
+>**NOTE**: Windows\* systems may not have the Motion JPEG codec installed by default. If this is the case, you can download OpenCV FFMPEG back end using the PowerShell script provided with the OpenVINO™ install package and located at `/opencv/ffmpeg-download.ps1`. The script should be run with administrative privileges if OpenVINO™ is installed in a system protected folder (this is a typical case). Alternatively, you can save results as images.
+
+## Demo output
+
+The demo uses an OpenCV window to display the resulting video frame and detections.
+The demo reports:
+
+* **FPS**: average rate of video frame processing (frames per second).
+* **Latency**: average time required to process one frame (from reading the frame to displaying the results).
+
+You can use both of these metrics to measure application-level performance.
+
+## See also
+
+* [Open Model Zoo Demos](../../README.md)
+* [Model Optimizer](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
+* [Model Downloader](../../../tools/model_tools/README.md)
diff --git a/demos/face_recognition_demo/cpp/face_recognition_demo.gif b/demos/face_recognition_demo/cpp/face_recognition_demo.gif
new file mode 100644
index 00000000000..14e9ce0fb6a
Binary files /dev/null and b/demos/face_recognition_demo/cpp/face_recognition_demo.gif differ
diff --git a/demos/face_recognition_demo/cpp/include/api.hpp b/demos/face_recognition_demo/cpp/include/api.hpp
new file mode 100644
index 00000000000..ba35c661340
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/include/api.hpp
@@ -0,0 +1,169 @@
+// Copyright (C) 2023 KNS Group LLC (YADRO)
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+
+#include "models.hpp"
+#include "reid_gallery.hpp"
+
+#include <algorithm>
+#include <string>
+#include <vector>
+
+#include "utils/ocv_common.hpp"
+#include "utils/image_utils.h"
+
+// Classes used in main
+struct Result {
+    cv::Rect face;
+    std::vector<cv::Point> landmarks;
+    size_t id;
+    float distance;
+    std::string label;
+    bool real;
+    Result(cv::Rect face, std::vector<cv::Point> landmarks, size_t id,
+           float distance, const std::string& label, bool real = true) :
+        face(face), landmarks(landmarks), id(id), distance(distance), label(label), real(real) {}
+};
+
+class FaceRecognizer {
+public:
+    virtual ~FaceRecognizer() = default;
+
+    virtual std::vector<Result> recognize(const cv::Mat& frame, const std::vector<FaceBox>& faces) = 0;
+};
+
+class FaceRecognizerDefault : public FaceRecognizer {
+public:
+    static const int MAX_NUM_REQUESTS;
+    static const int LANDMARKS_NUM;
+    FaceRecognizerDefault(
+        const BaseConfig& landmarksDetectorConfig,
+        const BaseConfig& reidConfig,
+        const DetectorConfig& faceRegistrationDetConfig,
+        const std::string& faceGalleryPath,
+        const double reidThreshold,
+        const bool cropGallery,
+        const bool allowGrow,
+        const bool greedyReidMatching) :
+        allowGrow(allowGrow),
+        landmarksDetector(landmarksDetectorConfig),
+        faceReid(reidConfig),
+        faceGallery(faceGalleryPath, reidThreshold, cropGallery,
+                    faceRegistrationDetConfig, landmarksDetector, faceReid,
+                    greedyReidMatching)
+    {}
+
+    std::vector<Result> recognize(const cv::Mat& frame, const std::vector<FaceBox>& faces) override {
+        cv::Mat origImg = frame.clone();
+
+        std::vector<cv::Mat> landmarks;
+        std::vector<cv::Mat> embeddings;
+        std::vector<cv::Mat> faceRois;
+
+        auto faceRoi = [&](const FaceBox& face) {
+            return frame(face.face);
+        };
+
+        int numFaces = static_cast<int>(faces.size());
+        if (numFaces < MAX_NUM_REQUESTS) {
+            std::transform(faces.begin(), faces.end(), std::back_inserter(faceRois), faceRoi);
+            landmarks = landmarksDetector.infer(faceRois);
+            alignFaces(faceRois, landmarks);
+            embeddings = faceReid.infer(faceRois);
+        } else {
+            // Process the faces in chunks of at most MAX_NUM_REQUESTS rois,
+            // accumulating landmarks and embeddings for every face
+            for (int n = numFaces; n > 0; n -= MAX_NUM_REQUESTS) {
+                faceRois.clear();
+                size_t start_idx = size_t(numFaces) - n;
+                size_t end_idx = start_idx + std::min(n, MAX_NUM_REQUESTS);
+                std::transform(faces.begin() + start_idx, faces.begin() + end_idx, std::back_inserter(faceRois), faceRoi);
+
+                std::vector<cv::Mat> tmpLandmarks = landmarksDetector.infer(faceRois);
+                alignFaces(faceRois, tmpLandmarks);
+                landmarks.insert(landmarks.end(), tmpLandmarks.begin(), tmpLandmarks.end());
+
+                std::vector<cv::Mat> tmpEmbeddings = faceReid.infer(faceRois);
+                embeddings.insert(embeddings.end(), tmpEmbeddings.begin(), tmpEmbeddings.end());
+            }
+        }
+        std::vector<std::pair<int, float>> matches = faceGallery.getIDsByEmbeddings(embeddings);
+        std::vector<Result> results;
+        for (size_t faceIndex = 0; faceIndex < faces.size(); ++faceIndex) {
+            if (matches[faceIndex].first == EmbeddingsGallery::unknownId && allowGrow) {
+                std::string personName = faceGallery.tryToSave(origImg(faces[faceIndex].face));
+                if (!personName.empty())
+                    faceGallery.addFace(origImg(faces[faceIndex].face), embeddings[faceIndex], personName);
+            }
+            std::vector<cv::Point> lms;
+            for (int i = 0; i < LANDMARKS_NUM; ++i) {
+                int x = static_cast<int>(faces[faceIndex].face.x + landmarks[faceIndex].at<float>(2 * i, 0) * faces[faceIndex].face.width);
+                int y = static_cast<int>(faces[faceIndex].face.y + landmarks[faceIndex].at<float>(2 * i + 1, 0) * faces[faceIndex].face.height);
+                lms.emplace_back(x, y);
+            }
+            results.emplace_back(faces[faceIndex].face, lms, matches[faceIndex].first,
+                                 matches[faceIndex].second, faceGallery.getLabelByID(matches[faceIndex].first));
+        }
+        return results;
+    }
+protected:
+    bool allowGrow;
+    AsyncModel landmarksDetector;
+    AsyncModel faceReid;
+    EmbeddingsGallery faceGallery;
+};
+
+class AntiSpoofer {
+public:
+    static const int MAX_NUM_REQUESTS;
+    AntiSpoofer(const BaseConfig& antiSpoofConfig, const float spoofThreshold = 40.0f) :
+        antiSpoof(antiSpoofConfig), spoofThreshold(spoofThreshold)
+    {}
+
+    void process(const cv::Mat& frame, const std::vector<FaceBox>& faces, std::vector<Result>& results) {
+        if (!antiSpoof.enabled()) {
+            return;
+        }
+
+        std::vector<cv::Mat> faceRois;
+        std::vector<cv::Mat> spoofs;
+
+        auto faceRoi = [&](const FaceBox& face) {
+            return frame(face.face);
+        };
+        int numFaces = static_cast<int>(faces.size());
+        if (numFaces < MAX_NUM_REQUESTS) {
+            std::transform(faces.begin(), faces.end(), std::back_inserter(faceRois), faceRoi);
+            spoofs = antiSpoof.infer(faceRois);
+        } else {
+            for (int n = numFaces; n > 0; n -= MAX_NUM_REQUESTS) {
+                faceRois.clear();
+                size_t startIdx = size_t(numFaces) - n;
+                size_t endIdx = startIdx + std::min(n, MAX_NUM_REQUESTS);
+                std::transform(faces.begin() + startIdx, faces.begin() + endIdx, std::back_inserter(faceRois), faceRoi);
+                std::vector<cv::Mat> tmpSpoofs = antiSpoof.infer(faceRois);
+                spoofs.insert(spoofs.end(), tmpSpoofs.begin(), tmpSpoofs.end());
+            }
+        }
+        for (size_t faceIndex = 0; faceIndex < faces.size(); ++faceIndex) {
+            results[faceIndex].real = isReal(spoofs[faceIndex]);
+        }
+    }
+private:
+    AsyncModel antiSpoof;
+    float spoofThreshold;
+
+    bool isReal(cv::Mat& spoof) {
+        float probability = spoof.at<float>(0) * 100;
+        return probability > spoofThreshold;
+    }
+};
+
+const int FaceRecognizerDefault::LANDMARKS_NUM = 5;
+const int FaceRecognizerDefault::MAX_NUM_REQUESTS = 16;
+const int AntiSpoofer::MAX_NUM_REQUESTS = 16;
diff --git a/demos/face_recognition_demo/cpp/include/async_queue.hpp b/demos/face_recognition_demo/cpp/include/async_queue.hpp
new file mode 100644
index 00000000000..4efec49ba65
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/include/async_queue.hpp
@@ -0,0 +1,33 @@
+// Copyright (C) 2023 KNS Group LLC (YADRO)
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+
+#include <condition_variable>
+#include <mutex>
+#include <queue>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#include <opencv2/core.hpp>
+#include <openvino/openvino.hpp>
+
+class AsyncInferQueue {
+public:
+    AsyncInferQueue(ov::CompiledModel& compiled_model, size_t size);
+    ~AsyncInferQueue();
+
+    void submitData(std::unordered_map<std::string, cv::Mat> inputs, size_t input_id);
+    void waitAll();
+    std::unordered_map<size_t, std::unordered_map<std::string, ov::Tensor>> getResults();
+
+private:
+    std::vector<ov::InferRequest> requests;
+    std::queue<size_t> idsOfFreeRequests;
+    std::unordered_map<size_t, std::unordered_map<std::string, ov::Tensor>> results;
+    std::vector<std::string> outputNames;
+    void setCallback();
+    size_t getIdleRequestId();
+    std::mutex mutex;
+    std::condition_variable cv;
+};
diff --git a/demos/face_recognition_demo/cpp/include/models.hpp b/demos/face_recognition_demo/cpp/include/models.hpp
new file mode 100644
index 00000000000..853fd1cdadb
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/include/models.hpp
@@ -0,0 +1,114 @@
+// Copyright (C) 2023 KNS Group LLC (YADRO)
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+
+#include "async_queue.hpp"
+
+#include <memory>
+#include <string>
+#include <vector>
+
+#include <opencv2/core.hpp>
+#include <openvino/openvino.hpp>
+
+#include "utils/common.hpp"
+#include "utils/slog.hpp"
+
+struct BaseConfig {
+    BaseConfig(const std::string& path_to_model) :
+        pathToModel(path_to_model) {}
+
+    std::string pathToModel;
+    int numRequests = 1;
+    ov::Core core;
+    std::string deviceName;
+};
+
+class BaseModel {
+public:
+    BaseModel(const BaseConfig& config) : mConfig(config) {}
+    virtual ~BaseModel() = default;
+
+    bool enabled() const {
+        return !mConfig.pathToModel.empty();
+    }
+protected:
+    BaseConfig mConfig;
+    std::string inputTensorName;
+    std::vector<std::string> outputTensorsNames;
+
+    virtual void prepareInputsOutputs(std::shared_ptr<ov::Model>& model) = 0;
+};
+
+class AsyncModel : public BaseModel {
+public:
+    AsyncModel(const BaseConfig& config) :
+        BaseModel(config) {
+        if (!enabled()) {
+            return;
+        }
+        slog::info << "Reading model: " << mConfig.pathToModel << slog::endl;
+        std::shared_ptr<ov::Model> model = mConfig.core.read_model(mConfig.pathToModel);
+        logBasicModelInfo(model);
+        prepareInputsOutputs(model);
+        ov::CompiledModel compiledModel = mConfig.core.compile_model(model, mConfig.deviceName);
+        inferQueue.reset(new AsyncInferQueue(compiledModel, mConfig.numRequests));
+        logCompiledModelInfo(compiledModel, mConfig.pathToModel, mConfig.deviceName);
+    }
+
+    std::vector<cv::Mat> infer(const std::vector<cv::Mat>& images);
+
+private:
+    cv::Size netInputSize;
+    cv::Size origImageSize;
+    std::unique_ptr<AsyncInferQueue> inferQueue;
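+
+    // Validates that the network has a single input and output, records the
+    // network input size and embeds u8->f32 preprocessing (see src/models.cpp).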
+    void prepareInputsOutputs(std::shared_ptr<ov::Model>& model) override;
+};
+
+void alignFaces(std::vector<cv::Mat>& face_images,
+                const std::vector<cv::Mat>& landmarks_vec);
+
+struct FaceBox {
+    cv::Rect face;
+    float confidence;
+    explicit FaceBox(const cv::Rect& rect = cv::Rect(), float confidence = -1.0f) :
+        face(rect), confidence(confidence) {}
+};
+
+struct DetectorConfig : public BaseConfig {
+    DetectorConfig(const std::string& path_to_model) :
+        BaseConfig(path_to_model) {}
+
+    float confidenceThreshold = 0.5f;
+    float increaseScaleX = 1.15f;
+    float increaseScaleY = 1.15f;
+    cv::Size inputSize = cv::Size(600, 600);
+};
+
+class FaceDetector : public BaseModel {
+public:
+    FaceDetector(const DetectorConfig& config) :
+        BaseModel(config), mConfig(config) {
+        slog::info << "Reading model: " << mConfig.pathToModel << slog::endl;
+        std::shared_ptr<ov::Model> model = mConfig.core.read_model(mConfig.pathToModel);
+        logBasicModelInfo(model);
+        prepareInputsOutputs(model);
+        ov::CompiledModel compiledModel = mConfig.core.compile_model(model, mConfig.deviceName);
+        mRequest = std::make_shared<ov::InferRequest>(compiledModel.create_infer_request());
+        logCompiledModelInfo(compiledModel, mConfig.pathToModel, mConfig.deviceName);
+    }
+
+    void submitData(const cv::Mat& inputImage);
+    std::vector<FaceBox> getResults();
+
+private:
+    DetectorConfig mConfig;
+    cv::Size netInputSize;
+    cv::Size origImageSize;
+    std::shared_ptr<ov::InferRequest> mRequest;
+    size_t maxDetectionCount = 0;
+    size_t detectedObjectSize = 0;
+    static constexpr int emptyDetectionIndicator = -1;
+
+    void prepareInputsOutputs(std::shared_ptr<ov::Model>& model) override;
+};
diff --git a/demos/face_recognition_demo/cpp/include/reid_gallery.hpp b/demos/face_recognition_demo/cpp/include/reid_gallery.hpp
new file mode 100644
index 00000000000..8b81bb1f1a1
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/include/reid_gallery.hpp
@@ -0,0 +1,62 @@
+// Copyright (C) 2023 KNS Group LLC (YADRO)
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+
+#include "models.hpp"
+
+#include <string>
+#include <vector>
+
+#include <opencv2/core.hpp>
+
+enum class RegistrationStatus {
+    SUCCESS,
+    FAILURE_LOW_QUALITY,
+    FAILURE_NOT_DETECTED,
+};
+
+struct GalleryObject {
+    std::vector<cv::Mat> embeddings;
+    std::string label;
+    int id;
+
+    GalleryObject(const std::vector<cv::Mat>& embeddings,
+                  const std::string& label, int id) :
+        embeddings(embeddings), label(label), id(id) {}
+};
+
+class EmbeddingsGallery {
+public:
+    static const char unknownLabel[];
+    static const int unknownId;
+    static const float unknownDistance;
+    EmbeddingsGallery(const std::string& fgPath,
+                      double threshold,
+                      bool crop,
+                      const DetectorConfig& detectorConfig,
+                      AsyncModel& landmarksDet,
+                      AsyncModel& imageReid,
+                      bool useGreedyMatcher = false);
+    size_t size() const;
+    std::vector<std::pair<int, float>> getIDsByEmbeddings(const std::vector<cv::Mat>& embeddings) const;
+    std::string getLabelByID(int id) const;
+    bool labelExists(const std::string& label) const;
+    std::string tryToSave(cv::Mat newFace);
+    void addFace(cv::Mat newFace, cv::Mat embedding, std::string label);
+
+private:
+    RegistrationStatus registerIdentity(const std::string& identityLabel,
+                                        const cv::Mat& image,
+                                        const bool crop,
+                                        FaceDetector& detector,
+                                        AsyncModel& landmarksDet,
+                                        AsyncModel& imageReid,
+                                        cv::Mat& embedding);
+    std::vector<int> idxToId;
+    double reidThreshold;
+    std::vector<GalleryObject> identities;
+    bool useGreedyMatcher;
+    std::string faceGalleryPath;
+};
diff --git a/demos/face_recognition_demo/cpp/main.cpp b/demos/face_recognition_demo/cpp/main.cpp
new file mode 100644
index 00000000000..cca4b3581ad
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/main.cpp
@@ -0,0 +1,359 @@
+// Copyright (C) 2023 KNS Group LLC (YADRO)
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include "api.hpp"
+#include "models.hpp"
+#include "reid_gallery.hpp"
+
+#include <chrono>
+#include <iostream>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "openvino/openvino.hpp"
+
+#include "gflags/gflags.h"
+#include "monitors/presenter.h"
+#include "utils/args_helper.hpp"
+#include "utils/images_capture.h"
+#include "utils/ocv_common.hpp"
+#include "utils/slog.hpp"
+
+namespace {
+constexpr char h_msg[] = "show the help message and exit";
+DEFINE_bool(h, false, h_msg);
+
+constexpr char i_msg[] = "an input to process. The input must be a single image, a folder of images, video file or camera id. Default is 0";
+DEFINE_string(i, "0", i_msg);
+
+constexpr char mfd_msg[] = "path to the Face Detection model (.xml) file.";
+DEFINE_string(mfd, "", mfd_msg);
+
+constexpr char mlm_msg[] = "path to the Facial Landmarks Regression Retail model (.xml) file";
+DEFINE_string(mlm, "", mlm_msg);
+
+constexpr char mreid_msg[] = "path to the Face Recognition model (.xml) file.";
+DEFINE_string(mreid, "", mreid_msg);
+
+constexpr char mas_msg[] = "path to the Antispoofing Classification model (.xml) file.";
+DEFINE_string(mas, "", mas_msg);
+
+constexpr char tfd_msg[] = "probability threshold for face detections. Default is 0.5";
+DEFINE_double(t_fd, 0.5, tfd_msg);
+
+constexpr char input_shape_msg[] =
+    "specify the input shape for the detection network in (width x height) format. "
+    "The model input will be reshaped according to the specified shape. "
+    "Example: 1280x720. By default, the network's own input shape is used.";
+DEFINE_string(input_shape, "", input_shape_msg);
+
+constexpr char exp_msg[] = "expand ratio for bbox before face recognition. Default is 1.0";
+DEFINE_double(exp, 1.0, exp_msg);
+
+constexpr char treid_msg[] = "cosine distance threshold between two vectors for face reidentification. Default is 0.7";
+DEFINE_double(t_reid, 0.7, treid_msg);
+
+constexpr char match_algo_msg[] = "(don't) use the faster greedy matching algorithm in face reid.";
+DEFINE_bool(greedy_reid_matching, false, match_algo_msg);
+
+constexpr char fg_msg[] = "path to a faces gallery directory.";
+DEFINE_string(fg, "", fg_msg);
+
+constexpr char ag_msg[] = "(don't) allow growing the faces gallery and dumping it on disk.";
+DEFINE_bool(allow_grow, false, ag_msg);
+
+constexpr char cg_msg[] = "(don't) crop images during faces gallery creation.";
+DEFINE_bool(crop_gallery, false, cg_msg);
+
+constexpr char o_msg[] = "name of the output file(s) to save.";
+DEFINE_string(o, "", o_msg);
+
+constexpr char loop_msg[] = "enable reading the input in a loop";
+DEFINE_bool(loop, false, loop_msg);
+
+constexpr char dfd_msg[] =
+    "specify a device for the Face Detection model to infer on (the list of available devices is shown below). "
+    "Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. "
+    "Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. "
+    "Default is CPU";
+DEFINE_string(dfd, "CPU", dfd_msg);
+
+constexpr char dlm_msg[] =
+    "specify a device for the Landmarks Regression model to infer on (the list of available devices is shown below). "
+    "Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. "
+    "Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. "
+    "Default is CPU";
+DEFINE_string(dlm, "CPU", dlm_msg);
+
+constexpr char dreid_msg[] =
+    "specify a target device for the Face Reidentification model to infer on (the list of available devices is shown below). "
+    "Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. "
+    "Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. "
+    "Default is CPU";
+DEFINE_string(dreid, "CPU", dreid_msg);
+
+constexpr char das_msg[] =
+    "specify a device for the Anti-spoofing model to infer on (the list of available devices is shown below). "
+    "Use '-d HETERO:<comma-separated_devices_list>' format to specify HETERO plugin. "
+    "Use '-d MULTI:<comma-separated_devices_list>' format to specify MULTI plugin. "
+    "Default is CPU";
+DEFINE_string(das, "CPU", das_msg);
+
+constexpr char lim_msg[] = "number of frames to store in output. If 0 is set, all frames are stored. Default is 1000";
+DEFINE_uint32(lim, 1000, lim_msg);
+
+constexpr char show_msg[] = "(don't) show output";
+DEFINE_bool(show, true, show_msg);
+
+constexpr char u_msg[] = "resource utilization graphs. Default is cdm. "
+    "c - average CPU load, d - load distribution over cores, m - memory usage, h - hide";
+DEFINE_string(u, "cdm", u_msg);
+
+void parse(int argc, char *argv[]) {
+    gflags::ParseCommandLineFlags(&argc, &argv, false);
+    if (FLAGS_h || 1 == argc) {
+        std::cout << "\t[ -h] " << h_msg
+                  << "\n\t[--help] print help on all arguments"
+                  << "\n\t[ -i ] " << i_msg
+                  << "\n\t --mfd " << mfd_msg
+                  << "\n\t[--mlm ] " << mlm_msg
+                  << "\n\t[--mreid ] " << mreid_msg
+                  << "\n\t[--mas ] " << mas_msg
+                  << "\n\t[--t_fd ] " << tfd_msg
+                  << "\n\t[--input_shape ] " << input_shape_msg
+                  << "\n\t[--t_reid ] " << treid_msg
+                  << "\n\t[--exp ] " << exp_msg
+                  << "\n\t[--greedy_reid_matching] ([--nogreedy_reid_matching]) " << match_algo_msg
+                  << "\n\t[--fg ] " << fg_msg
+                  << "\n\t[--allow_grow] ([--noallow_grow]) " << ag_msg
+                  << "\n\t[--crop_gallery] ([--nocrop_gallery]) " << cg_msg
+                  << "\n\t[--dfd ] " << dfd_msg
+                  << "\n\t[--dlm ] " << dlm_msg
+                  << "\n\t[--dreid ] " << dreid_msg
+                  << "\n\t[--das ] " << das_msg
+                  << "\n\t[--lim ] " << lim_msg
+                  << "\n\t[ -o ] " << o_msg
+                  << "\n\t[--loop] " << loop_msg
+                  << "\n\t[--show] ([--noshow]) " << show_msg
+                  << "\n\t[ -u ] " << u_msg
+                  << "\n\tKey bindings:"
+                     "\n\t\tQ, q, Esc - Quit"
+                     "\n\t\tP, p, 0, spacebar - Pause"
+                     "\n\t\tC - average CPU load, D - load distribution over cores, M - memory usage, H - hide\n";
+        showAvailableDevices();
+        std::cout << ov::get_openvino_version() << std::endl;
+        exit(0);
+    }
+    if (FLAGS_i.empty()) {
+        throw std::invalid_argument{"-i can't be empty"};
+    }
+    if (FLAGS_mfd.empty()) {
+        throw std::invalid_argument{"--mfd can't be empty"};
+    }
+    if (!FLAGS_fg.empty() && (FLAGS_mlm.empty() || FLAGS_mreid.empty())) {
+        throw std::logic_error("A face gallery path should be provided together with the landmarks and reidentification models");
+    }
+    if (!FLAGS_input_shape.empty() && FLAGS_input_shape.find("x") == std::string::npos) {
+        throw std::logic_error("Correct format of --input_shape parameter is \"width\"x\"height\".");
+    }
+}
+
+cv::Size getInputSize(const std::string& input_shape) {
+    size_t found = input_shape.find("x");
+    if (found == std::string::npos) {
+        return cv::Size(0, 0);
+    }
+    return cv::Size{
+        std::stoi(input_shape.substr(0, found)),
+        std::stoi(input_shape.substr(found + 1))};
+}
+
+std::string getLabelForFace(const Result& result) {
+    std::string faceLabel = result.label;
+    if (!result.real) {
+        faceLabel = "Spoof";
+    }
+    return faceLabel;
+}
+
+cv::Mat drawDetections(const std::vector<Result>& results, cv::Mat frame) {
+    cv::Scalar acceptColor(0, 220, 0);
+    cv::Scalar disableColor(0, 0, 255);
+    for (const auto& result : results) {
+        cv::Rect rect = result.face;
+        std::string faceLabel = getLabelForFace(result);
+
+        cv::Scalar color;
+        if (result.label != EmbeddingsGallery::unknownLabel && result.real) {
+            color = acceptColor;
+        } else {
+            color = disableColor;
+        }
+        int baseLine = 0;
+        const cv::Size label_size = cv::getTextSize(faceLabel, cv::FONT_HERSHEY_SIMPLEX, 0.6, 1, &baseLine);
+        cv::rectangle(
+            frame,
+            cv::Point(rect.x, rect.y - label_size.height - baseLine),
+            cv::Point(rect.x + label_size.width, rect.y),
+            color, cv::FILLED);
+        cv::putText(frame, faceLabel, cv::Point(rect.x, rect.y - baseLine), cv::FONT_HERSHEY_SIMPLEX, 0.6,
+                    cv::Scalar(0, 0, 0), 1, cv::LINE_AA);
+        cv::rectangle(frame, rect, color, 1);
+        auto drawPhotoFrameCorner = [&](cv::Point p, int dx, int dy) {
+            cv::line(frame, p, cv::Point(p.x, p.y + dy), color, 2);
+            cv::line(frame, p, cv::Point(p.x + dx, p.y), color, 2);
+        };
+
+        int dx = static_cast<int>(0.1 * rect.width);
+        int dy = static_cast<int>(0.1 * rect.height);
+
+        drawPhotoFrameCorner(rect.tl(), dx, dy);
+        drawPhotoFrameCorner(cv::Point(rect.x + rect.width - 1, rect.y), -dx, dy);
+        drawPhotoFrameCorner(cv::Point(rect.x, rect.y + rect.height - 1), dx, -dy);
+        drawPhotoFrameCorner(cv::Point(rect.x + rect.width - 1, rect.y + rect.height - 1), -dx, -dy);
+
+        for (const auto& lm : result.landmarks) {
+            cv::circle(frame, lm, 2, {110, 193, 225}, -1);
+        }
+    }
+    return frame;
+}
+
+}  // namespace
+
+
+int main(int argc, char* argv[]) {
+    try {
+        PerformanceMetrics metrics;
+
+        parse(argc, argv);
+
+        const auto fdModelPath = FLAGS_mfd;
+        const auto frModelPath = FLAGS_mreid;
+        const auto lmModelPath = FLAGS_mlm;
+        const auto asModelPath = FLAGS_mas;
+
+        slog::info << ov::get_openvino_version() << slog::endl;
+        ov::Core core;
+
+        // Load the face detector and create the recognizer
+        std::unique_ptr<FaceDetector> faceDetector;
+        DetectorConfig detectorConfig(fdModelPath);
+        detectorConfig.deviceName = FLAGS_dfd;
+        detectorConfig.core = core;
+        detectorConfig.confidenceThreshold = static_cast<float>(FLAGS_t_fd);
+        detectorConfig.inputSize = getInputSize(FLAGS_input_shape);
+        detectorConfig.increaseScaleX = static_cast<float>(FLAGS_exp);
+        detectorConfig.increaseScaleY = static_cast<float>(FLAGS_exp);
+        faceDetector.reset(new FaceDetector(detectorConfig));
+
+        // Load the landmarks regression and reid models
+        std::unique_ptr<FaceRecognizer> faceRecognizer;
+        if (!lmModelPath.empty() && !frModelPath.empty()) {
+            BaseConfig landmarksConfig(lmModelPath);
+            landmarksConfig.deviceName = FLAGS_dlm;
+            landmarksConfig.numRequests = FaceRecognizerDefault::MAX_NUM_REQUESTS;
+            landmarksConfig.core = core;
+
+            BaseConfig reidConfig(frModelPath);
+            reidConfig.deviceName = FLAGS_dreid;
+            reidConfig.numRequests = FaceRecognizerDefault::MAX_NUM_REQUESTS;
+            reidConfig.core = core;
+
+            bool allowGrow = FLAGS_allow_grow && FLAGS_show;
+
+            faceRecognizer.reset(new FaceRecognizerDefault(
+                landmarksConfig, reidConfig,
+                detectorConfig, FLAGS_fg, FLAGS_t_reid,
+                FLAGS_crop_gallery, allowGrow, FLAGS_greedy_reid_matching));
+        } else {
+            slog::warn << "Landmarks Regression and Face Reidentification models are disabled!" << slog::endl;
+        }
+
+        // Load the anti-spoof model
+        std::unique_ptr<AntiSpoofer> antiSpoofer;
+        if (!asModelPath.empty()) {
+            BaseConfig antiSpoofConfig(asModelPath);
+            antiSpoofConfig.deviceName = FLAGS_das;
+            antiSpoofConfig.numRequests = FaceRecognizerDefault::MAX_NUM_REQUESTS;
+            antiSpoofConfig.core = core;
+            antiSpoofer.reset(new AntiSpoofer(antiSpoofConfig));
+        } else {
+            slog::warn << "AntiSpoof model is disabled!" << slog::endl;
+        }
+
+        size_t framesNum = 0;
+
+        std::unique_ptr<ImagesCapture> cap = openImagesCapture(FLAGS_i, FLAGS_loop);
+        LazyVideoWriter videoWriter{FLAGS_o, cap->fps(), FLAGS_lim};
+        cv::Mat frame = cap->read();
+        cv::Size graphSize{static_cast<int>(frame.cols / 4), 60};
+        Presenter presenter(FLAGS_u, frame.rows - graphSize.height - 10, graphSize);
+        faceDetector->submitData(frame);
+        bool keepRunning = true;
+        while (keepRunning) {
+            auto startTime = std::chrono::steady_clock::now();
+            cv::Mat prevFrame = std::move(frame);
+            frame = cap->read();
+
+            keepRunning = !frame.empty();
+
+            presenter.drawGraphs(prevFrame);
+
+            std::vector<FaceBox> faces = faceDetector->getResults();
+
+            if (keepRunning) {
+                faceDetector->submitData(frame);
+            }
+
+            // Recognize
+            std::vector<Result> results;
+            if (faceRecognizer) {
+                results = faceRecognizer->recognize(prevFrame.clone(), faces);
+            } else {
+                for (const auto& f : faces) {
+                    results.emplace_back(f.face, std::vector<cv::Point>{}, EmbeddingsGallery::unknownId,
+                                         EmbeddingsGallery::unknownDistance, EmbeddingsGallery::unknownLabel);
+                }
+            }
+
+            // AntiSpoof
+            if (antiSpoofer) {
+                antiSpoofer->process(prevFrame.clone(), faces, results);
+            }
+
+            metrics.update(startTime, frame, { 10, 22 }, cv::FONT_HERSHEY_COMPLEX, 0.65);
+            prevFrame = drawDetections(results, prevFrame);
+
+            if (FLAGS_show) {
+                cv::imshow(argv[0], prevFrame);
+                char key = cv::waitKey(1);
+                if ('P' == key || 'p' == key || '0' == key || ' ' == key) {
+                    key = cv::waitKey(0);
+                }
+                if (27 == key || 'q' == key || 'Q' == key) {  // Esc
+                    keepRunning = false;
+                }
+                presenter.handleKey(key);
+            }
+            videoWriter.write(prevFrame);
+            framesNum++;
+        }
+
+        slog::info << "Metrics report:" << slog::endl;
+        metrics.logTotal();
+        slog::info << presenter.reportMeans() << slog::endl;
+    } catch (const std::exception& error) {
+        slog::err << error.what() << slog::endl;
+        return 1;
+    } catch (...) {
+        slog::err << "Unknown/internal exception happened." << slog::endl;
+        return 1;
+    }
+
+    return 0;
+}
diff --git a/demos/face_recognition_demo/cpp/models.lst b/demos/face_recognition_demo/cpp/models.lst
new file mode 100644
index 00000000000..0848555cd4b
--- /dev/null
+++ b/demos/face_recognition_demo/cpp/models.lst
@@ -0,0 +1,9 @@
+# This file can be used with the --list option of the model downloader.
+face-detection-adas-????
+face-detection-retail-????
+Sphereface
+face-recognition-resnet100-arcface-onnx
+face-reidentification-retail-????
+facenet-20180408-102900
+landmarks-regression-retail-????
+anti-spoof-mn3 diff --git a/demos/face_recognition_demo/cpp/src/align_transform.cpp b/demos/face_recognition_demo/cpp/src/align_transform.cpp new file mode 100644 index 00000000000..f7bb15caac6 --- /dev/null +++ b/demos/face_recognition_demo/cpp/src/align_transform.cpp @@ -0,0 +1,73 @@ +// Copyright (C) 2023 KNS Group LLC (YADRO) +// SPDX-License-Identifier: Apache-2.0 +// + +#include "models.hpp" +#include +#include + +namespace { + static const float h = 112.; + static const float w = 96.; + // reference landmarks points in the unit square [0,1]x[0,1] + static const float REF_LANDMARKS_NORMED[] = { + 30.2946f / w, 51.6963f / h, 65.5318f / w, 51.5014f / h, 48.0252f / w, + 71.7366f / h, 33.5493f / w, 92.3655f / h, 62.7299f / w, 92.2041f / h + }; +} + +cv::Mat getTransform(cv::Mat* src, cv::Mat* dst) { + cv::Mat colMeanSrc; + reduce(*src, colMeanSrc, 0, cv::REDUCE_AVG); + for (int i = 0; i < src->rows; i++) { + src->row(i) -= colMeanSrc; + } + + cv::Mat colMeanDst; + reduce(*dst, colMeanDst, 0, cv::REDUCE_AVG); + for (int i = 0; i < dst->rows; i++) { + dst->row(i) -= colMeanDst; + } + + cv::Scalar mean, devSrc, devDst; + cv::meanStdDev(*src, mean, devSrc); + devSrc(0) = + std::max(static_cast(std::numeric_limits::epsilon()), devSrc(0)); + *src /= devSrc(0); + cv::meanStdDev(*dst, mean, devDst); + devDst(0) = + std::max(static_cast(std::numeric_limits::epsilon()), devDst(0)); + *dst /= devDst(0); + + cv::Mat w, u, vt; + cv::SVD::compute((*src).t() * (*dst), w, u, vt); + cv::Mat r = (u * vt).t(); + cv::Mat m(2, 3, CV_32F); + m.colRange(0, 2) = r * (devDst(0) / devSrc(0)); + m.col(2) = (colMeanDst.t() - m.colRange(0, 2) * colMeanSrc.t()); + return m; +} + +void alignFaces(std::vector& faceImages, const std::vector& landmarksVec) { + if (landmarksVec.size() == 0) { + return; + } + CV_Assert(faceImages.size() == landmarksVec.size()); + cv::Mat refLandmarks = cv::Mat(5, 2, CV_32F); + + for (size_t j = 0; j < faceImages.size(); j++) { + auto lms = landmarksVec.at(j).clone(); + for (int i = 0; i < refLandmarks.rows; i++) { + refLandmarks.at(i, 0) = + REF_LANDMARKS_NORMED[2 * i] * faceImages.at(j).cols; + refLandmarks.at(i, 1) = + REF_LANDMARKS_NORMED[2 * i + 1] * faceImages.at(j).rows; + lms = lms.reshape(1, 5); + lms.at(i, 0) *= faceImages.at(j).cols; + lms.at(i, 1) *= faceImages.at(j).rows; + } + cv::Mat m = getTransform(&refLandmarks, &lms); + cv::warpAffine(faceImages.at(j), faceImages.at(j), m, + faceImages.at(j).size(), cv::WARP_INVERSE_MAP); + } +} diff --git a/demos/face_recognition_demo/cpp/src/async_queue.cpp b/demos/face_recognition_demo/cpp/src/async_queue.cpp new file mode 100644 index 00000000000..7f4f6771c34 --- /dev/null +++ b/demos/face_recognition_demo/cpp/src/async_queue.cpp @@ -0,0 +1,113 @@ +// Copyright (C) 2023 KNS Group LLC (YADRO) +// SPDX-License-Identifier: Apache-2.0 +// + +#include "async_queue.hpp" +#include +#include "utils/ocv_common.hpp" +#include + +AsyncInferQueue::AsyncInferQueue(ov::CompiledModel& compiledModel, size_t size) { + requests.resize(size); + for (size_t requestId = 0; requestId < size; ++requestId) { + requests[requestId] = compiledModel.create_infer_request(); + idsOfFreeRequests.push(requestId); + } + + for (const auto& output: compiledModel.outputs()) { + outputNames.push_back(output.get_any_name()); + } + + this->setCallback(); +} + +void AsyncInferQueue::setCallback() { + for (size_t requestId = 0; requestId < requests.size(); ++requestId) { + requests[requestId].set_callback([this, requestId /* ... 
*/](std::exception_ptr exceptionPtr) { + { + // acquire the mutex to access m_idle_handles + std::lock_guard lock(mutex); + + for (const auto& outName : outputNames) { + auto tensor = requests[requestId].get_tensor(outName); + results[requestId][outName] = tensor; + } + // Add idle handle to queue + idsOfFreeRequests.push(requestId); + } + // Notify locks in getIdleRequestId() + cv.notify_one(); + try { + if (exceptionPtr) { + std::rethrow_exception(exceptionPtr); + } + } catch (const std::exception& e) { + throw ov::Exception(e.what()); + } + }); + } +} + +AsyncInferQueue::~AsyncInferQueue() { + waitAll(); +} + +size_t AsyncInferQueue::getIdleRequestId() { + std::unique_lock lock(mutex); + + cv.wait(lock, [this] { + return !(idsOfFreeRequests.empty()); + }); + size_t idleHandle = idsOfFreeRequests.front(); + // wait for request to make sure it returned from callback + requests[idleHandle].wait(); + + return idleHandle; +} + +void AsyncInferQueue::waitAll() { + for (auto&& request : requests) { + request.wait(); + } +} + +void AsyncInferQueue::submitData(std::unordered_map inputs, size_t inputId) { + size_t id = getIdleRequestId(); + + { + std::lock_guard lock(mutex); + idsOfFreeRequests.pop(); + } + requests[id].set_callback([this, id, inputId /* ... */](std::exception_ptr exceptionPtr) { + { + // acquire the mutex to access m_idle_handles + std::lock_guard lock(mutex); + for (const auto& outName : outputNames) { + auto tensor = requests[id].get_tensor(outName); + results[inputId][outName] = tensor; + } + // Add idle handle to queue + idsOfFreeRequests.push(id); + } + // Notify locks in getIdleRequestId() + cv.notify_one(); + try { + if (exceptionPtr) { + std::rethrow_exception(exceptionPtr); + } + } catch (const std::exception& e) { + throw ov::Exception(e.what()); + } + }); + for (const auto& input: inputs) { + ov::Tensor inputTensor = requests[id].get_tensor(input.first); + resize2tensor(input.second, inputTensor); + requests[id].set_tensor(input.first, inputTensor); + } + requests[id].start_async(); +} + +std::unordered_map> AsyncInferQueue::getResults() { + waitAll(); + return results; +} diff --git a/demos/face_recognition_demo/cpp/src/models.cpp b/demos/face_recognition_demo/cpp/src/models.cpp new file mode 100644 index 00000000000..565b01366b2 --- /dev/null +++ b/demos/face_recognition_demo/cpp/src/models.cpp @@ -0,0 +1,215 @@ +// Copyright (C) 2023 KNS Group LLC (YADRO) +// SPDX-License-Identifier: Apache-2.0 +// + +#include "models.hpp" +#include "reid_gallery.hpp" + +#include +#include +#include + +#include "utils/ocv_common.hpp" +#include "utils/image_utils.h" + +void AsyncModel::prepareInputsOutputs(std::shared_ptr& model) { + if (model->inputs().size() != 1) { + throw std::logic_error("Face landmarks/reidentification network should have only 1 input"); + } + + if (model->outputs().size() != 1) { + throw std::logic_error("Face landmarks/reidentification network should have only 1 output"); + } + inputTensorName = model->input().get_any_name(); + ov::OutputVector outputs = model->outputs(); + for (auto& item : outputs) { + const std::string name = item.get_any_name(); + outputTensorsNames.push_back(name); + } + + ov::Shape inputDims = model->input().get_shape(); + + ov::Layout modelLayout = ov::layout::get_layout(model->input()); + if (modelLayout.empty()) { + modelLayout = {"NCHW"}; + } + netInputSize = cv::Size(inputDims[ov::layout::width_idx(modelLayout)], + inputDims[ov::layout::height_idx(modelLayout)]); + + ov::Shape outputDims = model->output().get_shape(); + + 
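+    // Embed preprocessing into the compiled model: accept u8 NHWC tensors as
+    // they come from OpenCV images and convert them to the model's own layout
+    // and f32 precision inside the compiled model, once per infer request.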
ov::preprocess::PrePostProcessor ppp(model); + ov::Layout desiredLayout = {"NHWC"}; + + ppp.input().tensor() + .set_element_type(ov::element::u8) + .set_layout(desiredLayout); + ppp.input().preprocess() + .convert_layout(modelLayout) + .convert_element_type(ov::element::f32); + ppp.input().model().set_layout(modelLayout); + + model = ppp.build(); +} + +std::vector AsyncModel::infer(const std::vector& rois) { + if (!enabled()) { + return std::vector(); + } + std::unordered_map input; + for (size_t id = 0; id < rois.size(); ++id) { + cv::Mat resizedImg = resizeImageExt(rois[id], netInputSize.width, netInputSize.height); + input[inputTensorName] = std::move(resizedImg); + inferQueue->submitData(input, id); + } + std::unordered_map> results = std::move(inferQueue->getResults()); + + // create cv::Mats from results + std::vector mats; + for (size_t id = 0; id < rois.size(); ++id) { + ov::Tensor tensor = results[id][outputTensorsNames[0]]; + if (tensor) { + ov::Shape shape = tensor.get_shape(); + std::vector tensorSizes(shape.size(), 0); + for (size_t i = 0; i < tensorSizes.size(); ++i) { + tensorSizes[i] = shape[i]; + } + cv::Mat outTensor(tensorSizes, CV_32F, tensor.data()); + outTensor = outTensor.reshape(1, outTensor.size[1]); + mats.push_back(outTensor.clone()); + } + } + return mats; +} + +namespace { + cv::Rect truncateToValidRect(const cv::Rect& rect, const cv::Size& size) { + auto tl = rect.tl(), br = rect.br(); + tl.x = std::max(0, std::min(size.width - 1, tl.x)); + tl.y = std::max(0, std::min(size.height - 1, tl.y)); + br.x = std::max(0, std::min(size.width, br.x)); + br.y = std::max(0, std::min(size.height, br.y)); + int w = std::max(0, br.x - tl.x); + int h = std::max(0, br.y - tl.y); + return cv::Rect(tl.x, tl.y, w, h); + } + + cv::Rect increaseRect(const cv::Rect& r, float coeff_x, float coeff_y) { + cv::Point2f tl = r.tl(); + cv::Point2f br = r.br(); + cv::Point2f c = (tl * 0.5f) + (br * 0.5f); + cv::Point2f diff = c - tl; + cv::Point2f newDiff{diff.x * coeff_x, diff.y * coeff_y}; + cv::Point2f newTl = c - newDiff; + cv::Point2f newBr = c + newDiff; + + cv::Point newTlInt {static_cast(std::floor(newTl.x)), static_cast(std::floor(newTl.y))}; + cv::Point newBrInt {static_cast(std::ceil(newBr.x)), static_cast(std::ceil(newBr.y))}; + + return cv::Rect(newTlInt, newBrInt); + } +} // namespace + +void FaceDetector::prepareInputsOutputs(std::shared_ptr& model) { + if (model->inputs().size() != 1) { + throw std::logic_error("Face Detection network should have only one input"); + } + + if (model->outputs().size() != 1) { + throw std::logic_error("Face Detection network should have only one output"); + } + inputTensorName = model->input().get_any_name(); + outputTensorsNames.push_back(model->output().get_any_name()); + + if (mConfig.inputSize.area()) { + model->reshape(ov::Shape({1, 3, static_cast(mConfig.inputSize.height), static_cast(mConfig.inputSize.width)})); + } + + ov::Shape inputDims = model->input().get_shape(); + + ov::Layout modelLayout = ov::layout::get_layout(model->input()); + if (modelLayout.empty()) { + modelLayout = {"NCHW"}; + } + netInputSize = cv::Size(inputDims[ov::layout::width_idx(modelLayout)], + inputDims[ov::layout::height_idx(modelLayout)]); + + ov::Shape outputDims = model->output().get_shape(); + maxDetectionCount = outputDims[2]; + detectedObjectSize = outputDims[3]; + if (detectedObjectSize != 7) { + throw std::runtime_error("Face Detection network output layer should have 7 as a last dimension"); + } + if (outputDims.size() != 4) { + throw 
std::runtime_error("Face Detection network output should have 4 dimensions, but had " + + std::to_string(outputDims.size())); + } + + ov::preprocess::PrePostProcessor ppp(model); + ov::Layout desiredLayout = {"NHWC"}; + + ppp.input().tensor() + .set_element_type(ov::element::u8) + .set_layout(desiredLayout); + ppp.input().preprocess() + .convert_layout(modelLayout) + .convert_element_type(ov::element::f32); + ppp.input().model().set_layout(modelLayout); + + model = ppp.build(); +} + +// Function to start inference +void FaceDetector::submitData(const cv::Mat& inputImage) { + origImageSize = inputImage.size(); + ov::Tensor inputTensor = mRequest->get_input_tensor(); + + resize2tensor(inputImage, inputTensor); + mRequest->start_async(); +} + +std::vector FaceDetector::getResults() { + mRequest->wait(); + const float* data = mRequest->get_output_tensor().data(); + + std::vector detectedFaces; + + for (size_t det_id = 0; det_id < maxDetectionCount; ++det_id) { + const int start_pos = det_id * detectedObjectSize; + + const float batchID = data[start_pos]; + if (batchID == emptyDetectionIndicator) { + break; + } + + const float score = std::min(std::max(0.0f, data[start_pos + 2]), 1.0f); + if (score < mConfig.confidenceThreshold) { + continue; + } + + const float x0 = std::min(std::max(0.0f, data[start_pos + 3]), 1.0f) * origImageSize.width; + const float y0 = std::min(std::max(0.0f, data[start_pos + 4]), 1.0f) * origImageSize.height; + const float x1 = std::min(std::max(0.0f, data[start_pos + 5]), 1.0f) * origImageSize.width; + const float y1 = std::min(std::max(0.0f, data[start_pos + 6]), 1.0f) * origImageSize.height; + + FaceBox detectedObject; + detectedObject.confidence = score; + detectedObject.face = cv::Rect(cv::Point(static_cast(round(static_cast(x0))), + static_cast(round(static_cast(y0)))), + cv::Point(static_cast(round(static_cast(x1))), + static_cast(round(static_cast(y1))))); + + + detectedObject.face = truncateToValidRect(increaseRect(detectedObject.face, + mConfig.increaseScaleX, + mConfig.increaseScaleY), + cv::Size(static_cast(origImageSize.width), + static_cast(origImageSize.height))); + + if (detectedObject.face.area() > 0) { + detectedFaces.emplace_back(detectedObject); + } + } + + return detectedFaces; +} diff --git a/demos/face_recognition_demo/cpp/src/reid_gallery.cpp b/demos/face_recognition_demo/cpp/src/reid_gallery.cpp new file mode 100644 index 00000000000..8bd5cbbc4d0 --- /dev/null +++ b/demos/face_recognition_demo/cpp/src/reid_gallery.cpp @@ -0,0 +1,219 @@ +// Copyright (C) 2023 KNS Group LLC (YADRO) +// SPDX-License-Identifier: Apache-2.0 +// + +#include "reid_gallery.hpp" + +#include +#include +#include + +#ifdef _WIN32 +# include "w_dirent.hpp" +#else +# include // for closedir, dirent, opendir, readdir, DIR +#endif + +#include "utils/kuhn_munkres.hpp" + +namespace { + float computeReidDistance(const cv::Mat& descr1, const cv::Mat& descr2) { + float xy = static_cast(descr1.dot(descr2)); + float xx = static_cast(descr1.dot(descr1)); + float yy = static_cast(descr2.dot(descr2)); + float norm = sqrt(xx * yy) + 1e-6f; + return 1.0f - xy / norm; + } + + std::vector file_extensions = {".jpg", ".png"}; + +} // namespace + +const char EmbeddingsGallery::unknownLabel[] = "Unknown"; +const int EmbeddingsGallery::unknownId = -1; +const float EmbeddingsGallery::unknownDistance = 1.0; + +EmbeddingsGallery::EmbeddingsGallery(const std::string& fgPath, + double threshold, + bool crop, + const DetectorConfig& detectorConfig, + AsyncModel& landmarksDet, + AsyncModel& imageReid, + 
bool useGreedyMatcher) : + reidThreshold(threshold), useGreedyMatcher(useGreedyMatcher), faceGalleryPath(fgPath) { + if (faceGalleryPath.empty()) { + return; + } + + FaceDetector detector(detectorConfig); + + int id = 0; + DIR* dir = opendir(fgPath.c_str()); + if (!dir) { + throw std::runtime_error("Can't find the directory " + fgPath); + } + while (struct dirent* ent = readdir(dir)) { + if (strcmp(ent->d_name, ".") && strcmp(ent->d_name, "..")) { + std::string name = ent->d_name; + cv::Mat image = cv::imread(fgPath + '/' + name); + if (image.empty()) { + throw std::runtime_error("Image is empty"); + } + std::vector embeddings; + cv::Mat emb; + RegistrationStatus status = registerIdentity(name, image, crop, detector, landmarksDet, imageReid, emb); + if (status == RegistrationStatus::SUCCESS) { + embeddings.push_back(emb); + idxToId.push_back(id); + identities.emplace_back(embeddings, name, id); + ++id; + } + } + } + closedir(dir); + slog::info << identities.size() << " persons to recognize were added from the gallery" << slog::endl; +} + +std::vector> EmbeddingsGallery::getIDsByEmbeddings(const std::vector& embeddings) const { + if (embeddings.empty() || idxToId.empty()) { + return std::vector>(embeddings.size(), {unknownId, unknownDistance}); + } + + cv::Mat distances(static_cast(embeddings.size()), static_cast(idxToId.size()), CV_32F); + + for (int i = 0; i < distances.rows; i++) { + int k = 0; + for (size_t j = 0; j < identities.size(); j++) { + for (const auto& reference_emb : identities[j].embeddings) { + distances.at(i, k) = computeReidDistance(embeddings[i], reference_emb); + k++; + } + } + } + + KuhnMunkres matcher(useGreedyMatcher); + auto matchedIdx = matcher.Solve(distances); + std::vector> matches; + for (auto col_idx : matchedIdx) { + if (int(col_idx) == -1) { + matches.push_back({unknownId, unknownDistance}); + continue; + } + if (distances.at(matches.size(), col_idx) > reidThreshold) { + matches.push_back({unknownId, unknownDistance}); + } else { + matches.push_back({idxToId[col_idx], distances.at(matches.size(), col_idx)}); + } + } + return matches; +} + +std::string EmbeddingsGallery::getLabelByID(int id) const { + if (id >= 0 && id < static_cast(identities.size())) { + return identities[id].label; + } else { + return unknownLabel; + } +} + +size_t EmbeddingsGallery::size() const { + return identities.size(); +} + +bool EmbeddingsGallery::labelExists(const std::string& label) const { + return identities.end() != std::find_if(identities.begin(), identities.end(), + [label](const GalleryObject& o) {return o.label == label;}); +} + +std::string EmbeddingsGallery::tryToSave(cv::Mat newFace){ + std::string winname = "Unknown face"; + size_t height = int(400 * newFace.rows / newFace.cols); + cv::Mat resized; + cv::resize(newFace, resized, cv::Size(400, height), 0.0, 0.0, cv::INTER_AREA); + size_t font = cv::FONT_HERSHEY_PLAIN; + size_t fontScale = 1; + cv::Scalar fontColor(255, 255, 255); + size_t lineType = 1; + cv::copyMakeBorder(resized, newFace, 5, 5, 5, 5, cv::BORDER_CONSTANT, cv::Scalar(255, 255, 255)); + cv::putText(newFace, "Please, enter person's name and", cv::Point2d(30, 80), font, fontScale, fontColor, lineType); + cv::putText(newFace, "press \"Enter\" to accept and continue.", cv::Point2d(30, 110), font, fontScale, fontColor, lineType); + cv::putText(newFace, "Press \"Escape\" to discard.", cv::Point2d(30, 140), font, fontScale, fontColor, lineType); + cv::putText(newFace, "Name: ", cv::Point2d(30, 170), font, fontScale, fontColor, lineType); + std::string name; + 
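+    // Minimal keyboard loop: the typed name is redrawn on a copy of the face
+    // image; Enter (13) accepts a non-empty, unique name, Escape (27) discards,
+    // Backspace (8) removes the last character.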
+    bool save = false;
+    while (true) {
+        cv::Mat cc = newFace.clone();
+        cv::putText(cc, name, cv::Point2d(30, 200), font, fontScale, fontColor, lineType);
+        cv::imshow(winname, cc);
+        int key = (cv::waitKey(0) & 0xFF);
+        if (key == 27) {
+            break;
+        }
+        if (key == 13) {
+            if (name.size() > 0) {
+                if (labelExists(name)) {
+                    cv::putText(cc, "This name already exists! Try another one.", cv::Point2d(30, 200), font, fontScale, fontColor, lineType);
+                    cv::imshow(winname, cc);
+                    key = cv::waitKey(0);
+                    if (key == 27) {
+                        break;
+                    }
+                    continue;
+                }
+                save = true;
+                break;
+            } else {
+                cv::putText(cc, "Provided name is empty. Please, provide a valid name.", cv::Point2d(30, 200), font, fontScale, fontColor, lineType);
+                cv::imshow(winname, cc);
+                key = cv::waitKey(0);
+                if (key == 27) {
+                    break;
+                }
+                continue;
+            }
+        }
+        if (key == 225) {  // Shift
+            continue;
+        }
+        if (key == 8) {  // Backspace
+            name = name.substr(0, name.size() - 1);
+            continue;
+        } else {
+            name += char(key);
+            continue;
+        }
+    }
+    cv::destroyWindow(winname);
+    return (save ? name : "");
+}
+
+void EmbeddingsGallery::addFace(const cv::Mat newFace, const cv::Mat embedding, std::string label) {
+    identities.emplace_back(std::vector<cv::Mat>{embedding}, label, static_cast<int>(idxToId.size()));
+    idxToId.push_back(static_cast<int>(idxToId.size()));
+    label += ".jpg";
+    cv::imwrite(faceGalleryPath + "/" + label, newFace);
+}
+
+RegistrationStatus EmbeddingsGallery::registerIdentity(const std::string& identityLabel,
+                                                       const cv::Mat& image,
+                                                       const bool crop,
+                                                       FaceDetector& detector,
+                                                       AsyncModel& landmarksDet,
+                                                       AsyncModel& imageReid,
+                                                       cv::Mat& embedding) {
+    cv::Mat target = image;
+    if (crop) {
+        detector.submitData(image);
+        std::vector<FaceBox> faces = detector.getResults();
+        if (faces.empty()) {
+            return RegistrationStatus::FAILURE_NOT_DETECTED;
+        }
+        if (faces.size() != 1) {
+            throw std::runtime_error("More than 1 face on the image provided for the face gallery");
+        }
+        target = image(faces[0].face);
+    }
+
+    std::vector<cv::Mat> images = { target };
+    std::vector<cv::Mat> landmarksVec = landmarksDet.infer(images);
+    alignFaces(images, landmarksVec);
+    embedding = imageReid.infer(images)[0];
+    return RegistrationStatus::SUCCESS;
+}
diff --git a/demos/tests/cases.py b/demos/tests/cases.py
index 84f8011855e..79f714728a6 100644
--- a/demos/tests/cases.py
+++ b/demos/tests/cases.py
@@ -304,6 +304,40 @@ def single_option_cases(key, *args):
         ),
     )),
 
+    CppDemo(name='face_recognition_demo',
+            model_keys=['--mfd', '--mlm', '--mreid', '--mas'],
+            device_keys=['--dfd', '--dlm', '--dreid', '--das'],
+            test_cases=combine_cases(
+                TestCase(options={'--noshow': None,
+                                  **MONITORS,
+                                  '-i': DataPatternArg('face-detection-adas'),
+                                  '--mfd': ModelArg('face-detection-adas-0001'),
+                                  }),
+                [
+                    *combine_cases(
+                        [
+                            TestCase(options={}),
+                            TestCase(options={
+                                '--fg': DataDirectoryArg('face-recognition-gallery'),
+                                '--mlm': ModelArg('landmarks-regression-retail-0009'),
+                                '--mreid': ModelArg('Sphereface'),
+                            }),
+                            TestCase(options={
+                                '--fg': DataDirectoryArg('face-recognition-gallery'),
+                                '--mlm': ModelArg('landmarks-regression-retail-0009'),
+                                '--mreid': ModelArg('face-recognition-resnet100-arcface-onnx'),
+                            }),
+                            TestCase(options={
+                                '--fg': DataDirectoryArg('face-recognition-gallery'),
+                                '--mlm': ModelArg('landmarks-regression-retail-0009'),
+                                '--mreid': ModelArg('facenet-20180408-102900'),
+                                '--mas': ModelArg('anti-spoof-mn3'),
+                            }),
+                        ],
+                    ),
+                ],
+            )),
+
    CppDemo(name='gaze_estimation_demo',
            model_keys=['-m', '-m_fd', '-m_hp', '-m_lm', '-m_es'],
            device_keys=['-d', '-d_fd', '-d_hp', '-d_lm'],
diff --git a/models/intel/face-detection-adas-0001/README.md b/models/intel/face-detection-adas-0001/README.md
index e695e8cb2d2..dd883f40567 100644
--- a/models/intel/face-detection-adas-0001/README.md
+++ b/models/intel/face-detection-adas-0001/README.md
@@ -55,6 +55,7 @@ bounding boxes. The results are sorted by confidence in decreasing order. Each d
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Face Recognition Python\* Demo](../../../demos/face_recognition_demo/python/README.md)
* [Gaze Estimation Demo](../../../demos/gaze_estimation_demo/cpp/README.md)
* [G-API Gaze Estimation Demo](../../../demos/gaze_estimation_demo/cpp_gapi/README.md)
diff --git a/models/intel/face-detection-retail-0004/README.md b/models/intel/face-detection-retail-0004/README.md
index fa990323a2f..63ffc9e716f 100644
--- a/models/intel/face-detection-retail-0004/README.md
+++ b/models/intel/face-detection-retail-0004/README.md
@@ -51,6 +51,7 @@ bounding boxes. Each detection has the format [`image_id`, `label`, `conf`, `x_m
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Face Recognition Python\* Demo](../../../demos/face_recognition_demo/python/README.md)
* [Gaze Estimation Demo](../../../demos/gaze_estimation_demo/cpp/README.md)
* [G-API Gaze Estimation Demo](../../../demos/gaze_estimation_demo/cpp_gapi/README.md)
diff --git a/models/intel/face-detection-retail-0005/README.md b/models/intel/face-detection-retail-0005/README.md
index c908945611d..002442205cb 100644
--- a/models/intel/face-detection-retail-0005/README.md
+++ b/models/intel/face-detection-retail-0005/README.md
@@ -50,6 +50,7 @@ bounding boxes. Each detection has the format [`image_id`, `label`, `conf`, `x_m
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Gaze Estimation Demo](../../../demos/gaze_estimation_demo/cpp/README.md)
* [G-API Gaze Estimation Demo](../../../demos/gaze_estimation_demo/cpp_gapi/README.md)
* [Interactive Face Detection C++ Demo](../../../demos/interactive_face_detection_demo/cpp/README.md)
diff --git a/models/intel/face-reidentification-retail-0095/README.md b/models/intel/face-reidentification-retail-0095/README.md
index e0cfae4af38..b968027b4aa 100644
--- a/models/intel/face-reidentification-retail-0095/README.md
+++ b/models/intel/face-reidentification-retail-0095/README.md
@@ -51,6 +51,7 @@ The net outputs a blob with the shape `1, 256, 1, 1`, containing a row-vector of
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Face Recognition Python\* Demo](../../../demos/face_recognition_demo/python/README.md)
* [Smart Classroom C++ Demo](../../../demos/smart_classroom_demo/cpp/README.md)
* [Smart Classroom C++ G-API Demo](../../../demos/smart_classroom_demo/cpp_gapi/README.md)
diff --git a/models/intel/landmarks-regression-retail-0009/README.md b/models/intel/landmarks-regression-retail-0009/README.md
index 618b0bdde8d..2dee4d2a142 100644
--- a/models/intel/landmarks-regression-retail-0009/README.md
+++ b/models/intel/landmarks-regression-retail-0009/README.md
@@ -45,6 +45,7 @@ All the coordinates are normalized to be in range [0, 1].
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Face Recognition Python\* Demo](../../../demos/face_recognition_demo/python/README.md)
* [Smart Classroom C++ Demo](../../../demos/smart_classroom_demo/cpp/README.md)
* [Smart Classroom C++ G-API Demo](../../../demos/smart_classroom_demo/cpp_gapi/README.md)
diff --git a/models/public/Sphereface/README.md b/models/public/Sphereface/README.md
index 0291649b9c0..9ad602c3dc9 100644
--- a/models/public/Sphereface/README.md
+++ b/models/public/Sphereface/README.md
@@ -82,6 +82,7 @@ omz_converter --name
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Face Recognition Python\* Demo](../../../demos/face_recognition_demo/python/README.md)
* [Smart Classroom C++ Demo](../../../demos/smart_classroom_demo/cpp/README.md)
* [Smart Classroom C++ G-API Demo](../../../demos/smart_classroom_demo/cpp_gapi/README.md)
diff --git a/models/public/anti-spoof-mn3/README.md b/models/public/anti-spoof-mn3/README.md
index baeda64f724..f095b64de16 100644
--- a/models/public/anti-spoof-mn3/README.md
+++ b/models/public/anti-spoof-mn3/README.md
@@ -79,6 +79,7 @@ omz_converter --name
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Interactive Face Detection C++ Demo](../../../demos/interactive_face_detection_demo/cpp/README.md)
* [G-API Interactive Face Detection Demo](../../../demos/interactive_face_detection_demo/cpp_gapi/README.md)
diff --git a/models/public/face-recognition-resnet100-arcface-onnx/README.md b/models/public/face-recognition-resnet100-arcface-onnx/README.md
index 394cb195856..982b26720b4 100644
--- a/models/public/face-recognition-resnet100-arcface-onnx/README.md
+++ b/models/public/face-recognition-resnet100-arcface-onnx/README.md
@@ -82,6 +82,7 @@ omz_converter --name
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Face Recognition Python\* Demo](../../../demos/face_recognition_demo/python/README.md)
* [Smart Classroom C++ Demo](../../../demos/smart_classroom_demo/cpp/README.md)
* [Smart Classroom C++ G-API Demo](../../../demos/smart_classroom_demo/cpp_gapi/README.md)
diff --git a/models/public/facenet-20180408-102900/README.md b/models/public/facenet-20180408-102900/README.md
index 08d81395603..aeab6eb4f4f 100644
--- a/models/public/facenet-20180408-102900/README.md
+++ b/models/public/facenet-20180408-102900/README.md
@@ -77,6 +77,7 @@ omz_converter --name
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:

+* [Face Recognition C++ Demo](../../../demos/face_recognition_demo/cpp/README.md)
* [Face Recognition Python\* Demo](../../../demos/face_recognition_demo/python/README.md)
* [Smart Classroom C++ Demo](../../../demos/smart_classroom_demo/cpp/README.md)
* [Smart Classroom C++ G-API Demo](../../../demos/smart_classroom_demo/cpp_gapi/README.md)