About inference #8
It is normal that our released model may not predict head pose well for persons in some in-the-wild images. This is because the training set, AGORA-HPE, is a synthetic dataset, which inherently has a domain gap with real images such as your demo picture. I guess you may turn the
By the way, if you want to judge where people are looking, body orientation is an alternative indicator. You can refer to my other work on joint multi-person body detection and orientation estimation at https://github.com/hnuzhy/JointBDOE. It is based on and trained with the COCO-MEBOW dataset, which consists of in-the-wild images, so it generalizes well to real applications.
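As a rough illustration of using a predicted body-orientation angle as a proxy for where a person is looking, here is a minimal sketch. The angle convention (0 degrees meaning facing the camera, measured clockwise from above) and the 45-degree tolerance are assumptions for illustration, not values taken from JointBDOE:

```python
import math

def orientation_to_vector(deg):
    """Convert an orientation angle in degrees (0 = facing the camera,
    clockwise viewed from above; assumed convention) into a unit
    direction vector (x, z) on the ground plane."""
    rad = math.radians(deg)
    return (math.sin(rad), math.cos(rad))

def faces_camera(deg, tolerance=45.0):
    """True if the orientation is within `tolerance` degrees of
    pointing straight toward the camera (angle 0)."""
    # Wrap the angle into [-180, 180) before comparing magnitudes.
    wrapped = (deg + 180.0) % 360.0 - 180.0
    return abs(wrapped) <= tolerance
```

With a per-frame orientation stream, large swings of this angle over time would be one simple signal that the person is "looking around".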
In this project, I set --conf-thres=0.05 --iou-thres=0.1. This is the result of direct inference. I want to judge whether a person in the crop is looking around. Is it possible to combine your two works to achieve this? I would be very grateful for any other suggestions.
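For context, --conf-thres and --iou-thres typically control a confidence filter followed by IoU-based non-maximum suppression over detections. A minimal sketch of that post-processing (an illustration, not this project's actual implementation) might look like:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, conf_thres=0.05, iou_thres=0.1):
    """detections: list of (box, score). Drop boxes below conf_thres,
    then greedily keep high-score boxes, suppressing any candidate
    that overlaps a kept box by more than iou_thres."""
    candidates = sorted((d for d in detections if d[1] >= conf_thres),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, kb) < iou_thres for kb, _ in kept):
            kept.append((box, score))
    return kept
```

Note that a very low iou_thres like 0.1 suppresses aggressively, which can drop true detections of people standing close together.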
It seems that your task is mostly about multi-person body detection and orientation estimation. I simply ran my JointBDOE (https://github.com/hnuzhy/JointBDOE) work on your demo images. The person-orientation results are basically reliable.
If you want to further judge whether a person is looking around or not, you may run a single HPE task with a full-range view. You may refer to the method WHENet (https://github.com/Ascend-Research/HeadPoseEstimation-WHENet or https://github.com/PINTO0309/HeadPoseEstimation-WHENet-yolov4-onnx-openvino), which uses a well-cropped head image as its input.
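Preparing such a well-cropped head image usually means expanding a detected head box into a square region with some margin before resizing it for the estimator. A minimal sketch, where the square-crop convention and the 30% margin are assumptions rather than WHENet's documented preprocessing:

```python
def square_head_crop(box, img_w, img_h, margin=0.3):
    """Expand a head box (x1, y1, x2, y2) into a square crop with
    extra margin, clamped to the image bounds. The margin value is
    an assumption; tune it for your detector's box tightness."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    side = max(x2 - x1, y2 - y1) * (1.0 + margin)
    nx1 = max(0, int(round(cx - side / 2)))
    ny1 = max(0, int(round(cy - side / 2)))
    nx2 = min(img_w, int(round(cx + side / 2)))
    ny2 = min(img_h, int(round(cy + side / 2)))
    return nx1, ny1, nx2, ny2
```

The resulting region can then be cropped from the frame and resized to the estimator's expected input size.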
Hello, I'm having some problems with inference. I can successfully run inference on the photos you provided, but not on my own dataset: the output is still the original photo.
In addition, I want to use this to judge where people are looking. Is it possible to do this?
Here is the image I use.