BodyFlow is a library that leverages deep learning and other AI techniques, including algorithms developed by the AI group at ITA, to estimate human pose in 2D and 3D from video. Building on these pose estimates, BodyFlow can detect events such as falls and walking, and in the future the aim is to extend its capabilities with classifiers for certain neurodegenerative diseases.
The first module of this library contains three 2D detectors (MediaPipe2D, CPN, Lightweight) and six 3D detectors (VideoPose3D, MHFormer, MixSTE, MotionBert, MediaPipe3D, ExPose) for predicting human pose from the monocular RGB frames of a video sequence. The code from the original works has been refactored so the methods are easy to manipulate and combine. Most methods in this project use 2D-to-3D lifting: first, a 2D pose estimator is run on each frame, and then another algorithm lifts the 2D keypoints to the final 3D pose. New 2D and 3D pose estimation algorithms can be added if needed.
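The 2D-to-3D lifting approach described above can be sketched as two composable stages. This is an illustrative sketch, not the actual BodyFlow API; the class and method names (`Pose2DEstimator`, `Lifter3D`, `estimate`, `lift`) and the 17-keypoint skeleton are assumptions for the example.

```python
from typing import List, Optional, Tuple

Keypoints2D = List[Tuple[float, float]]
Keypoints3D = List[Tuple[float, float, float]]

class Pose2DEstimator:
    """Stage 1: predict 2D keypoints from a single RGB frame (stub)."""
    def estimate(self, frame) -> Keypoints2D:
        # A real detector (e.g. CPN or Lightweight) would run here;
        # we return a fixed dummy 17-joint skeleton for illustration.
        return [(0.5, 0.5)] * 17

class Lifter3D:
    """Stage 2: lift a buffer of 2D keypoints to a 3D pose (stub)."""
    def __init__(self, window: int = 3):
        self.window = window   # lifters such as MHFormer need a temporal buffer
        self.buffer: List[Keypoints2D] = []

    def lift(self, kps2d: Keypoints2D) -> Optional[Keypoints3D]:
        self.buffer.append(kps2d)
        if len(self.buffer) < self.window:
            return None        # not enough frames buffered yet
        # A real lifter would exploit the whole temporal window here.
        return [(x, y, 0.0) for x, y in self.buffer[-1]]

# Compose the two stages over a stream of frames
estimator, lifter = Pose2DEstimator(), Lifter3D(window=3)
poses3d = []
for frame in range(5):         # integers stand in for real images
    pose = lifter.lift(estimator.estimate(frame))
    if pose is not None:
        poses3d.append(pose)
print(len(poses3d))            # the first 2 frames only fill the buffer
```

The buffering in `Lifter3D` is also why a missed 2D detection blocks the 3D prediction, as noted below.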
We also support multi-person pose estimation by tracking the different people in the frame. We include three different tracking algorithms (Sort, DeepSort, ByteTrack).
It also admits end-to-end models (given a frame, predict the 3D pose directly, as ExPose and MediaPipe3D do). However, if the 2D pose is not detected, the 3D pose will not be predicted either; this preserves the behavior of the lifting 3D models, which need a buffer of 2D keypoints. To avoid this limitation, use the Dummy2D detector, which always reports a 2D detection so that the 3D pose is always predicted.
Also, be careful: bounding boxes are drawn only when 2D keypoints have been detected. This may be confusing, because you may see few bounding boxes in the scene; a missing bounding box means that no 2D keypoints were detected for that person.
The second module of the library classifies the activity performed in the input. This module allows the user to train or test on custom data.
BodyFlow includes models for activity recognition trained on the UP-FALL dataset. If data is available, the user may choose the type of features and one of the three pretrained models. Five feature types are available:
- '2d' -> 68 features (only 2D pose)
- '3d' -> 102 features (only 3D pose)
- 'imus' -> 30 features (only IMUs)
- 'all' -> 68 features (2D + 3D + IMUs)
- 'ankle' -> 6 features (only one IMU)
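The feature types and their input sizes can be captured in a small lookup table. This is an illustrative snippet, not part of the BodyFlow API; the dictionary and helper names are hypothetical, and the feature counts are the ones listed above.

```python
# Feature-type keys and their input dimensionalities, as listed above.
FEATURE_DIMS = {
    "2d": 68,      # only 2D pose
    "3d": 102,     # only 3D pose
    "imus": 30,    # only IMUs
    "all": 68,     # 2D + 3D + IMUs
    "ankle": 6,    # only one IMU
}

def input_dim(feature_type: str) -> int:
    """Return the expected input size for a HAR model, or raise for unknown keys."""
    try:
        return FEATURE_DIMS[feature_type]
    except KeyError:
        raise ValueError(f"unknown feature type: {feature_type!r}") from None

print(input_dim("3d"))  # -> 102
```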
There are three HAR models to choose from:
- An LSTM-based HAR model
- A CNN-based HAR model
- A Transformer-based HAR model
The HAR module uses two open-source libraries to facilitate the training and management of models: PyTorch Lightning and MLflow.
- PyTorch Lightning is a popular open-source framework designed to make it easier to train machine learning models with PyTorch. It provides a high-level interface that simplifies the process of training and evaluating deep learning models.
- MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. MLflow, which also integrates with PyTorch, helps simplify and accelerate machine learning development by providing a unified platform for managing experiments, models, and deployments.
The code has been tested with Python 3.10. Other Python versions may work, but they have not been tested.
We start by creating a conda environment with the following command:
$ conda create -n bodyflow-env python=3.10.11
$ conda activate bodyflow-env
Then, you should install the needed libraries (dependencies) which are defined in the requirements file. To do so, run the following command in the prompt:
$ pip install -r src/main/python/requirements.txt
If everything is correct, PyTorch, torchvision and torchaudio should be successfully installed. CUDA is needed to run some models. If there is any problem, please check the Problems & Solutions section in this README. Then install ffmpeg by running the following command:
$ conda install ffmpeg
The EGL development libraries are also needed for the ExPose module (see below). To install them, run the following command in the prompt:
$ sudo apt-get install libegl1-mesa-dev libgles2-mesa-dev
All the model weights files have been wrapped and may be downloaded by executing the following command:
$ python src/main/python/human_pose_estimation/model_downloader.py
Note: If you have permission issues, you might need to execute the above line with sudo.
- If there is an error raised by mediapipe, the following command can solve it
$ pip install --upgrade protobuf==3.20.0
- If an error is raised by load_weights, please downgrade PyTorch to the version described in the installation.
- MLflow fails if another MLflow session is running. To solve this, find the id of the MLflow process with the following command, kill it, and rerun mlflow ui:
$ ps -ax | grep "gunicorn"
$ mlflow ui
- The PyTorch library must be installed. This repository has been tested with CUDA-enabled PyTorch 1.12.0 and torchvision 0.13.0 installed via conda (see below). Please refer to the official PyTorch webpage to obtain the correct PyTorch build for your CUDA version. You can find the CUDA version by running nvcc --version in the prompt:
$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
Note: The command nvidia-smi also reports a CUDA version, which can differ from the one reported by nvcc, so the nvidia-smi value might be the real CUDA version.
Sometimes an error related to the chardet package appears. In that case, install it with:
$ pip install chardet
Note that this package has a GNU license.
For multimodal input, please refer to the UPFALL_synchronization instructions.
For the HPE (Human Pose Estimation) module, the main script is located at src/main/python/human_pose_estimation/inference_server.py.
To run the code, you need to select a 2D and a 3D pose estimation predictor and the input data type.
Available 2d predictors are:
- Mediapipe - (Included)
- Cascade Pyramid Network (CPN) - (Included)
- Lightweight - (Included)
Available 3d predictors are:
- MHFormer - (Included)
- Mediapipe - (Included)
- MotionBert - (Installation instructions)
- VideoPose3D - (Installation instructions)
- MixSTE - (Installation instructions)
- ExPose - (Installation instructions)
The predictors are selected as follows:
$ python src/main/python/human_pose_estimation/inference_server.py --predictor_2d {mediapipe2d, cpn, lightweight} --predictor_3d {mhformer, videopose, motionbert, mixste, mediapipe3d, expose}
For example, to run CPN and MHFormer:
$ python src/main/python/human_pose_estimation/inference_server.py --predictor_2d cpn --predictor_3d mhformer
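If you want to sweep several predictor combinations, the command lines can be generated programmatically. A minimal sketch: the script path and flag names come from the examples above, but actually running these commands still requires the corresponding models to be installed.

```python
from itertools import product

# Predictor names as accepted by inference_server.py (listed above).
predictors_2d = ["mediapipe2d", "cpn", "lightweight"]
predictors_3d = ["mhformer", "videopose", "motionbert", "mixste", "mediapipe3d", "expose"]

SCRIPT = "src/main/python/human_pose_estimation/inference_server.py"

commands = [
    f"python {SCRIPT} --predictor_2d {p2d} --predictor_3d {p3d}"
    for p2d, p3d in product(predictors_2d, predictors_3d)
]

print(len(commands))   # 3 x 6 = 18 combinations
print(commands[0])
```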
You can also set the maximum age of the tracker with the parameter --max_age.
The input type can be a .mp4 video, an ordered set of images in .png format, or video captured directly from the webcam. The following examples show how to run the code with each input type using the CPN and MHFormer combination.
On the other hand, for the HAR module, training or test may be used as follows:
$ python src/main/python/human_activity_recognition/train.py --har_model cnn
$ python src/main/python/human_activity_recognition/test.py --har_model cnn
The video must be accessible and its path has to be indicated as follows:
$ python src/main/python/human_pose_estimation/inference_server.py --predictor_2d cpn --predictor_3d mhformer --input video --path path/to/video.mp4
The folder must contain the images named 000001.png, 000002.png, 000003.png, etc. Then, the folder is passed as an argument as follows:
$ python src/main/python/human_pose_estimation/inference_server.py --predictor_2d cpn --predictor_3d mhformer --input pictures --path path/to/pictures
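If your frames are not already named with six-digit zero-padded indices, they can be renamed to match the expected pattern. A minimal sketch, assuming the frames sort correctly by filename and are not already numerically named (which could cause rename collisions):

```python
import os, tempfile

def renumber_frames(folder: str, ext: str = ".png") -> list:
    """Rename frames in `folder` to 000001.png, 000002.png, ... in sorted order."""
    frames = sorted(f for f in os.listdir(folder) if f.endswith(ext))
    new_names = []
    for i, old in enumerate(frames, start=1):
        new = f"{i:06d}{ext}"  # six-digit, zero-padded index
        os.rename(os.path.join(folder, old), os.path.join(folder, new))
        new_names.append(new)
    return new_names

# Demo on a throwaway directory with arbitrarily named frames
demo_dir = tempfile.mkdtemp()
for name in ["shot_b.png", "shot_a.png", "notes.txt"]:
    open(os.path.join(demo_dir, name), "w").close()

renamed = renumber_frames(demo_dir)
print(renamed)  # -> ['000001.png', '000002.png']
```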
You need to know the camera device number that OpenCV will use to access it. If you do not know it and have no other capture device connected to the laptop, run the following command:
$ python src/main/python/human_pose_estimation/inference_server.py --predictor_2d cpn --predictor_3d mhformer --input cam --video_source 0
Please refer to the HAR module instructions.
The output is a .csv file containing the 2D and 3D landmarks for each frame and the activity predictions. It is located in the logs folder. Additionally, if the input is a video or a sequence of images, the pose can be visualized.
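The output .csv can be loaded with standard tooling for downstream analysis. A minimal sketch using an inline synthetic file: the column names below are purely illustrative assumptions, since the real header depends on the chosen predictors.

```python
import csv, io

# Synthetic stand-in for the logs .csv; these column names are illustrative
# only, not the actual BodyFlow output schema.
sample = """frame,landmark_0_x_2d,landmark_0_y_2d,landmark_0_x_3d,landmark_0_y_3d,landmark_0_z_3d,activity
1,0.51,0.42,0.10,0.20,0.95,walking
2,0.52,0.41,0.11,0.21,0.94,walking
3,0.50,0.40,0.12,0.19,0.96,falling
"""

activities = []
with io.StringIO(sample) as f:
    for row in csv.DictReader(f):
        activities.append((int(row["frame"]), row["activity"]))

print(activities)  # -> [(1, 'walking'), (2, 'walking'), (3, 'falling')]
```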
If there is clearly a single person in the video, we highly recommend setting --tracking single. If there is more than one person, you can choose a suitable tracker; however, only one pose can be displayed at a time. In that case, first set --bboxes_viz True to render a video with the identified bounding boxes, then identify the bounding box of interest and select its id to plot the pose with --viz [id], where id is the id of the bounding box. If no id is specified, it defaults to 1. Both videos are stored at the root of the project. An example frame looks as follows:
An example of the pose estimation for the person with id 1 looks as follows:
To see the MLflow interface, run:
$ mlflow ui
The ExPose module extends its functionality by generating additional data types such as point clouds, meshes, and videos. These outputs are organized into specific directories: data/exposeFraming for point clouds and meshes, and data/exposeVisualizer for videos.
These repositories were used to extend our code. We thank the developers for sharing their code.
- Lightweight
- Deep High-Resolution Representation Learning for Human Pose Estimation
- MHFormer
- VideoPose3d
- MixSTE
- MotionBERT
- ExPose
# --------------------------------------------------------------------------------
# BodyFlow
# Version: 2.0
# Copyright (c) 2024 Instituto Tecnologico de Aragon (www.ita.es) (Spain)
# Date: March 2024
# Authors: Ana Caren Hernandez Ruiz [email protected]
# Angel Gimeno Valero [email protected]
# Carlos Maranes Nueno [email protected]
# Irene Lopez Bosque [email protected]
# Jose Ignacio Calvo Callejo [email protected]
# Maria de la Vega Rodrigalvarez Chamarro [email protected]
# Pilar Salvo Ibanez [email protected]
# Rafael del Hoyo Alonso [email protected]
# Rocio Aznar Gimeno [email protected]
# Pablo Perez Lazaro [email protected]
# Marcos Marina Castello [email protected]
# All rights reserved
# --------------------------------------------------------------------------------
Please refer to the LICENSE in this folder.
BodyFlow was funded by project MIA.2021.M02.0007 of the NextGenerationEU program and the Integration and Development of Big Data and Electrical Systems (IODIDE) group of the Aragon Government program.
BodyFlow version 1.0 - Included HPE and HAR modules.
BodyFlow version 2.0 - Included multi-person HPE module with tracking and HAR module.