UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
Meiqi Sun*, Zhonghan Zhao*, Wenhao Chai*, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang
AAAI 2024
We introduce UniAP, a novel Universal Animal Perception model that leverages few-shot learning to enable cross-species perception among various visual tasks.
- [2023.12.10]: 🎉 Our paper is accepted by AAAI 2024.
- [2023.08.20]: We release our code.
- [2023.08.19] 📃 We release the paper.
- Download Datasets
- Animal Kingdom Dataset (pose estimation) from the official GitHub page https://github.com/sutdcv/Animal-Kingdom/blob/master/Animal_Kingdom/pose_estimation/README_pose_estimation.md.
- Animal Pose Dataset from the official GitHub page https://github.com/noahcao/animal-pose-dataset.
- APT-36K Dataset from the official GitHub page https://github.com/pandorgan/APT-36K.
- Oxford-IIIT Pet Dataset from the official page https://www.robots.ox.ac.uk/~vgg/pets/.
- (Optional) Resize the images and labels to (256, 256) resolution, e.g., as in the sketch below.
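A minimal resizing sketch, assuming the images and label maps are stored as image files; the directory names, file extension, and the `label` filename check are illustrative and not the repository's actual preprocessing:

```python
# resize_to_256.py -- illustrative preprocessing sketch; adapt to your layout.
from pathlib import Path
from PIL import Image

SRC = Path("raw_data")      # hypothetical source directory
DST = Path("resized_data")  # hypothetical output directory

for img_path in SRC.rglob("*.png"):
    out_path = DST / img_path.relative_to(SRC)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    img = Image.open(img_path)
    # Bilinear for RGB images; nearest-neighbor for label maps so that
    # class indices are not blended across classes.
    resample = Image.NEAREST if "label" in str(img_path) else Image.BILINEAR
    img.resize((256, 256), resample).save(out_path)
```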
- We store all animal images and labels in a single root directory. The directory structure looks like:
```
<Root>
|--<AnimalKingdom>
|  |--<animal1>_<rgb>
|  |  ...
|  |--<animal2>_<label>
|  |...
|
|--<APT-36K>
|  |--<animal1>_<rgb>
|  |  ...
|  |--<animal2>_<label>
|  |...
|
|--<AnimalPose>
|  |--<animal1>_<rgb>
|  |  ...
|  |--<animal2>_<label>
|  |...
|
|--<Oxford-IIITPet>
|  |--<animal1>_<rgb>
|  |  ...
|  |--<animal2>_<label>
|  |...
|
|...
```
- Create a `data_paths.yaml` file and write the root directory path (`<Root>` in the above structure) as `UniASET: PATH_TO_YOUR_UniASET`.
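For reference, a minimal `data_paths.yaml` could look like the following; the path is a placeholder for your own `<Root>` directory:

```yaml
# Root directory containing AnimalKingdom, APT-36K, AnimalPose, and Oxford-IIITPet
UniASET: /path/to/your/UniASET
```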
- Install pre-requirements by `pip install -r requirements.txt`.
- Create a `model/pretrained_checkpoints` directory and download BEiT pre-trained checkpoints to the directory.
  - We used the `beit_base_patch16_224_pt22k` checkpoint for our experiments.
  - We also provide a pre-trained model trained on the AnimalKingdom dataset that can be used to run `configs/demo.yaml` (https://drive.google.com/file/d/1HmSMn1h4rY5JtEjS7Th8iPTJhFbAnW9x/view?usp=sharing).
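A possible command-line setup for the checkpoint directory; the checkpoint filename is an assumption based on the official BEiT release and may differ:

```bash
mkdir -p model/pretrained_checkpoints
# Download beit_base_patch16_224_pt22k from the official BEiT repository,
# then move it into the directory, e.g.:
mv beit_base_patch16_224_pt22k.pth model/pretrained_checkpoints/
```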
```bash
python main.py --stage 0 --task_id [0/1/2/3]
```
- If you want to train universally on all tasks, set `task_id=3`.
- If you want to train on a specific task, use `task_id=0` for pose estimation, `task_id=1` for semantic segmentation, or `task_id=2` for classification.
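For example, to run stage-0 training on pose estimation only:

```bash
# Stage-0 training restricted to pose estimation (task_id=0)
python main.py --stage 0 --task_id 0
```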
```bash
python main.py --stage 1 --task [kp/mask/cls]
```
- If you want to fine-tune on a specific task, use `task=kp` for pose estimation, `task=mask` for semantic segmentation, or `task=cls` for classification.
```bash
python main.py --stage 2 --task [kp/mask/cls]
```
- If you want to evaluate on a specific task, use `task=kp` for pose estimation, `task=mask` for semantic segmentation, or `task=cls` for classification.
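Putting the three stages together, a typical pose-estimation workflow with the commands above might be:

```bash
python main.py --stage 0 --task_id 3   # stage 0: universal training on all tasks
python main.py --stage 1 --task kp     # stage 1: fine-tune on pose estimation
python main.py --stage 2 --task kp     # stage 2: evaluate on pose estimation
```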
Our code refers to the following repositories:
- BEiT: BERT Pre-Training of Image Transformers
- Pose for Everything: Towards Category-Agnostic Pose Estimation
- Images Speak in Images: A Generalist Painter for In-Context Visual Learning
- Segment Anything
- Contrastive Language-Image Pre-Training
If you find UniAP useful for your research and applications, please cite using this BibTeX:
@article{sun2023uniap,
title={UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning},
author={Sun, Meiqi and Zhao, Zhonghan and Chai, Wenhao and Luo, Hanjun and Cao, Shidong and Zhang, Yanting and Hwang, Jenq-Neng and Wang, Gaoang},
journal={arXiv preprint arXiv:2308.09953},
year={2023}
}