This is the official code for Non-Local Neural Networks With Grouped Bilinear Attentional Transforms (CVPR 2020), applied to video classification on Kinetics.
Here we provide some of the pretrained models.
Method | Backbone | Input Frames | Top-1 Acc | Link |
---|---|---|---|---|
C2D | ResNet-50 | 8 | 72.0% | GoogleDrive / BaiduYun (Access Code: r0i2) |
I3D | ResNet-50 | 8 | 72.7% | GoogleDrive / BaiduYun (Access Code: dnwv) |
C2D + 2D-BAT | ResNet-50 | 8 | 74.6% | GoogleDrive / BaiduYun (Access Code: inb0) |
I3D + 2D-BAT | ResNet-50 | 8 | 75.1% | GoogleDrive / BaiduYun (Access Code: q8d8) |
C2D + 3D-BAT | ResNet-50 | 8 | 75.5% | GoogleDrive / BaiduYun (Access Code: rnrg) |
- Install Lintel
- `pip install -r requirements.txt`
- Download Kinetics-400 via the official scripts.
- Generate the training / validation list files. Each line of a list file has the form:
  ```
  video_path frame_num label
  video_path frame_num label
  ...
  ```
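
The list format is simple enough to read and write with a few lines of Python. The sketch below is hypothetical: `ListEntry`, `parse_list_file` and `format_list_file` are illustrative names, not part of this repository. It assumes the three fields are whitespace-separated and that `label` is an integer class index. Lines are split from the right so a path containing spaces stays intact, although the repository's own loader may not handle such paths, so avoiding spaces is the safer choice.

```python
from typing import Iterable, List, NamedTuple


class ListEntry(NamedTuple):
    """One row of a list file (hypothetical helper type)."""
    video_path: str
    frame_num: int
    label: int  # assumed to be an integer class index


def parse_list_file(text: str) -> List[ListEntry]:
    """Parse 'video_path frame_num label' lines, skipping blank lines."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last two fields are separated off.
        parts = line.rsplit(maxsplit=2)
        if len(parts) != 3:
            raise ValueError(
                f"line {lineno}: expected 'video_path frame_num label', got {line!r}")
        path, frame_num, label = parts
        entries.append(ListEntry(path, int(frame_num), int(label)))
    return entries


def format_list_file(entries: Iterable[ListEntry]) -> str:
    """Serialize entries back into the list-file format, one per line."""
    return "".join(f"{e.video_path} {e.frame_num} {e.label}\n" for e in entries)
```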
To train a model, run `main.py` with the desired model architecture and other hyper-parameters:
```shell
python main.py \
    /PATH/TO/TRAIN_LIST \
    /PATH/TO/VAL_LIST \
    --read_mode video \
    --resume /PATH/TO/IMAGENET_PRETRAINED/MODEL --soft_resume \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --num_segments 1 --seq_length 8 --sample_rate 8 \
    --lr 0.01 --lr_steps 40 80 --epochs 100 \
    --eval-freq 5 --save-freq 5 -b 64 -j 48 --dropout 0.5
```
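
One plausible reading of the sampling and schedule flags, shown as a hedged sketch rather than the code in `main.py`: we assume `--seq_length 8 --sample_rate 8` draws 8 frames spaced 8 apart (a 64-frame window) from a random start, `--num_segments 1` takes one such clip per video, and `--lr_steps 40 80` multiplies the learning rate by 0.1 at epochs 40 and 80. The 0.1 decay factor, the wrap-around for short videos, and the function names `sample_clip_indices` and `step_lr` are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence


def sample_clip_indices(frame_num: int, seq_length: int = 8, sample_rate: int = 8,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Return seq_length frame indices spaced sample_rate apart (assumed scheme).

    The clip covers a window of seq_length * sample_rate frames starting at a
    random offset; videos shorter than the window wrap around via modulo.
    """
    if frame_num <= 0:
        raise ValueError("frame_num must be positive")
    if rng is None:
        rng = random.Random()
    window = seq_length * sample_rate
    start = rng.randint(0, max(frame_num - window, 0))  # randint bounds are inclusive
    return [(start + i * sample_rate) % frame_num for i in range(seq_length)]


def step_lr(epoch: int, base_lr: float = 0.01,
            lr_steps: Sequence[int] = (40, 80), gamma: float = 0.1) -> float:
    """Step decay: multiply base_lr by gamma once per milestone already reached."""
    passed = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * gamma ** passed
```

Under these assumptions, a 300-frame video yields 8 indices inside a 64-frame window, and the learning rate is 0.01 for epochs 0 to 39, 0.001 for epochs 40 to 79, and 0.0001 afterwards.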
More training scripts can be found in the `scripts` directory. The ImageNet-pretrained models can be downloaded from GoogleDrive / BaiduYun (Access Code: 1r48).
Fully-convolutional inference (recommended):
```shell
python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 10 --test_crops 3 --seq_length 8 --sample_rate 8 \
    -j 16
```
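
With `--test_segments 10 --test_crops 3`, each video is presumably evaluated on 10 × 3 = 30 views whose scores are combined into a single prediction. The sketch below assumes evenly spaced clip starts across the video and plain averaging of per-view class scores; the spacing rule and the helper names (`eval_clip_starts`, `average_view_scores`, `top1`) are hypothetical and not read from `test_models.py`.

```python
from typing import List, Sequence


def eval_clip_starts(frame_num: int, test_segments: int = 10,
                     seq_length: int = 8, sample_rate: int = 8) -> List[int]:
    """Evenly spaced start frames for test_segments clips (assumed scheme)."""
    if test_segments < 1:
        raise ValueError("test_segments must be >= 1")
    window = seq_length * sample_rate
    last_start = max(frame_num - window, 0)  # latest start keeping the clip inside the video
    if test_segments == 1:
        return [last_start // 2]  # a single clip from the middle
    return [i * last_start // (test_segments - 1) for i in range(test_segments)]


def average_view_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all views (segments x crops) of one video."""
    if not view_scores:
        raise ValueError("need at least one view")
    n = len(view_scores)
    return [sum(column) / n for column in zip(*view_scores)]


def top1(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)
```

For a 300-frame video this spreads the 10 clip starts from frame 0 to frame 236, so the last 64-frame window ends exactly at the final frame.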
Inference with 10 crops and 25 segments:
```shell
python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 25 --seq_length 8 --sample_rate 8 \
    -j 16
```
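
A 10-crop setting commonly means the four corner crops plus the center crop, each also taken horizontally flipped (this is what `torchvision.transforms.TenCrop` produces). Assuming `test_models.py` follows that convention, 25 segments × 10 crops gives 250 views per video. The command does not pass `--test_crops`, so the crop count presumably comes from the script's default. The sketch below only computes crop boxes; the helper names, the 224-pixel crop, and the 340×256 frame size used in the check are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def five_crop_boxes(width: int, height: int, crop: int) -> List[Box]:
    """Four corner crops plus the center crop of a crop x crop square."""
    if crop > width or crop > height:
        raise ValueError("crop size exceeds frame size")
    right, bottom = width - crop, height - crop
    center_x, center_y = right // 2, bottom // 2
    origins = [(0, 0), (right, 0), (0, bottom), (right, bottom), (center_x, center_y)]
    return [(x, y, x + crop, y + crop) for x, y in origins]


def ten_crop_views(width: int, height: int, crop: int) -> List[Tuple[Box, bool]]:
    """The five boxes unflipped, then the same five marked for horizontal flipping."""
    boxes = five_crop_boxes(width, height, crop)
    return [(box, False) for box in boxes] + [(box, True) for box in boxes]
```

For a 340×256 frame and a 224-pixel crop, the center box is (58, 16, 282, 240); with 25 segments this amounts to 250 views per video, compared with 30 in the fully-convolutional setting.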
If you find this work or code helpful in your research, please cite:
```bibtex
@InProceedings{Chi_2020_CVPR,
  author = {Chi, Lu and Yuan, Zehuan and Mu, Yadong and Wang, Changhu},
  title = {Non-Local Neural Networks With Grouped Bilinear Attentional Transforms},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}
```