Bilinear Attentional Transforms (BAT) for Video Classification

This is the official code for "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms", applied to video classification on Kinetics.

Pretrained models

Here we provide some of the pretrained models.

Method	Backbone	Input Frames	Top-1 Acc	Link
C2D	ResNet-50	8	72.0%	GoogleDrive / BaiduYun(Access Code: r0i2)
I3D	ResNet-50	8	72.7%	GoogleDrive / BaiduYun(Access Code: dnwv)
C2D + 2D-BAT	ResNet-50	8	74.6%	GoogleDrive / BaiduYun(Access Code: inb0)
I3D + 2D-BAT	ResNet-50	8	75.1%	GoogleDrive / BaiduYun(Access Code: q8d8)
C2D + 3D-BAT	ResNet-50	8	75.5%	GoogleDrive / BaiduYun(Access Code: rnrg)

Quick starts

Requirements

Install Lintel
pip install -r requirements.txt

Data preparation

Download Kinetics-400 via the official scripts.
Generate the training / validation list file. A list file looks like

video_path frame_num label
video_path frame_num label
...
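As a rough illustration of the list-file format above, the following sketch writes and parses `video_path frame_num label` lines. It assumes the fields are whitespace-separated and that `label` is an integer class index; the `VideoRecord` type, the helper names, and the example paths are hypothetical and not part of this repository.

```python
from typing import List, NamedTuple


class VideoRecord(NamedTuple):
    path: str        # path to the video file
    num_frames: int  # total number of frames in the video
    label: int       # integer class index (assumed encoding)


def format_record(rec: VideoRecord) -> str:
    """Render one record as a 'video_path frame_num label' line."""
    return f"{rec.path} {rec.num_frames} {rec.label}"


def parse_list_file(text: str) -> List[VideoRecord]:
    """Parse list-file contents, skipping blank lines."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so only the last two fields are read as numbers.
        path, num_frames, label = line.rsplit(maxsplit=2)
        records.append(VideoRecord(path, int(num_frames), int(label)))
    return records


# Hypothetical example entries; real paths depend on where Kinetics-400 is stored.
example = [
    VideoRecord("/data/kinetics/train/abseiling/clip_0001.mp4", 300, 0),
    VideoRecord("/data/kinetics/train/zumba/clip_0002.mp4", 250, 399),
]
text = "\n".join(format_record(r) for r in example) + "\n"
assert parse_list_file(text) == example
```

Splitting from the right keeps a path intact even if it contains spaces, though the loader in dataset.py may not tolerate such paths.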

Training

To train a model, run main.py with the desired model architecture and other hyperparameters:

python main.py \
    /PATH/TO/TRAIN_LIST \
    /PATH/TO/VAL_LIST \
    --read_mode video \
    --resume /PATH/TO/IMAGENET_PRETRAINED/MODEL --soft_resume \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --num_segments 1 --seq_length 8 --sample_rate 8 \
    --lr 0.01 --lr_steps 40 80 --epochs 100 \
    --eval-freq 5 --save-freq 5 -b 64 -j 48 --dropout 0.5
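The numeric flags in the command above can be read as follows; this is a sketch of one plausible interpretation, not the code in main.py or dataset.py. It assumes `--seq_length` is the number of frames per clip, `--sample_rate` is the stride between sampled frames, and `--lr_steps` lists the epochs at which the learning rate is multiplied by a decay factor. The factor of 0.1 and both function names are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence


def sample_clip_indices(num_frames: int, seq_length: int = 8, sample_rate: int = 8,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Pick seq_length frame indices spaced sample_rate frames apart."""
    # Number of raw frames covered by one clip: (8 - 1) * 8 + 1 = 57 by default.
    span = (seq_length - 1) * sample_rate + 1
    max_start = max(num_frames - span, 0)
    # Random start when an RNG is given (training), centered start otherwise.
    start = rng.randint(0, max_start) if rng is not None else max_start // 2
    # Clamp to the last frame so short videos still yield seq_length indices.
    return [min(start + i * sample_rate, num_frames - 1) for i in range(seq_length)]


def lr_at_epoch(epoch: int, base_lr: float = 0.01,
                lr_steps: Sequence[int] = (40, 80), gamma: float = 0.1) -> float:
    """Step schedule: multiply base_lr by gamma once per milestone already reached."""
    num_decays = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * gamma ** num_decays


print(sample_clip_indices(300))
print([lr_at_epoch(e) for e in (0, 40, 80)])
```

Under these assumptions, a clip of 8 frames at stride 8 covers 57 raw frames, and the learning rate drops twice over the 100 epochs.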

More training scripts can be found in the scripts directory. The ImageNet pretrained models can be downloaded from GoogleDrive / BaiduYun (Access Code: 1r48).

Testing

Fully convolutional inference (recommended):

python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 10 --test_crops 3 --seq_length 8 --sample_rate 8 \
    -j 16

10 crops and 25 segments:

python test_models.py \
    /PATH/TO/VAL_LIST \
    /PATH/TO/CHECKPOINT \
    --read_mode video \
    --arch c2d_resnet50 --nonlocal_mod 2 --nltype bat --k 8 --tk 4 \
    --test_segments 25 --seq_length 8 --sample_rate 8 \
    -j 16
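Both test settings evaluate each video on several views (temporal segments times spatial crops) and combine the results. The sketch below shows one common way to combine them, averaging per-view class scores and taking the argmax. Whether test_models.py aggregates views exactly this way is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def num_views(test_segments: int, test_crops: int) -> int:
    """Total views per video: one per (segment, crop) pair."""
    return test_segments * test_crops


def average_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over all views of one video."""
    n = len(view_scores)
    num_classes = len(view_scores[0])
    return [sum(v[c] for v in view_scores) / n for c in range(num_classes)]


def predict(view_scores: Sequence[Sequence[float]]) -> int:
    """Return the class index with the highest averaged score."""
    avg = average_scores(view_scores)
    return max(range(len(avg)), key=avg.__getitem__)


# Toy example: three views scored over three classes.
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
print(num_views(10, 3), predict(scores))
```

With 10 segments and 3 crops each video is scored on 30 views; with 25 segments and 10 crops, on 250 views, which is why the second setting is considerably slower.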

Other applications of BAT

Image Classification

Citation

If you find this work or code helpful in your research, please cite:

@InProceedings{Chi_2020_CVPR,
  author = {Chi, Lu and Yuan, Zehuan and Mu, Yadong and Wang, Changhu},
  title = {Non-Local Neural Networks With Grouped Bilinear Attentional Transforms},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}
