This repository contains the implementation of the UGLF model, a deep learning model for the action spotting task on the SoccerNet-v2 dataset. Investigating much of the current research, we found that most works focus only on the global feature (the whole frame) without considering local features (objects). From that insight, we propose UGLF, which unifies the global and local features.
Our proposed model
You can download the SoccerNet-v2 dataset from the official repository of the challenge after signing the NDA form.
To prepare the required libraries, you can either install them in your local environment or use a virtual environment with conda. Run the following command to install all the needed libraries:
pip install -r requirements.txt
In the repository, you can use the code in /downloader
to download the data.
Specifically, you will receive the password after signing the NDA; substitute it into the following command:
python3 download.py --password <password> \
--directory <download_path> \
--low_quality
You can replace the --low_quality
flag with one of the following options (an example follows the list):
- label: Download labels
- baidu: Download baidu feature
- high_quality: Download video 720p
- low_quality: Download video 224p
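For instance, to download only the labels to a local folder (the directory path below is just a placeholder), you could run:
python3 download.py --password <password> \
--directory "/data/soccernet" \
--label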
From the downloaded videos, you need to use the frames_as_jpg_soccernet
script to extract frames:
python frames_as_jpg_soccernet.py <video_dir> \
--out_dir <output_dir>
By default, it extracts frames at 2 fps. To change the sampling rate or the number of workers, use:
python frames_as_jpg_soccernet.py <video_dir> \
--out_dir <output_dir> \
--sample_fps <fps> \
--num_workers <n_workers>
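For example, a run that keeps the default 2 fps but uses 8 workers could look like this (the input and output directories are placeholders):
python frames_as_jpg_soccernet.py "/data/soccernet_videos" \
--out_dir "/data/soccernet_720p_2fps" \
--sample_fps 2 \
--num_workers 8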
Before training the models, run the parse_soccernet
script to convert the labels to the appropriate format:
python parse_soccernet.py <label_dir> \
<frame_dir> \
--out_dir <out_dir>
As a result, the parser script generates frame-level labels for each split. For instance, the output may look like:
[
{
"events": [
{
"comment": "away; visible",
"frame": 5509,
"label": "Foul"
},
{
"comment": "home; visible",
"frame": 5598,
"label": "Indirect free-kick"
}
],
"fps": 2.0833333333333335,
"height": 224,
"num_events": 65,
"num_frames": 5625,
"video": "england_epl/2014-2015/2015-05-17 - 18-00 Manchester United 1 - 1 Arsenal/1",
"width": 398
},
...
]
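If you want to sanity-check the generated labels, a minimal Python sketch like the following can summarize them (the file name labels.json is a placeholder; point it at whichever file the parser writes for your split):
import json
from collections import Counter

# Load the per-frame label file produced by parse_soccernet.py (assumed file name).
with open("labels.json") as f:
    videos = json.load(f)

# Count how many annotated events each class has across all videos.
counts = Counter(event["label"] for video in videos for event in video["events"])

print(f"{len(videos)} videos, {sum(counts.values())} events")
for label, num in counts.most_common():
    print(f"{label}: {num}")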
To train the model, please use GPUs to accelerate the training process. Follow the script below, replacing these parameters:
- feature_architecture: the global context feature extractor
  - ResNet
  - RegNet-Y
  - ConvNeXt
- temporal_architecture: the temporal reasoning module
  - GRU
  - AS-Former
  - Transformer encoder
- label_type: the label encoding type and loss function
  - integer: integer encoding (when not using mixup) with cross-entropy loss
  - one-hot: one-hot encoding with focal loss
export CUDA_VISIBLE_DEVICES=<list_of_gpu_ids>
python3 train_e2e.py <dataset_name> \
<frame_dir> \
--save_dir <save_dir> \
--feature_arch <feature_architecture> \
--temporal_arch <temporal_architecture> \
--glip_dir <local_feature_dir> \
--learning_rate <learning_rate> \
--num_epochs <n_epochs> \
--start_val_epoch <start_validate_epoch> \
--batch_size <batch_size> \
--clip_len <snippet_length> \
--crop_dim <crop_dimension> \
--label_type <label_type> \
--num_workers <n_workers> \
--mixup \
--gpu_parallel
Here is an example:
export CUDA_VISIBLE_DEVICES=1,2
python3 train_e2e.py "soccernet_dataset" \
"/data/soccernet_720p_2fps" \
--save_dir "results/800MF_GRU_GSM_FOCAL_GLIP" \
--glip_dir "/ext_drive/data/glip_feat" \
--feature_arch "rny008_gsm" \
--temporal_arch "gru" \
--learning_rate 1e-3 \
--num_epochs 150 \
--start_val_epoch 149 \
--warm_up_epochs 3 \
--batch_size 8 \
--clip_len 100 \
--crop_dim -1 \
--label_type "one-hot" \
--num_workers 4 \
--mixup \
--gpu_parallel
After training the model, you can use it to run inference on the other splits of the dataset.
In addition, unlike the original E2E-Spot script, we add a recall_thresh
argument to tune the high-recall filter threshold.
Use the following command to run inference:
export CUDA_VISIBLE_DEVICES=<list_of_gpu_ids>
python3 test_e2e.py <save_dir> \
<frame_dir> \
--glip_dir <local_feature_dir> \
--split <data_split> \
--recall_thresh <recall_threshold> \
--criterion_key "val" \
--save
For SoccerNet-v2, you can choose one of these four splits (an example invocation is given after the list):
- train
- val
- test
- challenge
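As an example, the following sketch runs inference on the test split using the save directory from the training example above (the paths are placeholders and the recall threshold of 0.01 is only an illustrative value, not a recommended setting):
export CUDA_VISIBLE_DEVICES=1,2
python3 test_e2e.py "results/800MF_GRU_GSM_FOCAL_GLIP" \
"/data/soccernet_720p_2fps" \
--glip_dir "/ext_drive/data/glip_feat" \
--split "test" \
--recall_thresh 0.01 \
--criterion_key "val" \
--save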
If you need to run post-processing (NMS) and evaluation (on all splits except the challenge set), use the eval_soccernetv2.py
script:
python3 eval_soccernetv2.py <output_file> \
--split <data_split> \
--eval_dir <output_dir> \
--soccernet_path <label_path> \
--nms_window <nms_window> \
--filter_score <filter_score> \
--allow_remove
We have added two arguments: filter_score,
which removes every prediction whose confidence (score) is below the provided threshold, and --allow_remove,
which automatically removes the output folder if it already exists.
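A concrete evaluation on the test split might look like the following (the NMS window and filter score are illustrative values you should tune, and the paths are placeholders):
python3 eval_soccernetv2.py <output_file> \
--split "test" \
--eval_dir "results/eval_test" \
--soccernet_path "/data/soccernet_labels" \
--nms_window 25 \
--filter_score 0.2 \
--allow_remove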
Regarding the challenge set, please submit your predictions to the eval.ai challenge.
To monitor the training process, you can use the loss_visualize.py
script to generate a training curve from the loss.json
file produced during training.
python3 loss_visualize.py --input <loss_file> \
--output <output_image_file>
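For example, assuming loss.json is written inside the training save directory used above (an assumption about where the training script stores it):
python3 loss_visualize.py --input "results/800MF_GRU_GSM_FOCAL_GLIP/loss.json" \
--output "results/800MF_GRU_GSM_FOCAL_GLIP/loss_curve.png"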
A single model may not perform well on all 17 classes, so you may sometimes want to merge the predictions of multiple models. To do so, use the merge_prediction
script as follows:
python3 merge_prediction.py <first_prediction_dir> \
<second_prediction_dir> \
<output_dir> \
--either <list_of_either_class> \
--both <list_of_both_class> \
--first <list_of_first_class> \
--second <list_of_second_class>
For example, to keep the card predictions
from the second model
and the penalty predictions
from either model:
python3 merge_prediction.py "prediction_1.json" \
"prediction_2.json" \
"prediction_merge.json" \
--either "Penalty" \
--second "Red card,Yellow card,Yellow->red card"
To analyze the predictions, you can use the view
script.
You can also pass the --nms
flag to run NMS with a score filter threshold of 0.2.
python view.py <data_name> \
<prediction_folder> \
<frame_folder> \
--nms
As a result, a website will be hosted at localhost:8000.
Given a video, you can use the visualize_result
tool to watch the video and select the event you want to navigate to.
First, place the prediction file in the same folder as the video:
|- match_name
|- 1_720.mkv
|- 2_720.mkv
|- results_spotting.json
We recommend using Anaconda to create a virtual environment for this application:
conda create -n annotation python=3.8
conda activate annotation
pip install --upgrade pip
pip install pyqt5
Then, run the application with:
cd visualize_result/src
python3 main.py
By combining our UGLF model with the E2E-Spot model, we achieve the top-1 result on the SoccerNet-v2 dataset:
| Method | Test (Tight) | Test (Loose) | Challenge (Tight) | Challenge (Loose) |
|---|---|---|---|---|
| CALF | - | - | 15.33 | 42.22 |
| CALF-calib | - | 46.80 | 15.83 | 46.39 |
| RMS-Net | 28.83 | 63.49 | 27.69 | 60.92 |
| NetVLAD++ | - | - | 43.99 | 74.63 |
| Zhou et al. | 47.05 | 73.77 | 49.56 | 74.84 |
| Soares et al. | 65.07 | 78.59 | 67.81* | 78.05* |
| E2E-Spot (baseline) | 61.82 | 74.05 | 66.73* | 73.26* |
| UGLF-Combine (ours) | 62.49 | 73.98 | 69.38* | 76.14* |
The project is implemented by:
Under the guidance of our mentors:
We also gratefully thank the following public research works, which have supported our implementation:
- Spotting Temporally Precise, Fine-Grained Events in Video
- Grounded Language-Image Pre-training
- AOE-Net
- SoccerNet
UNDER REVIEW