Towards Scalable Neural Representation for Diverse Videos (CVPR2023)

Project Page | Paper

The official repository of our paper "Towards Scalable Neural Representation for Diverse Videos".

Model Overview

Requirements

You can install the conda environment by running:

conda create -n dnerv python=3.9.7
conda activate dnerv
conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia
pip install tensorboard
pip install tqdm dahuffman pytorch_msssim

Video Compression

We use the existing deep image compression models provided by CompressAI. The pre-extracted ground-truth video frames and pre-compressed keyframes for the UVG and UCF101 datasets are available at this Google Drive link.

Unzip the archive into the data/ folder and make sure the directory structure matches the layout below.

 data
 ├── UVG
 │   ├── gt
 │   ├── keyframe
 │   └── annotation
 └── UCF101
     ├── gt
     ├── keyframe
     └── annotation
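
Before training, it can help to confirm that the archive was unpacked into the expected layout. The following is a minimal sketch, not part of the repository: the helper name `missing_folders` is hypothetical, and it only checks that the folders named in the tree exist, not that their contents are complete.

```python
from pathlib import Path

# Folder names taken from the directory tree above.
DATASETS = ("UVG", "UCF101")
SUBFOLDERS = ("gt", "keyframe", "annotation")


def missing_folders(data_root):
    """Return the expected sub-folders that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for dataset in DATASETS:
        for sub in SUBFOLDERS:
            folder = root / dataset / sub
            if not folder.is_dir():
                missing.append(folder)
    return missing


if __name__ == "__main__":
    gaps = missing_folders("data")
    if gaps:
        print("Missing folders:")
        for folder in gaps:
            print("  ", folder)
    else:
        print("Data layout looks complete.")
```

Run it from the repository root so that the relative path `data` resolves to the folder described above.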

Note that, due to GPU memory limitations, we split the 1024x1920 UVG videos into non-overlapping 256x320 frame patches during training.
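
To make the patch geometry concrete, the sketch below enumerates the non-overlapping 256x320 boxes that tile one 1024x1920 frame. It is an illustration only: the function name `patch_boxes` and the row-major ordering are assumptions, and the repository's own data loader may order or name patches differently.

```python
def patch_boxes(height, width, patch_h, patch_w):
    """Return (top, left, bottom, right) boxes that tile a frame without overlap."""
    if height % patch_h or width % patch_w:
        raise ValueError("frame size must be a multiple of the patch size")
    # Row-major order (an assumption): left to right, then top to bottom.
    return [
        (top, left, top + patch_h, left + patch_w)
        for top in range(0, height, patch_h)
        for left in range(0, width, patch_w)
    ]


boxes = patch_boxes(1024, 1920, 256, 320)
print(len(boxes))   # 24 patches per frame (4 rows x 6 columns)
print(boxes[0])     # (0, 0, 256, 320)
print(boxes[-1])    # (768, 1600, 1024, 1920)
```

Each UVG frame therefore yields 24 patches, which is why patch-level metrics later need to be re-evaluated on the reassembled full frames.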

Running

Training

We train our model on 4 RTX A6000 GPUs. To compare with other state-of-the-art video compression methods, we train for 1600 epochs on the UVG dataset and 800 epochs on the UCF101 dataset. You can use fewer epochs to reduce the training time.

# UVG dataset
python train.py --dataset UVG --model_type ${model_type} --model_size ${model_size} \
    -e 1600 -b 32 --lr 5e-4 --loss_type Fusion6 -d

# UCF101 dataset
python train.py --dataset UCF101 --model_type ${model_type} --model_size ${model_size} \
    -e 800  -b 32 --lr 5e-4 --loss_type Fusion19 -d
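
If you want to launch several configurations, the `${model_type}` and `${model_size}` placeholders can be filled in programmatically. The sketch below only builds and prints the command lines. The helper `train_command` is hypothetical; the flags, epoch counts and loss types are copied from the commands above, and the D-NeRV sizes S, M and L are taken from the UCF101 results table. Whether every model/size combination is accepted by train.py is an assumption.

```python
import shlex

# Per-dataset settings copied from the training commands above.
SETTINGS = {
    "UVG": {"epochs": 1600, "loss_type": "Fusion6"},
    "UCF101": {"epochs": 800, "loss_type": "Fusion19"},
}


def train_command(dataset, model_type, model_size, epochs=None):
    """Build the train.py argument list for one configuration."""
    cfg = SETTINGS[dataset]
    return [
        "python", "train.py",
        "--dataset", dataset,
        "--model_type", model_type,
        "--model_size", model_size,
        "-e", str(epochs if epochs is not None else cfg["epochs"]),
        "-b", "32",
        "--lr", "5e-4",
        "--loss_type", cfg["loss_type"],
        "-d",
    ]


# Print one command per model size; pass a list to subprocess.run to execute it.
for size in ("S", "M", "L"):
    print(shlex.join(train_command("UCF101", "D-NeRV", size)))
```

Passing a smaller `epochs` value, for example `train_command("UVG", "D-NeRV", "M", epochs=400)`, gives a shorter run at the cost of reconstruction quality.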

Testing

# Evaluate model without model quantization
python train.py --dataset UVG --model_type D-NeRV --model_size M \
        --eval_only --model saved_model/UVG/D-NeRV_M.pth

# Evaluate model with model quantization
python train.py --dataset UVG --model_type D-NeRV --model_size M \
        --eval_only --model saved_model/UVG/D-NeRV_M.pth --quant_model

Dump Predicted Frames

python train.py --dataset UVG --model_type D-NeRV --model_size M \
        --eval_only --model saved_model/UVG/D-NeRV_M.pth --quant_model \
        --dump_images

Note that, for the UVG dataset, the PSNR/MS-SSIM computed on the 256x320 frame patches differs from the PSNR/MS-SSIM of the full 1024x1920 frames. Therefore, dump the predicted frame patches first, and then re-evaluate the PSNR/MS-SSIM against the ground-truth 1024x1920 video frames.
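
One reason patch-level and frame-level numbers can disagree is that PSNR is a logarithm of the mean squared error, so the average of per-patch PSNR values is generally not the PSNR of the reassembled frame. The sketch below illustrates this on toy data with plain Python lists. The helpers `stitch` and `psnr`, the row-major patch order and the [0, 1] pixel range are all assumptions made for illustration; this is not the repository's evaluation code, and MS-SSIM is not covered.

```python
import math


def stitch(patches, rows, cols):
    """Reassemble row-major patches (2-D lists of pixel values) into one frame."""
    if len(patches) != rows * cols:
        raise ValueError("expected rows * cols patches")
    frame = []
    for r in range(rows):
        row_patches = patches[r * cols:(r + 1) * cols]
        for y in range(len(row_patches[0])):
            line = []
            for patch in row_patches:
                line.extend(patch[y])
            frame.append(line)
    return frame


def psnr(pred, target, max_value=1.0):
    """PSNR in dB between two equally sized 2-D frames."""
    count = 0
    squared_error = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            squared_error += (p - t) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


# Toy data: two 1x2 patches with very different errors against an all-zero target.
pred_patches = [[[0.1, 0.1]], [[0.01, 0.01]]]
target_patches = [[[0.0, 0.0]], [[0.0, 0.0]]]

per_patch = [psnr(p, t) for p, t in zip(pred_patches, target_patches)]
mean_patch_psnr = sum(per_patch) / len(per_patch)
full_psnr = psnr(stitch(pred_patches, rows=1, cols=2),
                 stitch(target_patches, rows=1, cols=2))

# Approximately 30.0 dB (mean of patches) versus 22.97 dB (full frame).
print(round(mean_patch_psnr, 2), round(full_psnr, 2))
```

For real frames the same idea applies with tensors or arrays: stitch the dumped patches back to 1024x1920 first, then compute the metric once per full frame.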

PSNR/MS-SSIM vs. BPP Ratio Calculation

UVG Dataset

Results for different model configs are shown in the following table. The PSNR/MS-SSIM results are reported from the model with quantization.

Model	Arch	Model Param (M)	Entropy Encoding	Keyframe Size (Mb)	Total (Mb)	BPP	PSNR	MS-SSIM	Link
D-NeRV	XS	8.02	0.883	88.39	145.0	0.0189	34.11	0.9479	`link`
D-NeRV	S	15.96	0.881	88.39	200.9	0.0262	34.76	0.9540	`link`
D-NeRV	M	24.20	0.880	123.2	293.6	0.0383	35.74	0.9604	`link`
D-NeRV	L	41.66	0.877	175.1	467.3	0.0609	36.78	0.9668	`link`
D-NeRV	XL	69.75	0.875	254.7	730.3	0.0952	37.43	0.9719	`link`

UCF101 Dataset (training split)

Model	Arch	Model Param (M)	Entropy Encoding	Keyframe Size (Mb)	Total (Mb)	BPP	PSNR	MS-SSIM	Link
D-NeRV	S	21.40	0.882	481.6	632.7	0.0559	28.11	0.9153	`link`
D-NeRV	M	38.90	0.891	481.6	758.7	0.0671	29.15	0.9364	`link`
D-NeRV	L	61.30	0.891	481.6	918.3	0.0812	29.97	0.9501	`link`
NeRV	S	88.00	0.903		635.9	0.0562	26.78	0.9094	`link`
NeRV	M	105.3	0.900		758.4	0.0671	27.06	0.9177	`link`
NeRV	L	127.2	0.903		919.1	0.0813	27.61	0.9284	`link`

BPP Calculation

$BPP=\dfrac{\overbrace{\text{Model Param} * 8}^{\text{int8 quantization}} * \text{Entropy Encoding} + \text{Keyframe Size}}{\text{H} * \text{W} * \text{Num Frames}}$

For the UVG dataset, H = 1024, W = 1920, Num Frames = 3900.

For the UCF101 dataset (training split), H = 256, W = 320, Num Frames = 138041.
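
As a worked example of the formula, the sketch below recomputes the D-NeRV XS entry of the UVG table. The function name `bits_per_pixel` is hypothetical. It assumes that "M" means 10^6 parameters and "Mb" means 10^6 bits, which reproduces the table values.

```python
def bits_per_pixel(model_param_m, entropy_ratio, keyframe_mb,
                   height, width, num_frames):
    """Return (total size in Mb, BPP) following the BPP formula above."""
    # int8 quantization: 8 bits per parameter, scaled by the entropy-coding ratio.
    model_mb = model_param_m * 8 * entropy_ratio
    total_mb = model_mb + keyframe_mb
    # Assumption: 1 Mb = 1e6 bits.
    bpp = total_mb * 1e6 / (height * width * num_frames)
    return total_mb, bpp


# D-NeRV XS on UVG, inputs taken from the results table.
total_mb, bpp = bits_per_pixel(8.02, 0.883, 88.39, 1024, 1920, 3900)
print(f"{total_mb:.1f} Mb, {bpp:.4f} BPP")  # 145.0 Mb, 0.0189 BPP
```

For the NeRV baselines, which have no keyframe entry in the table, pass `keyframe_mb=0`.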

Citation

If you find our code or our paper useful for your research, please star this repo and cite the following paper:

@inproceedings{he2023dnerv,
  title = {Towards Scalable Neural Representation for Diverse Videos},
  author = {He, Bo and Yang, Xitong and Wang, Hanyu and Wu, Zuxuan and Chen, Hao and Huang, Shuaiyi and Ren, Yixuan and Lim, Ser-Nam and Shrivastava, Abhinav},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023},
}
