miccunifi/TAPE

TAPE (WACV 2024)

Reference-based Restoration of Digitized Analog Videotapes

🔥 🔥 🔥 [22/12/2023] The pre-trained model and the code for real-world inference, training and testing are now available

This is the official repository of the paper "Reference-based Restoration of Digitized Analog Videotapes".

Overview

Abstract

Analog magnetic tapes have been the main video data storage device for several decades. Videos stored on analog videotapes exhibit unique degradation patterns caused by tape aging and reader device malfunctioning that are different from those observed in film and digital video restoration tasks. In this work, we present a reference-based approach for the resToration of digitized Analog videotaPEs (TAPE). We leverage CLIP for zero-shot artifact detection to identify the cleanest frames of each video through textual prompts describing different artifacts. Then, we select the clean frames most similar to the input ones and employ them as references. We design a transformer-based Swin-UNet network that exploits both neighboring and reference frames via our Multi-Reference Spatial Feature Fusion (MRSFF) blocks. MRSFF blocks rely on cross-attention and attention pooling to take advantage of the most useful parts of each reference frame. To address the absence of ground truth in real-world videos, we create a synthetic dataset of videos exhibiting artifacts that closely resemble those commonly found in analog videotapes. Both quantitative and qualitative experiments show the effectiveness of our approach compared to other state-of-the-art methods.

Overview of the proposed approach. Left: given a video, we identify the cleanest frames with CLIP. First, we measure the similarity between the frames and textual prompts that describe different artifacts. Then, we employ Otsu's method to define a threshold for classifying the frames based on their similarity scores, resulting in a set of clean frames. Right: given a window of $T$ degraded input frames, we select the $D$ most similar clean frames based on the CLIP image features and employ them as references. The proposed Swin-UNet then restores the input frames while effectively leveraging the references.
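The two selection steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used in this repository: the function names are hypothetical, the similarity scores and feature vectors are made-up stand-ins for real CLIP outputs, a lower artifact score is assumed to mean a cleaner frame, and averaging the similarity over the input window is just one plausible way to rank the clean frames.

```python
import math


def otsu_threshold(scores, bins=64):
    """Return the cut that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return lo  # all scores identical: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        hist[min(int((s - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(scores)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_cut = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this cut separates nothing
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance (unnormalized)
        if var > best_var:
            best_var, best_cut = var, lo + (i + 1) * width
    return best_cut


def clean_frame_indices(artifact_scores):
    """Assumption: frames scoring at or below the Otsu cut count as clean."""
    cut = otsu_threshold(artifact_scores)
    return [i for i, s in enumerate(artifact_scores) if s <= cut]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_references(window_feats, clean_feats, d):
    """Rank clean frames by mean cosine similarity to the window (assumed rule)."""
    scored = []
    for idx, feat in enumerate(clean_feats):
        mean_sim = sum(cosine(feat, w) for w in window_feats) / len(window_feats)
        scored.append((mean_sim, idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:d]]


if __name__ == "__main__":
    # Hypothetical frame-to-artifact-prompt similarities for seven frames.
    scores = [0.10, 0.12, 0.11, 0.13, 0.30, 0.32, 0.31]
    print("clean frames:", clean_frame_indices(scores))
    # Hypothetical 2-D "image features"; real CLIP features are much larger.
    window = [[1.0, 0.0], [0.9, 0.1]]
    clean = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5]]
    print("references:", select_references(window, clean, d=2))
```

In practice the scores and features would come from a CLIP model, and a library routine such as scikit-image's `threshold_otsu` could replace the hand-written threshold.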

Dataset

We release a dataset of videos synthetically degraded with Adobe After Effects to exhibit artifacts resembling those of real-world analog videotapes. The original high-quality videos belong to the Venice scene of the Harmonic dataset. The artifacts taken into account are: 1) tape mistracking; 2) VHS edge waving; 3) chroma loss along the scanlines; 4) tape noise; 5) undersaturation. The dataset comprises a total of 26,392 frames corresponding to 40 clips. The clips are randomly divided into training and test sets with a 75%-25% ratio.
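For intuition about the split sizes, a 75%-25% division of 40 clips gives 30 training clips and 10 test clips. The snippet below is a minimal sketch of such a random clip-level split. The clip names, the seed and the `split_clips` helper are hypothetical, and the snippet does not reproduce the actual split shipped with the dataset.

```python
import random


def split_clips(clip_names, train_ratio=0.75, seed=0):
    """Shuffle clips with a fixed seed and cut them into train and test lists."""
    clips = sorted(clip_names)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(clips)
    n_train = round(len(clips) * train_ratio)
    return clips[:n_train], clips[n_train:]


# Hypothetical clip identifiers; the real dataset uses its own file names.
clips = [f"clip_{i:02d}" for i in range(40)]
train, test = split_clips(clips)
print(len(train), len(test))  # 30 10
```

Splitting by clip rather than by frame keeps all frames of a clip in the same set, so no test frame has near-duplicate neighbours in the training data.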

The dataset can be downloaded here. We release both the mp4 videos and the LMDB files associated with each split.

Citation

@inproceedings{agnolucci2024reference,
  title={Reference-based Restoration of Digitized Analog Videotapes},
  author={Agnolucci, Lorenzo and Galteri, Leonardo and Bertini, Marco and Del Bimbo, Alberto},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={1659--1668},
  year={2024}
}

Getting Started

Installation

We recommend using the Anaconda package manager to avoid dependency/reproducibility problems. For Linux systems, you can find a conda installation guide here.

  1. Clone the repository
git clone https://github.com/miccunifi/TAPE
  2. Install Python dependencies
conda create -n TAPE -y python=3.10
conda activate TAPE
cd TAPE
chmod +x install_requirements.sh
./install_requirements.sh
  3. (Optional) If you want to compute the VMAF score, you first need to install ffmpeg. Then, follow the instructions reported here to install the VMAF Python library. Finally, place the vmaf folder inside the utils directory.

Data Preparation

Download the dataset from here. At the end, the directory structure should look like this:

├── data_base_path
|
|    ├── train
|    |   ├── input
|    |   |   ├── input.lmdb
|    |   |   ├── videos
|    |   ├── gt
|    |   |   ├── gt.lmdb
|    |   |   ├── videos
|
|    ├── test
|    |   ├── input
|    |   |   ├── input.lmdb
|    |   |   ├── videos
|    |   ├── gt
|    |   |   ├── gt.lmdb
|    |   |   ├── videos
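Before launching training, it can help to check that the layout above is in place. The following is a small sketch of such a check; the `missing_entries` helper is hypothetical and not part of this repository, and it only tests that each expected path exists, not that the LMDB files are valid.

```python
from pathlib import Path

# Relative paths expected under data_base_path, taken from the tree above.
EXPECTED = [
    Path(split) / kind / leaf
    for split in ("train", "test")
    for kind in ("input", "gt")
    for leaf in (f"{kind}.lmdb", "videos")
]


def missing_entries(data_base_path):
    """Return the expected relative paths that do not exist under data_base_path."""
    base = Path(data_base_path)
    return [rel.as_posix() for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    import sys

    # Pass your own data_base_path as the first argument.
    base = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_entries(base)
    print("layout OK" if not missing else f"missing: {missing}")
```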

Real-world Inference

To use our method for restoring a real-world video, download the pre-trained model from the release and place it under the experiments/pretrained_model directory. Then, run the following command:

python real_world_inference.py --input-path <path_to_video> --output-path <path_to_output_folder>

--input-path <str>                           Path to the video to restore
--output-path <str>                          Path to the output folder
--checkpoint-path <str>                      Path to the pretrained model checkpoint (default=experiments/pretrained_model/checkpoint.pth)
--num-input-frames <int>                     Number of input frames T for each input window (default=5)
--num-reference-frames <int>                 Number of reference frames D for each input window (default=5)
--preprocess-mode <str>                      Preprocessing mode, options: ['crop', 'resize', 'none']. 'crop' extracts the --patch-size center
                                             crop, 'resize' resizes the longest side to --patch-size while keeping the aspect ratio, 'none'
                                             applies no preprocessing  (default=crop)
--patch-size <int>                           Maximum patch size for --preprocess-mode ['crop', 'resize'] (default=512)
--frame-format <str>                         Frame format of the extracted and restored frames (default=jpg)
--generate-combined-video <store_true>       Whether to generate the combined video (i.e. input and restored videos side by side)
--no-intermediate-products <store_true>      Whether to delete intermediate products (i.e. input frames, restored frames, references)
--batch-size <int>                           Batch size (default=1)
--num-workers <int>                          Number of workers of the data loader (default=20)
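To make the three --preprocess-mode options concrete, the sketch below computes the frame size each mode would produce according to the descriptions above. It is an illustration only and does not reproduce real_world_inference.py: the `preprocessed_size` helper is hypothetical, and both the rounding rule and the clamping of the crop to frames smaller than --patch-size are assumptions.

```python
def preprocessed_size(width, height, mode="crop", patch_size=512):
    """Return the (width, height) a frame would have after preprocessing."""
    if mode == "none":
        return width, height  # frame is left untouched
    if mode == "crop":
        # Center crop of at most patch_size per side (clamping is an assumption).
        return min(width, patch_size), min(height, patch_size)
    if mode == "resize":
        # Longest side becomes patch_size; the other side keeps the aspect ratio.
        scale = patch_size / max(width, height)
        return max(1, round(width * scale)), max(1, round(height * scale))
    raise ValueError(f"unknown preprocess mode: {mode!r}")


# Example with a 720x576 frame, a common size for digitized PAL tapes.
for mode in ("crop", "resize", "none"):
    print(mode, preprocessed_size(720, 576, mode))
```

Under these assumptions, 'crop' keeps the original pixel scale but discards the borders, while 'resize' keeps the whole frame at a lower resolution.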

Training and Testing

To train our model from scratch, run the following command:

python main.py --experiment-name <name_of_the_experiment> --data-base-path <data_base_path> --comet-api-key <comet_api_key> --comet-project-name <comet_project_name>

You need a Comet ML account for logging, since the command takes a Comet API key and a project name. See main.py for all the available options. The checkpoints will be saved inside the experiments/<name_of_the_experiment>/checkpoints folder. After training, main.py will run the evaluation on the test set and save the results inside the experiments/<name_of_the_experiment>/results folder.

If you want to skip the training and just run the evaluation on the test set, add the --test-only flag to the command above. In addition, if you want to avoid computing the VMAF score, add the --no-vmaf flag.

You can test our pre-trained model by adding the --eval-type pretrained flag. Note that you first need to download the pre-trained model from the release and place it under the experiments/pretrained_model directory.
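The training and evaluation variants described above differ only in a few flags, so they can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of this repository. It uses only the flags mentioned in this section, and the placeholder values are made up. Whether --eval-type pretrained should be combined with --test-only is an assumption left to the caller; check main.py for the authoritative option list.

```python
import shlex


def main_command(experiment_name, data_base_path, comet_api_key, comet_project_name,
                 test_only=False, no_vmaf=False, pretrained=False):
    """Build the argument list for main.py from the flags documented above."""
    cmd = [
        "python", "main.py",
        "--experiment-name", experiment_name,
        "--data-base-path", data_base_path,
        "--comet-api-key", comet_api_key,
        "--comet-project-name", comet_project_name,
    ]
    if test_only:
        cmd.append("--test-only")  # skip training, evaluate on the test set
    if no_vmaf:
        cmd.append("--no-vmaf")  # do not compute the VMAF score
    if pretrained:
        cmd += ["--eval-type", "pretrained"]  # evaluate the released checkpoint
    return cmd


# Placeholder values for illustration only.
cmd = main_command("my_experiment", "/path/to/data_base_path", "MY_KEY", "my_project",
                   test_only=True, no_vmaf=True)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.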

Authors

Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

Acknowledgements

This work was partially supported by the European Commission under the European Horizon 2020 Programme, grant number 101004545 - ReInHerit.

LICENSE

All material is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.