Skip to content

Latest commit

 

History

History
46 lines (40 loc) · 1.51 KB

prepare.md

File metadata and controls

46 lines (40 loc) · 1.51 KB

Preparation

WebVid is a large-scale text-video dataset, containing 10 million video-text pairs scraped from the stock footage sites. To download the dataset, run the following command:

wget -nc http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_train.csv

video2dataset --url_list="results_10M_train.csv" \
        --input_format="csv" \
        --output-format="webdataset" \
	    --output_folder="dataset" \
        --url_col="contentUrl" \
        --caption_col="name" \
        --save_additional_columns='[videoid,page_idx,page_dir,duration]' \
        --enable_wandb=True \
	    --config="path/to/config.yaml" \

For more datails, please refer to video2dataset.

Postprocess

Once you've downloaded the dataset, please verify the download status, as some video files may not have been successfully downloaded. Afterward, organize the dataset into a json file with the following format:

[   
    {
        "caption": "Merida, mexico - may 23, 2017: tourists are walking on a roadside near catholic church in the street of mexico at sunny summer day.",
        "video_name": "31353427.mp4"
    },
    {
        "caption": "Happy family using laptop on bed at home",
        "video_name": "14781349.mp4"
    },
    ...
]

The data file structure should be:

data/T-X_pair_data/webvid
├── webvid.json
├── videos
|   ├── 31353427.mp4
|   ├── 14781349.mp4
|   └── ...