Multi-VQG: Generating Engaging Questions for Multiple Images

MVQG is the dataset of the EMNLP 2022 long paper "Multi-VQG: Generating Engaging Questions for Multiple Images." The dataset was collected to enhance the ability of vision-and-language (VL) models to generate engaging questions for image sequences. We sampled the image sequences from the VIST dataset [1] and collected engaging questions corresponding to these image sequences via Amazon Mechanical Turk.

Data Structure

We split the MVQG dataset into train, validation, and test sets. Each set is a JSON file with the structure shown below.

```
{
    [key of the 1st image sequence]: [
        {
            "Summary": "...",
            "Question": "..."
        },
        ...
    ],
    [key of the 2nd image sequence]: [],
    ...
}
```

The key of each image sequence consists of five numbers joined by "_". Each number is an image ID in the VIST dataset. You can download the images from here.

Each image sequence has 2 to 5 data points. Every data point includes a summary and an engaging question written by workers on Amazon Mechanical Turk.
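The structure above can be read with a few lines of Python. This is a minimal sketch, not part of the released code: the filename `train.json` and the sample key/values below are illustrative assumptions that mirror the documented format, where each key joins five VIST image IDs with "_" and maps to a list of summary/question pairs.

```python
import json

# Hypothetical sample mirroring the documented MVQG structure; a real split
# would be loaded with: data = json.load(open("train.json"))
sample = """
{
    "101_102_103_104_105": [
        {"Summary": "A family trip to the beach.",
         "Question": "What made the trip memorable?"}
    ]
}
"""
data = json.loads(sample)

for seq_key, annotations in data.items():
    # The key encodes the five VIST image IDs of the sequence.
    image_ids = seq_key.split("_")
    # Each sequence maps to 2-5 data points (1 in this toy sample).
    for point in annotations:
        print(image_ids, "|", point["Summary"], "|", point["Question"])
```

Each `image_ids` entry can then be matched against the VIST image files once those are downloaded.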

[1]: Ting-Hao (Kenneth) Huang et al., "Visual Storytelling," Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2016).
