Ning Xie, Farley Lai, Derek Doran, Asim Kadav
SNLI-VE is the dataset proposed for the Visual Entailment (VE) task investigated in Visual Entailment Task for Visually-Grounded Language Learning (accepted to the NeurIPS 2018 ViGIL workshop). Refer to our full paper for detailed analysis and evaluations.
12/10/2021:
- The Flickr images download has been updated and is now hosted by AllenNLP.
- The Flickr features download link has been updated, but the archive may require a newer version of unzip to decompress on Linux.
NOTE:
- The data remains hosted by external parties and subject to change
Check out the leaderboard on Papers with Code.
NOTE: e-SNLI-VE-2.0 relabels the neutral-class examples in the dev and test splits and evaluates the resulting performance under three configurations: original, val-correction, and val/test-correction.
SNLI-VE is built on top of SNLI and Flickr30K.
The problem VE tries to solve is reasoning about the relationship between an image premise $P_{image}$ and a text hypothesis $H_{text}$.
Specifically, given an image as the premise and a natural language sentence as the hypothesis, one of three labels (entailment, neutral, or contradiction) is assigned based on the relationship conveyed by the pair ($P_{image}$, $H_{text}$):
- entailment holds if there is enough evidence in $P_{image}$ to conclude that $H_{text}$ is true.
- contradiction holds if there is enough evidence in $P_{image}$ to conclude that $H_{text}$ is false.
- Otherwise, the relationship is neutral, implying the evidence in $P_{image}$ is insufficient to draw a conclusion about $H_{text}$.
Below are some highlighted dataset statistics; details can be found in our paper.
The data details of the train, dev, and test splits are shown below. The instances of the three labels (entailment, neutral, and contradiction) are evenly distributed in each split.
| | Train | Dev | Test |
|---|---|---|---|
| #Image | 29783 | 1000 | 1000 |
| #Entailment | 176932 | 5959 | 5973 |
| #Neutral | 176045 | 5960 | 5964 |
| #Contradiction | 176550 | 5939 | 5964 |
| Vocabulary Size | 29550 | 6576 | 6592 |
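The table can be cross-checked arithmetically: summing the three label counts of each split should reproduce the partition sizes in the comparison table below and the 565286-pair total quoted in the quality analysis, and each label should take roughly one third of its split. The sketch below only uses the numbers copied from the table; the function names are illustrative.

```python
# Per-split label counts, copied from the SNLI-VE statistics table.
COUNTS = {
    "train": {"entailment": 176932, "neutral": 176045, "contradiction": 176550},
    "dev": {"entailment": 5959, "neutral": 5960, "contradiction": 5939},
    "test": {"entailment": 5973, "neutral": 5964, "contradiction": 5964},
}


def split_sizes(counts):
    """Return the total number of (image, hypothesis) pairs per split."""
    return {split: sum(labels.values()) for split, labels in counts.items()}


def label_shares(counts):
    """Return each label's fraction of its split, rounded to 3 decimals."""
    shares = {}
    for split, labels in counts.items():
        total = sum(labels.values())
        shares[split] = {label: round(n / total, 3) for label, n in labels.items()}
    return shares


if __name__ == "__main__":
    sizes = split_sizes(COUNTS)
    print(sizes)                # {'train': 529527, 'dev': 17858, 'test': 17901}
    print(sum(sizes.values()))  # 565286
    print(label_shares(COUNTS))
```

Every share comes out within about 0.002 of 1/3, which is what "evenly distributed" means here.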
Below is a dataset comparison among SNLI-VE, VQA-v2.0 and CLEVR.
| | SNLI-VE | VQA-v2.0 | CLEVR |
|---|---|---|---|
| Partition Size: | | | |
| Training | 529527 | 443757 | 699989 |
| Validation | 17858 | 214354 | 149991 |
| Test | 17901 | 555187 | 149988 |
| Question Length: | | | |
| Mean | 7.4 | 6.1 | 18.4 |
| Median | 7 | 6 | 17 |
| Mode | 6 | 5 | 14 |
| Max | 56 | 23 | 43 |
| Vocabulary Size | 32191 | 19174 | 87 |
For SNLI-VE, the "question" is the hypothesis.
As shown in the figure, the distribution of SNLI-VE question lengths has a long tail.
To check the quality of SNLI-VE, we randomly sampled 217 pairs from all three splits (565286 pairs in total).
Among the sampled pairs, 20 (about 9.2%) are incorrectly labeled, most of them in the neutral class.
This is consistent with the analysis reported by GTE in its Table 2.
It is worth noting that the original SNLI dataset is not perfectly labeled either: 8.8% of its sampled data was not assigned a gold label, implying disagreement among human labelers.
SNLI-VE is no exception, but we believe this is common in other large-scale datasets as well.
However, if dataset quality is a major concern for you, we suggest dropping the neutral class and using only the entailment and contradiction examples.
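Dropping the neutral class amounts to a filter over the entries. The sketch below assumes each SNLI-VE entry is one JSON object per line with a "gold_label" field holding the label name; that key and the file names in the usage line are assumptions, so check them against your generated files.

```python
import json
from typing import Iterable, Iterator

# Assumption: each entry is a JSON line whose "gold_label" field holds
# "entailment", "neutral" or "contradiction".
KEEP_LABELS = {"entailment", "contradiction"}


def drop_neutral(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entries labeled entailment or contradiction.

    Neutral entries, and entries with any other or missing label, are dropped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry.get("gold_label") in KEEP_LABELS:
            yield entry


def filter_file(src_path: str, dst_path: str) -> int:
    """Write the two-class subset of src_path to dst_path; return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for entry in drop_neutral(src):
            dst.write(json.dumps(entry) + "\n")
            kept += 1
    return kept
```

For example, `filter_file("snli_ve_train.jsonl", "snli_ve_train_2class.jsonl")` (hypothetical file names) writes a train split with roughly two thirds of the original entries.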
snli_ve_generator.py generates the SNLI-VE dataset as train, dev, and test splits with disjoint image sets.
Each entry contains a Flickr30kID field that links it to the original Flickr30K image ID.
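Because every entry carries its Flickr30K image ID, the disjointness of the splits can be verified by collecting the IDs of each split and intersecting them pairwise. This is a minimal sketch, assuming JSON-lines entries with the ID under the key "Flickr30kID" as this README names it; the exact key spelling in your generated files may differ.

```python
import json
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Assumption: the image id is stored under this key in every JSON line.
IMAGE_KEY = "Flickr30kID"


def image_ids(lines: Iterable[str], key: str = IMAGE_KEY) -> Set[str]:
    """Collect the distinct Flickr30K image ids referenced by one split."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            # Normalize to str so int and str ids compare equal.
            ids.add(str(json.loads(line)[key]))
    return ids


def overlapping_images(splits: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the shared image ids for every pair of splits that overlap.

    An empty result means all splits use disjoint image sets.
    """
    overlaps = {}
    for a, b in combinations(sorted(splits), 2):
        shared = splits[a] & splits[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```

On correctly generated data, `overlapping_images` should return an empty dict, and the number of distinct IDs per split should match the #Image row of the statistics table (29783 / 1000 / 1000).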
snli_ve_parser.py parses SNLI-VE entries for downstream applications and can be freely modified.
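For readers writing their own loader, here is a hypothetical parser, not the contents of snli_ve_parser.py. It turns each entry into an (image ID, hypothesis, label index) record. The key names are assumptions: "Flickr30kID" follows this README, while "sentence2" (hypothesis) and "gold_label" follow the original SNLI JSONL schema.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# Assumed label order; any fixed order works as long as it is used consistently.
LABELS = ("entailment", "neutral", "contradiction")
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


@dataclass(frozen=True)
class VEExample:
    image_id: str    # Flickr30K image id, i.e. the premise P_image
    hypothesis: str  # natural language hypothesis H_text
    label: int       # index into LABELS


def parse_entries(lines: Iterable[str]) -> Iterator[VEExample]:
    """Hypothetical parser: turn SNLI-VE JSON lines into VEExample records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        label = entry.get("gold_label")
        if label not in LABEL_TO_INDEX:
            continue  # skip entries without one of the three labels
        yield VEExample(
            image_id=str(entry["Flickr30kID"]),
            hypothesis=entry["sentence2"],
            label=LABEL_TO_INDEX[label],
        )
```

The image ID is kept as a string so it can be used directly to look up the image file or precomputed RoI features.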
Follow the instructions below to set up the environment and generate SNLI-VE:
- Set up the conda environment and dependencies:
    conda create -n vet37 python=3.7
    conda activate vet37
    conda install jsonlines  # conda install -c NECLA-ML ml
- Clone the repo:
    git clone https://github.com/necla-ml/SNLI-VE.git
- Generate SNLI-VE in data/:
    cd SNLI-VE
    python -m vet.tools.snli_ve_generator
- Download the dependent datasets (Flickr30K, Flickr30k Entities, SNLI, and RoI features):
    cd data
    ./download  # answer y to all prompts if necessary
The Flickr30k Entities dataset is an extension of Flickr30k that contains grounded RoI and entity annotations.
SNLI-VE can easily be extended with Flickr30k Entities if your experiments require fine-grained annotations.
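Since SNLI-VE entries reference Flickr30K image IDs, joining them with Flickr30k Entities is a per-image lookup. The sketch below assumes the Entities annotations are PASCAL VOC-style XML files stored as `Annotations/<image id>.xml`, where each `<object>` has one or more `<name>` entity IDs and an optional `<bndbox>`; verify this layout against your download before relying on it. `parse_entity_boxes` and `boxes_for_image` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def parse_entity_boxes(xml_text: str) -> Dict[str, List[Box]]:
    """Map each entity id in one annotation file to its bounding boxes.

    Assumes <object><name>ID</name>...<bndbox>...</bndbox></object> elements;
    objects without a <bndbox> (e.g. scene-level entities) are skipped.
    """
    root = ET.fromstring(xml_text)
    boxes: Dict[str, List[Box]] = {}
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue
        coords = tuple(
            int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # One box may be shared by several entities, hence findall.
        for name in obj.findall("name"):
            boxes.setdefault(name.text.strip(), []).append(coords)
    return boxes


def boxes_for_image(entities_root: str, image_id: str) -> Dict[str, List[Box]]:
    """Load the entity boxes for one SNLI-VE image (directory layout assumed)."""
    path = Path(entities_root) / "Annotations" / f"{image_id}.xml"
    return parse_entity_boxes(path.read_text(encoding="utf-8"))
```

The entity IDs returned here still have to be matched to phrases in the Flickr30k Entities caption files to ground hypothesis words, which is left out of this sketch.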
The first is our full paper, while the second is the ViGIL workshop version.
@article{xie2019visual,
title={Visual Entailment: A Novel Task for Fine-grained Image Understanding},
author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
journal={arXiv preprint arXiv:1901.06706},
year={2019}
}
@article{xie2018visual,
title={Visual Entailment Task for Visually-Grounded Language Learning},
author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
journal={arXiv preprint arXiv:1811.10582},
year={2018}
}
Thank you for your interest in our dataset!
Please contact us for any questions, comments, or suggestions!