This repository contains the official code of the paper: "SCROLLS: Standardized CompaRison Over Long Language Sequences".
Setup instructions are in the baselines and evaluator folders.
For the live leaderboard, check out the official website.
- via 🤗 Datasets (huggingface/datasets) library (recommended):
  - Usage:

        from datasets import load_dataset

        qasper_dataset = load_dataset("tau/scrolls", "qasper")
        """
        Options are: ["gov_report", "summ_screen_fd", "qmsum", "narrative_qa", "qasper", "quality", "contract_nli"]
        """
- via ZIP files, where each split is in a JSONL file:
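
Because each split is a JSONL file (one JSON object per line), a downloaded archive can be read with the Python standard library alone. The following is a minimal sketch, not part of the official code: the helper name `read_jsonl_split` is hypothetical, and the archive and member names depend on which ZIP file you download.

```python
import io
import json
import zipfile


def read_jsonl_split(zip_path, member_name):
    """Return the records of one JSONL split stored inside a ZIP archive.

    Hypothetical helper for illustration; `zip_path` may be a file path or a
    binary file-like object, and `member_name` is the path of the JSONL file
    inside the archive.
    """
    records = []
    with zipfile.ZipFile(zip_path) as archive:
        # Open the member as a binary stream and decode it line by line,
        # so the whole file is never held in memory as a single string.
        with archive.open(member_name) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records
```

For example, `read_jsonl_split("qasper.zip", "qasper/validation.jsonl")` would return a list of dictionaries, one per example, assuming the archive and the member are named that way. Use `zipfile.ZipFile(path).namelist()` to see the actual member names.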
@inproceedings{shaham-etal-2022-scrolls,
    title = "{SCROLLS}: Standardized {C}ompa{R}ison Over Long Language Sequences",
    author = "Shaham, Uri and
      Segal, Elad and
      Ivgi, Maor and
      Efrat, Avia and
      Yoran, Ori and
      Haviv, Adi and
      Gupta, Ankit and
      Xiong, Wenhan and
      Geva, Mor and
      Berant, Jonathan and
      Levy, Omer",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.823",
    pages = "12007--12021",
}
When citing SCROLLS, please make sure to cite all the original dataset papers. [bibtex]