SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

SportsSum2.0 is a Chinese sports game summarization dataset built on SportsSum; in short, it is a cleaned version of SportsSum. Sports game summarization is a challenging task that aims to generate sports summaries (i.e., news articles) from the corresponding live commentaries.

For more details, please refer to the papers listed under Citation and Contact.

Our new paper, Knowledge Enhanced Sports Game Summarization, has been accepted by WSDM 2022 as a long paper. In it, we provide the K-SportsSum dataset, which contains more data (~1.45 times) than SportsSum / SportsSum2.0. K-SportsSum also offers a large-scale knowledge corpus containing information about games and players. More details can be found at K-SportsSum.

Download the Dataset

You can download the data here

Each game has four related files:

  • news.txt: Original news article from SportsSum.
  • [League]_[id].txt: Cleaned news article. [League] indicates the league in which the game took place, such as Bundesliga, CSL, Europa, or La Liga. [id] is the identifier of the game.
  • live.json: Live commentary document which contains commentary sentences, timeline information and real time scores.
  • linesup.json: Metadata file (contains rosters, starting lineups, player positions, etc.).
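As a rough illustration of how the four files fit together, the sketch below loads one game into a dictionary. It rests on assumptions this README does not state: that each game lives in its own directory, that the files are UTF-8 encoded, and that the cleaned article is the only `.txt` file besides `news.txt`. The function name `load_game` and the returned keys are hypothetical, and the JSON contents are returned as parsed, with no assumption about their internal schema.

```python
import json
import re
from pathlib import Path

# Matches "[League]_[id].txt"; the league part may itself contain spaces
# (e.g. "La Liga"), so everything before the last underscore is the league.
CLEANED_NAME = re.compile(r"^(?P<league>.+)_(?P<game_id>[^_]+)\.txt$")


def load_game(game_dir):
    """Load the four files of one game (hypothetical one-directory-per-game layout)."""
    game_dir = Path(game_dir)

    # Find the cleaned article: any .txt file other than news.txt
    # whose name follows the [League]_[id].txt pattern.
    cleaned_path = None
    for path in sorted(game_dir.glob("*.txt")):
        if path.name != "news.txt" and CLEANED_NAME.match(path.name):
            cleaned_path = path
            break
    if cleaned_path is None:
        raise FileNotFoundError(f"no [League]_[id].txt file in {game_dir}")

    match = CLEANED_NAME.match(cleaned_path.name)

    def read_text(name):
        # Assumption: files are UTF-8 encoded.
        return (game_dir / name).read_text(encoding="utf-8")

    return {
        "league": match.group("league"),
        "game_id": match.group("game_id"),
        "original_news": read_text("news.txt"),
        "cleaned_news": cleaned_path.read_text(encoding="utf-8"),
        "live": json.loads(read_text("live.json")),
        "lineups": json.loads(read_text("linesup.json")),
    }
```

For summarization experiments, `live` would typically serve as the input document and `cleaned_news` as the reference summary, with `original_news` kept for comparison against SportsSum.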

Citation and Contact

If you find this data useful or use it in your work, please cite our paper and the original SportsSum paper.

@article{Wang2021SportsSum20GH,
  title={SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary},
  author={Jiaan Wang and Zhixu Li and Qiang Yang and Jianfeng Qu and Zhigang Chen and Qingsheng Liu and Guoping Hu},
  journal={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
  year={2021}
}
@inproceedings{Huang2020sportssum,
    author    = {Kuan-Hao Huang and
                 Chen Li and
                 Kai-Wei Chang},
    title     = {Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization},
    booktitle = {Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL)},
    year      = {2020},
}

Please contact Jiaan Wang (jawang1[at].stu.suda.edu.cn) for questions and suggestions.