CIKM'2021: SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

krystalan/SportsSum2.0

SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

SportsSum2.0 is a Chinese sports game summarization dataset based on SportsSum. In short, SportsSum2.0 is a cleaned version of SportsSum. Sports game summarization is a challenging task that aims to generate sports summaries (i.e., news articles) from the corresponding live commentaries.

For more details, please refer to the papers cited in the Citation and Contact section below.

BTW, our new paper, Knowledge Enhanced Sports Game Summarization, has been accepted by WSDM 2022 as a long paper. In that paper, we provide the K-SportsSum dataset, which contains more data (~1.45 times) than SportsSum / SportsSum2.0. K-SportsSum also offers a large-scale knowledge corpus containing information about games and players. More details can be found at K-SportsSum.

Download the Dataset

You can download the data here

Each game has four related files:

  • news.txt: Original news article from SportsSum.
  • [League]_[id].txt: Cleaned news article. [League] indicates the league in which the game took place, such as Bundesliga, CSL, Europa, La Liga, etc. [id] is the identifier of the game.
  • live.json: Live commentary document which contains commentary sentences, timeline information and real time scores.
  • linesup.json: Metadata file (contains rosters, starting lineups, player positions, etc.).
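
As a rough illustration of how the four files might be read together, here is a minimal Python sketch. It assumes that each game sits in its own directory, that the files are UTF-8 encoded, and that the cleaned article is the only .txt file other than news.txt. None of this is confirmed by the dataset description, and the function name load_game and the returned dictionary keys are hypothetical. The JSON files are loaded as-is, since their internal schema is not described here.

```python
import json
from pathlib import Path


def load_game(game_dir):
    """Load the four files of one game (hypothetical helper, assumed layout)."""
    game_dir = Path(game_dir)

    # Assumption: the cleaned article is the only .txt file besides news.txt.
    cleaned = [p for p in game_dir.glob("*.txt") if p.name != "news.txt"]
    if len(cleaned) != 1:
        raise ValueError(
            f"expected exactly one cleaned article in {game_dir}, found {len(cleaned)}"
        )
    cleaned_path = cleaned[0]

    # Split "[League]_[id]" on the last underscore, so league names that
    # contain spaces (e.g. "La Liga") are kept whole.
    league, _, game_id = cleaned_path.stem.rpartition("_")

    def read_text(path):
        # Assumption: files are UTF-8 encoded.
        return path.read_text(encoding="utf-8")

    def read_json(path):
        with path.open(encoding="utf-8") as f:
            return json.load(f)

    return {
        "league": league,
        "id": game_id,
        "original_news": read_text(game_dir / "news.txt"),
        "cleaned_news": read_text(cleaned_path),
        "live": read_json(game_dir / "live.json"),
        "lineups": read_json(game_dir / "linesup.json"),
    }
```

Calling load_game on one game directory returns a dictionary holding both versions of the news article, the parsed live commentary, and the parsed lineup metadata. If the downloaded archive is organized differently, only the path handling at the top of the function should need to change.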

Citation and Contact

If you find this data useful or use it in your work, please cite our paper and the original SportsSum paper.

@article{Wang2021SportsSum20GH,
  title={SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary},
  author={Jiaan Wang and Zhixu Li and Qiang Yang and Jianfeng Qu and Zhigang Chen and Qingsheng Liu and Guoping Hu},
  journal={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
  year={2021}
}
@inproceedings{Huang2020sportssum,
    author    = {Kuan-Hao Huang and
                 Chen Li and
                 Kai-Wei Chang},
    title     = {Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization},
    booktitle = {Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL)},
    year      = {2020},
}

Please contact Jiaan Wang (jawang1[at].stu.suda.edu.cn) for questions and suggestions.
