kpgen-lowres-data-aug

Data Augmentation for Low-Resource Keyphrase Generation

Abstract

Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases). Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire. Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations. In this paper, we present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains. We design techniques that use the full text of the articles to improve both present and absent keyphrase generation. We test our approach comprehensively on three datasets and show that the data augmentation strategies consistently improve the state-of-the-art performance.
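The abstract distinguishes present keyphrases from absent ones. As a rough illustration of the usual convention in the keyphrase generation literature (a phrase is "present" if it occurs verbatim in the source text, "absent" otherwise), the split can be sketched as below. The function name and the simple lowercase, whitespace-based matching are assumptions made for this example and are not taken from this repository; real pipelines typically also apply stemming.

```python
def split_present_absent(text, keyphrases):
    """Partition keyphrases by whether they occur verbatim in the text.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # Lowercase and split on whitespace (assumption: no stemming is applied).
    tokens = text.lower().split()
    present, absent = [], []
    for phrase in keyphrases:
        words = phrase.lower().split()
        n = len(words)
        # A phrase is "present" if its words appear as a contiguous token span.
        found = n > 0 and any(
            tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
        )
        (present if found else absent).append(phrase)
    return present, absent


text = "we study data augmentation for keyphrase generation in low-resource settings"
present, absent = split_present_absent(
    text, ["data augmentation", "keyphrase generation", "text summarization"]
)
print(present)
print(absent)
```

Matching on token spans rather than raw substrings avoids counting a phrase as present when it only appears inside a longer word.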

Environment

Please refer to environment.yml.

Preprocessing

Please refer to the preprocess/ folder. Note that we create the different versions of the original dataset a priori.

Train & Test

# Train
python train.py

# Train on limited data
python train.py --limit=100

# Load Checkpoint
python train.py --checkpoint=True

# Train for multiple runs after the initial run(s)
python train.py --times=3 --initial_time=1

# Test (assuming that saved weights are present)
python train.py --test=True
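
The commands above pass booleans as explicit strings (for example --test=True). The repository's own argument handling is the authoritative reference; the following is only a minimal sketch of how such flags could be parsed with argparse. The str2bool helper, the default values, and the comments describing each flag are assumptions for illustration. An explicit converter is used because argparse's type=bool would treat the string "False" as true.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to bool (hypothetical helper)."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # All defaults below are assumptions, not the repository's actual values.
    parser = argparse.ArgumentParser(description="Sketch of the train.py flags")
    parser.add_argument("--limit", type=int, default=-1)           # cap on training data
    parser.add_argument("--checkpoint", type=str2bool, default=False)  # resume from saved state
    parser.add_argument("--times", type=int, default=1)            # number of runs
    parser.add_argument("--initial_time", type=int, default=0)     # index of the first run
    parser.add_argument("--test", type=str2bool, default=False)    # evaluate saved weights
    return parser


args = build_parser().parse_args(["--times=3", "--initial_time=1"])
print(args.times, args.initial_time, args.test)
```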

Citation

Please consider citing our paper if you find this work useful:

@inproceedings{garg-etal-2023-data,
    title = "Data Augmentation for Low-Resource Keyphrase Generation",
    author = "Garg, Krishna  and
      Ray Chowdhury, Jishnu  and
      Caragea, Cornelia",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.534",
    doi = "10.18653/v1/2023.findings-acl.534",
    pages = "8442--8455",
    abstract = "Keyphrase generation is the task of summarizing the contents of any given article into a few salient phrases (or keyphrases). Existing works for the task mostly rely on large-scale annotated datasets, which are not easy to acquire. Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations. In this paper, we present data augmentation strategies specifically to address keyphrase generation in purely resource-constrained domains. We design techniques that use the full text of the articles to improve both present and absent keyphrase generation. We test our approach comprehensively on three datasets and show that the data augmentation strategies consistently improve the state-of-the-art performance. We release our source code at \url{https://github.com/kgarg8/kpgen-lowres-data-aug}.",
}

Questions

Please contact [email protected] for any questions related to this work.
