PyTAIL - Interactive and Incremental Learning of NLP Models with Human in the Loop for Online Data

socialmediaie/pytail

PyTAIL is the successor to the Java library SAIL (Sentiment Analysis and Incremental Learning).

NOTE: This code is a work in progress. For the version used in the paper, see the code in the SocialMediaIE repo, as described in the Legacy Code section below.

Abstract

Online data streams make training machine learning models hard because of distribution shift and new patterns emerging over time. For natural language processing (NLP) tasks that utilize a collection of features based on lexicons and rules, it is important to adapt these features to the changing data. To address this challenge, we introduce PyTAIL, a Python library that supports a human-in-the-loop approach to actively training NLP models. PyTAIL enhances generic active learning, which only suggests new instances to label, by also suggesting new features, such as rules and lexicons, to label. Furthermore, PyTAIL is flexible enough for users to accept, reject, or update rules and lexicons as the model is being trained. We simulate the performance of PyTAIL on existing social media benchmark datasets for text classification and compare various active learning strategies on these benchmarks. The model closes the gap with as little as 10% of the training data. Finally, we highlight the importance of tracking evaluation metrics on the remaining data (which is not yet merged with active learning) alongside the test dataset. Doing so shows how effectively the model annotates the remaining dataset, which is especially useful for batch processing of large unlabeled corpora.
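The two ideas in the abstract, suggesting instances to label and tracking a metric on the data that has not yet been labeled, can be illustrated with a small sketch. This is not PyTAIL's API: the function names (`entropy`, `select_most_uncertain`, `accuracy`), the entropy-based selection strategy, and the toy tweet ids, probabilities, and labels are all assumptions made up for illustration. In a simulation on a benchmark dataset the gold labels are known, which is what makes scoring the remaining data possible.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Return the ids of the k unlabeled instances with the highest entropy."""
    ranked = sorted(pool_probs, key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]


def accuracy(predictions, gold, ids):
    """Fraction of the given ids whose prediction matches the gold label."""
    ids = list(ids)
    if not ids:
        return None
    return sum(predictions[i] == gold[i] for i in ids) / len(ids)


# Hypothetical model output over an unlabeled pool: id -> [P(pos), P(neg)].
pool_probs = {
    "t1": [0.50, 0.50],
    "t2": [0.95, 0.05],
    "t3": [0.60, 0.40],
    "t4": [0.10, 0.90],
}
# Hypothetical predicted labels and simulated gold labels for the same pool.
predictions = {"t1": "pos", "t2": "pos", "t3": "pos", "t4": "neg"}
gold = {"t1": "neg", "t2": "pos", "t3": "neg", "t4": "neg"}

# Instances the annotator would be asked to label in this round.
to_label = select_most_uncertain(pool_probs, k=2)

# Everything else stays in the pool; score the model on it as well as on a test set.
remaining = [i for i in pool_probs if i not in to_label]
remaining_accuracy = accuracy(predictions, gold, remaining)

print(to_label, remaining, remaining_accuracy)
```

In this toy pool the two least confident predictions are the ones sent for labeling, and the model is already correct on the two instances left behind, which is the situation the remaining-data metric is meant to reveal.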

Legacy Code

The code used for experiments in the paper can be found as part of the SocialMediaIE tool at: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/active_learning_experiment.py
