Part-of-speech tagging (POS tagging) is the task of labeling each word in a text with its part of speech. A part of speech is a category of words with similar grammatical properties. Common English parts of speech include noun, verb, adjective, adverb, pronoun, preposition, and conjunction.
Example:
Vinken | , | 61 | years | old |
---|---|---|---|---|
NNP | , | CD | NNS | JJ |
A standard dataset for POS tagging is the Wall Street Journal (WSJ) portion of the Penn Treebank, which uses 45 different POS tags. Sections 0-18 are used for training, sections 19-21 for development, and sections 22-24 for testing. Models are evaluated by token-level accuracy.
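Accuracy for POS tagging is usually computed per token: the number of tokens whose predicted tag matches the gold tag, divided by the total number of tokens. The following is a minimal sketch of that computation, not the scoring script of any particular paper. The function name `token_accuracy` and the predicted tags are illustrative assumptions; the gold tags are taken from the example sentence above.

```python
from typing import Sequence


def token_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    `gold` and `pred` each hold one tag sequence per sentence.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence needs one predicted tag per gold tag")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)

    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Gold tags for "Vinken , 61 years old" from the example above.
gold = [["NNP", ",", "CD", "NNS", "JJ"]]
# Hypothetical model output that mistags "old" as NN.
pred = [["NNP", ",", "CD", "NNS", "NN"]]

accuracy = token_accuracy(gold, pred)
print(f"{accuracy:.2%}")  # 4 of 5 tokens correct -> 80.00%
```

Counting over all tokens in the test set (rather than averaging per-sentence scores) means long sentences weigh more than short ones, which is the usual convention for this metric.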
The Ritter (2011) dataset has become the benchmark for part-of-speech tagging of social media text. It comprises some 50K tokens of English social media sampled in late 2011 and is tagged with an extended version of the PTB tagset.
Model | Accuracy | Paper | Source |
---|---|---|---|
ACE + fine-tune (Wang et al., 2020) | 93.4 | Automated Concatenation of Embeddings for Structured Prediction | Official |
PretRand (Meftah et al., 2019) | 91.46 | Joint Learning of Pre-Trained and Random Units for Domain Adaptation in Part-of-Speech Tagging | |
FastText + CNN + CRF | 90.53 | Twitter word embeddings (Godin et al., 2019, Chapter 3) | |
CMU | 90.0 ± 0.5 | Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters | |
GATE | 88.69 | Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data | |
Universal Dependencies (UD) is a framework for cross-linguistic grammatical annotation; it contains more than 100 treebanks in over 60 languages. Models are typically evaluated by their average test accuracy across 21 high-resource languages (♦ marks models evaluated on 17 languages).
Model | Avg accuracy | Paper / Source |
---|---|---|
XLM-R + SUB^2 data augmentation (Shi et al., 2021) | 97.7 | Substructure Substitution: Structured Data Augmentation for NLP / code |
XLM-R (Shi et al., 2021) | 97.7 | Substructure Substitution: Structured Data Augmentation for NLP / code |
Multilingual BERT and BPEmb (Heinzerling and Strube, 2019) | 96.77 | Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation |
Adversarial Bi-LSTM (Yasunaga et al., 2018) | 96.65 | Robust Multilingual Part-of-Speech Tagging via Adversarial Training |
MultiBPEmb (Heinzerling and Strube, 2019) | 96.62 | Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation |
Bi-LSTM (Plank et al., 2016) | 96.40 | Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss |
Joint Bi-LSTM (Nguyen et al., 2017)♦ | 95.55 | A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing |
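The average accuracy reported in the table above can be read as an unweighted mean of per-language test accuracies. The sketch below assumes that interpretation; individual papers may aggregate differently. The function name `average_accuracy`, the language keys, and the scores are all made up for illustration and are not results from any listed model.

```python
from statistics import mean
from typing import Mapping


def average_accuracy(per_language: Mapping[str, float]) -> float:
    """Return the unweighted mean of per-language accuracies.

    Every language counts equally, regardless of its test set size.
    """
    if not per_language:
        raise ValueError("need at least one language score")
    return mean(list(per_language.values()))


# Hypothetical per-language test accuracies (in percent), not real results.
scores = {"lang_a": 95.0, "lang_b": 93.0, "lang_c": 94.0}

print(average_accuracy(scores))  # (95.0 + 93.0 + 94.0) / 3 = 94.0
```

Because the mean is unweighted, a score averaged over 17 languages is not directly comparable to one averaged over 21, which is why the ♦ rows are flagged.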