This repository contains links to models for sentiment analysis of Russian texts. The models were trained as part of the articles "Evaluation of Pre-Trained Transformers for Sentiment Analysis of Texts in Russian" and "Deep Transfer Learning Baselines for Sentiment Analysis in Russian".
Model | Score | Rank | SentiRuEval-2016 TC micro F1 | SentiRuEval-2016 TC macro F1 | SentiRuEval-2016 TC F1 | SentiRuEval-2016 Banks micro F1 | SentiRuEval-2016 Banks macro F1 | SentiRuEval-2016 Banks F1 | RuSentiment weighted F1 | RuSentiment F1 | KRND F1 | LINIS Crowd F1 | RuTweetCorp F1 | RuReviews F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SOTA | n/s | n/s | 76.71 | 66.40 | 70.68 | 67.51 | 69.53 | 74.06 | 78.50 | n/s | 73.63 | 60.51 | 83.68 | 77.44 |
XLM-RoBERTa-Large | 76.37 | 1 | 82.26 | 76.36 | 79.42 | 76.35 | 76.08 | 80.89 | 78.31 | 75.27 | 75.17 | 60.03 | 88.91 | 78.81 |
SBERT-Large | 75.43 | 2 | 78.40 | 71.36 | 75.14 | 72.39 | 71.87 | 77.72 | 78.58 | 75.85 | 74.20 | 60.64 | 88.66 | 77.41 |
MBARTRuSumGazeta | 74.70 | 3 | 76.06 | 68.95 | 73.04 | 72.34 | 71.93 | 77.83 | 76.71 | 73.56 | 74.18 | 60.54 | 87.22 | 77.51 |
Conversational RuBERT | 74.44 | 4 | 76.69 | 69.09 | 73.11 | 69.44 | 68.68 | 75.56 | 77.31 | 74.40 | 73.10 | 59.95 | 87.86 | 77.78 |
LaBSE | 74.11 | 5 | 77.00 | 69.19 | 73.55 | 70.34 | 69.83 | 76.38 | 74.94 | 70.84 | 73.20 | 59.52 | 87.89 | 78.47 |
XLM-RoBERTa-Base | 73.60 | 6 | 76.35 | 69.37 | 73.42 | 68.45 | 67.45 | 74.05 | 74.26 | 70.44 | 71.40 | 60.19 | 87.90 | 78.28 |
RuBERT | 73.45 | 7 | 74.03 | 66.14 | 70.75 | 66.46 | 66.40 | 73.37 | 75.49 | 71.86 | 72.15 | 60.55 | 86.99 | 77.41 |
MBART-50-Large-Many-to-Many | 73.15 | 8 | 75.38 | 67.81 | 72.26 | 67.13 | 66.97 | 73.85 | 74.78 | 70.98 | 71.98 | 59.20 | 87.05 | 77.24 |
SlavicBERT | 71.96 | 9 | 71.45 | 63.03 | 68.44 | 64.32 | 63.99 | 71.31 | 72.13 | 67.57 | 72.54 | 58.70 | 86.43 | 77.16 |
EnRuDR-BERT | 71.51 | 10 | 72.56 | 64.74 | 69.07 | 61.44 | 60.21 | 68.34 | 74.19 | 69.94 | 69.33 | 56.55 | 87.12 | 77.95 |
RuDR-BERT | 71.14 | 11 | 72.79 | 64.23 | 68.36 | 61.86 | 60.92 | 68.48 | 74.65 | 70.63 | 68.74 | 54.45 | 87.04 | 77.91 |
MBART-50-Large | 69.46 | 12 | 70.91 | 62.67 | 67.24 | 61.12 | 60.25 | 68.41 | 72.88 | 68.63 | 70.52 | 46.39 | 86.48 | 77.52 |
This repository contains the fine-tuned Multilingual Bidirectional Encoder Representations from Transformers (M-BERT), RuBERT, and two versions of the Multilingual Universal Sentence Encoder (M-USE) for sentiment classification in Russian, as referenced in "Deep Transfer Learning Baselines for Sentiment Analysis in Russian".
Dataset | Measure | Current SOTA | M-BERT | RuBERT | M-USE-CNN | M-USE-Trans |
---|---|---|---|---|---|---|
SentiRuEval-2016 TC | F1 | 68.42 | 66.29 | 70.68 | 63.64 | 68.27 |
SentiRuEval-2016 TC | macro F1PN | 66.07 | 61.78 | 66.40 | 58.97 | 62.77 |
SentiRuEval-2016 TC | micro F1PN | 74.11 | 72.45 | 76.71 | 71.31 | 75.00 |
SentiRuEval-2016 Banks | F1 | 74.06 | 65.31 | 72.83 | 66.71 | 72.40 |
SentiRuEval-2016 Banks | macro F1PN | 69.53 | 58.00 | 65.89 | 58.73 | 65.04 |
SentiRuEval-2016 Banks | micro F1PN | 71.76 | 60.52 | 68.43 | 62.41 | 68.21 |
SentiRuEval-2016 TC | F1 | 68.54 | 60.47 | 64.39 | 60.57 | 64.28 |
SentiRuEval-2016 TC | macro F1PN | 63.47 | 53.16 | 57.76 | 52.37 | 57.60 |
SentiRuEval-2016 TC | micro F1PN | 67.51 | 57.03 | 61.38 | 57.76 | 61.18 |
SentiRuEval-2016 Banks | F1 | 79.51 | 67.65 | 70.58 | 66.32 | 69.62 |
SentiRuEval-2016 Banks | macro F1PN | 67.44 | 56.97 | 60.95 | 54.74 | 59.12 |
SentiRuEval-2016 Banks | micro F1PN | 70.09 | 59.32 | 63.33 | 57.61 | 62.17 |
RuSentiment | F1 | n/s | 71.37 | 72.03 | 66.27 | 68.60 |
RuSentiment | weighted F1 | 78.50 | 75.13 | 75.71 | 71.05 | 73.42 |
Kaggle Russian News Dataset | F1 | 70.00 | 71.36 | 73.63 | 71.27 | 72.66 |
LINIS Crowd | F1 | 37.29 | 42.73 | 60.51 | 56.34 | 56.95 |
RuTweetCorp (binary) | F1 | 75.95 | 83.04 | 83.69 | 81.34 | 83.17 |
RuTweetCorp (trinary) | F1 | 78.1 | 80.10 | 80.79 | 78.39 | 79.69 |
RuReviews | F1 | 75.45 | 77.31 | 77.44 | 76.63 | 76.94 |
The SOTA approaches for RuReviews, RuSentiment, Kaggle Russian News Dataset, and RuTweetCorp were described in (Smetanin and Komarov, 2019), (Baymurzina et al., 2019), (Shalkarbayuli et al., 2018), and (Rubtsova, 2018), respectively. The SOTA approach for LINIS Crowd was implemented based on (Koltsova et al., 2016).
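The results tables report several F1 variants (macro F1PN, micro F1PN, weighted F1). The sketch below shows one way these measures could be computed from gold and predicted labels. It assumes that "PN" means the score is restricted to the positive and negative classes, following the SentiRuEval convention. The function names and label strings (`"pos"`, `"neg"`, `"neu"`) are hypothetical and are not taken from the authors' evaluation code.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class receives a false positive
            fn[gold] += 1  # gold class receives a false negative
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over the selected labels."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    return sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)


def micro_f1(y_true, y_pred, labels):
    """F1 computed from counts pooled over the selected labels."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    return f1_from_counts(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's gold support."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    support = Counter(y_true)
    total = len(y_true)
    return sum(n * f1_from_counts(tp[c], fp[c], fn[c]) for c, n in support.items()) / total


# Toy example with hypothetical labels (assumption: PN = positive and negative only).
y_true = ["pos", "pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "pos", "neg", "neg", "neu", "neu"]
PN = ["pos", "neg"]

print("macro F1PN:", macro_f1(y_true, y_pred, PN))
print("micro F1PN:", micro_f1(y_true, y_pred, PN))
print("weighted F1:", weighted_f1(y_true, y_pred))
```

Restricting `labels` to the positive and negative classes still penalises neutral confusions, because a positive example predicted as neutral counts as a false negative for the positive class.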
Although Russian is one of the most common languages on the World Wide Web, it is generally not as well-resourced as English, especially in the field of sentiment analysis. Even though many studies address sentiment classification, only a few of them make their datasets publicly available to the research community.
Dataset | Classes | Average length | Max length | Train Samples | Test Samples | Overall Samples | Download Link |
---|---|---|---|---|---|---|---|
SentiRuEval-2016 (Loukachevitch and Rubtsova, 2016) | 3 | 87.0928 | 172 | 18,035 | 5,560 | 23,595 | Project page |
SentiRuEval-2015 Subtask (Loukachevitch et al., 2015) | 3 | 81.4986 | 172 | 8,580 | 7,738 | 16,318 | Project page |
RuTweetCorp (Rubtsova, 2013) | 3 | 89.1725 | 189 | n/a | n/a | 334,836 | Project page |
LINIS Crowd (Koltsova et al., 2016) | 5 | n/a | n/a | n/a | n/a | n/a | Project page |
RuSentiment (Rogers et al., 2018) | 5 | 82.0279 | 800 | 28,218 | 2,967 | 31,185 | Project page |
Kaggle Russian News Dataset | 3 | 3911.8501 | 381,498 | n/a | n/a | 8,263 | Kaggle page |
RuReviews (Smetanin and Komarov, 2019) | 3 | 130.0693 | 1,007 | n/a | n/a | 90,000 | GitHub page |
To download fine-tuned models for Russian, please follow the link https://yadi.sk/d/Xp5vLG_5xCQL-Q.
@article{Smetanin2020Deep,
title = {Deep transfer learning baselines for sentiment analysis in Russian},
author = {Sergey Smetanin and Mikhail Komarov},
journal = {Information Processing \& Management},
volume = {58},
number = {3},
pages = {102484},
year = {2021},
issn = {0306-4573},
doi = {10.1016/j.ipm.2020.102484},
url = {https://www.sciencedirect.com/science/article/pii/S0306457320309730}
}
See LICENSE.