Sentiment Analysis in Russian

This repository links to models for sentiment analysis of Russian-language texts. The models were trained for two articles: Evaluation of Pre-Trained Transformers for Sentiment Analysis of Texts in Russian and Deep Transfer Learning Baselines for Sentiment Analysis in Russian.

Evaluation of Pre-Trained Transformers for Sentiment Analysis of Texts in Russian

Model	Score	Rank	SentiRuEval-2016 TC micro F₁	SentiRuEval-2016 TC macro F₁	SentiRuEval-2016 TC F₁	SentiRuEval-2016 Banks micro F₁	SentiRuEval-2016 Banks macro F₁	SentiRuEval-2016 Banks F₁	RuSentiment weighted F₁	RuSentiment F₁	KRND F₁	LINIS Crowd F₁	RuTweetCorp F₁	RuReviews F₁
SOTA	n/s		76.71	66.40	70.68	67.51	69.53	74.06	78.50	n/s	73.63	60.51	83.68	77.44
XLM-RoBERTa-Large	76.37	1	82.26	76.36	79.42	76.35	76.08	80.89	78.31	75.27	75.17	60.03	88.91	78.81
SBERT-Large	75.43	2	78.40	71.36	75.14	72.39	71.87	77.72	78.58	75.85	74.20	60.64	88.66	77.41
MBARTRuSumGazeta	74.70	3	76.06	68.95	73.04	72.34	71.93	77.83	76.71	73.56	74.18	60.54	87.22	77.51
Conversational RuBERT	74.44	4	76.69	69.09	73.11	69.44	68.68	75.56	77.31	74.40	73.10	59.95	87.86	77.78
LaBSE	74.11	5	77.00	69.19	73.55	70.34	69.83	76.38	74.94	70.84	73.20	59.52	87.89	78.47
XLM-RoBERTa-Base	73.60	6	76.35	69.37	73.42	68.45	67.45	74.05	74.26	70.44	71.40	60.19	87.90	78.28
RuBERT	73.45	7	74.03	66.14	70.75	66.46	66.40	73.37	75.49	71.86	72.15	60.55	86.99	77.41
MBART-50-Large-Many-to-Many	73.15	8	75.38	67.81	72.26	67.13	66.97	73.85	74.78	70.98	71.98	59.20	87.05	77.24
SlavicBERT	71.96	9	71.45	63.03	68.44	64.32	63.99	71.31	72.13	67.57	72.54	58.70	86.43	77.16
EnRuDR-BERT	71.51	10	72.56	64.74	69.07	61.44	60.21	68.34	74.19	69.94	69.33	56.55	87.12	77.95
RuDR-BERT	71.14	11	72.79	64.23	68.36	61.86	60.92	68.48	74.65	70.63	68.74	54.45	87.04	77.91
MBART-50-Large	69.46	12	70.91	62.67	67.24	61.12	60.25	68.41	72.88	68.63	70.52	46.39	86.48	77.52
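
One way to read the table against the SOTA row is to compute per-column differences. The sketch below copies the SOTA and XLM-RoBERTa-Large rows from the table above; the column keys and the `delta_vs_sota` helper are illustrative names and are not part of this repository.

```python
# Illustrative shorthand for the twelve metric columns of the table above.
COLUMNS = [
    "TC micro F1", "TC macro F1", "TC F1",
    "Banks micro F1", "Banks macro F1", "Banks F1",
    "RuSentiment weighted F1", "RuSentiment F1",
    "KRND F1", "LINIS Crowd F1", "RuTweetCorp F1", "RuReviews F1",
]

# Values copied from the table; None stands for "n/s".
SOTA = [76.71, 66.40, 70.68, 67.51, 69.53, 74.06,
        78.50, None, 73.63, 60.51, 83.68, 77.44]
XLM_R_LARGE = [82.26, 76.36, 79.42, 76.35, 76.08, 80.89,
               78.31, 75.27, 75.17, 60.03, 88.91, 78.81]


def delta_vs_sota(model_scores, sota_scores):
    """Return {column: model - sota}, skipping columns without a SOTA value."""
    return {
        col: round(m - s, 2)
        for col, m, s in zip(COLUMNS, model_scores, sota_scores)
        if s is not None
    }


deltas = delta_vs_sota(XLM_R_LARGE, SOTA)
# Columns where the model does not reach the reported SOTA.
below_sota = [col for col, d in deltas.items() if d < 0]

for col, d in deltas.items():
    print(f"{col}: {d:+.2f}")
print("Below SOTA:", below_sota)
```

The same helper can be applied to any other row of the table by copying its twelve values in column order.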

Deep Transfer Learning Baselines for Sentiment Analysis in Russian

This repository contains Multilingual Bidirectional Encoder Representations from Transformers (M-BERT), RuBERT, and two versions of Multilingual Universal Sentence Encoder (M-USE), all fine-tuned for sentiment classification in Russian, as referenced in Deep Transfer Learning Baselines for Sentiment Analysis in Russian.

Dataset	Measure	Current SOTA	M-BERT	RuBERT	M-USE-CNN	M-USE-Trans
SentiRuEval-2016 TC	F₁	68.42	66.29	70.68	63.64	68.27
	macro F₁^PN	66.07	61.78	66.40	58.97	62.77
	micro F₁^PN	74.11	72.45	76.71	71.31	75.00
SentiRuEval-2016 Banks	F₁	74.06	65.31	72.83	66.71	72.40
	macro F₁^PN	69.53	58.00	65.89	58.73	65.04
	micro F₁^PN	71.76	60.52	68.43	62.41	68.21
SentiRuEval-2016 TC	F₁	68.54	60.47	64.39	60.57	64.28
	macro F₁^PN	63.47	53.16	57.76	52.37	57.60
	micro F₁^PN	67.51	57.03	61.38	57.76	61.18
SentiRuEval-2016 Banks	F₁	79.51	67.65	70.58	66.32	69.62
	macro F₁^PN	67.44	56.97	60.95	54.74	59.12
	micro F₁^PN	70.09	59.32	63.33	57.61	62.17
RuSentiment	F₁	n/s	71.37	72.03	66.27	68.60
	weighted F₁	78.50	75.13	75.71	71.05	73.42
Kaggle Russian News Dataset	F₁	70.00	71.36	73.63	71.27	72.66
LINIS Crowd	F₁	37.29	42.73	60.51	56.34	56.95
RuTweetCorp (binary)	F₁	75.95	83.04	83.69	81.34	83.17
RuTweetCorp (trinary)	F₁	78.1	80.10	80.79	78.39	79.69
RuReviews	F₁	75.45	77.31	77.44	76.63	76.94

The SOTA approaches for RuReviews, RuSentiment, Kaggle Russian News Dataset, and RuTweetCorp are described in (Smetanin and Komarov, 2019), (Baymurzina et al., 2019), (Shalkarbayuli et al., 2018), and (Rubtsova, 2018), respectively. The SOTA approach for LINIS Crowd was implemented based on (Koltsova et al., 2016).
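
The tables above report several F₁ variants (macro, micro, weighted). The following is a minimal sketch of how these averages differ, written in plain Python. It assumes that F₁^PN means averaging over the positive and negative classes only, which is how the notation is read here; the label names and the toy data are hypothetical, and this is not the evaluation code used in the articles.

```python
def class_counts(y_true, y_pred, label):
    """Return (TP, FP, FN) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); taken as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over the given labels."""
    scores = [f1_from_counts(*class_counts(y_true, y_pred, l)) for l in labels]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred, labels):
    """F1 computed from TP/FP/FN pooled over the given labels."""
    counts = [class_counts(y_true, y_pred, l) for l in labels]
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1_from_counts(tp, fp, fn)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    support = {l: sum(t == l for t in y_true) for l in labels}
    total = sum(support.values())
    return sum(
        support[l] * f1_from_counts(*class_counts(y_true, y_pred, l))
        for l in labels
    ) / total


# Hypothetical toy predictions for a three-class task.
y_true = ["pos", "pos", "neg", "neu", "neu", "neg"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "neg"]

print("macro F1 (all classes):", macro_f1(y_true, y_pred, ["pos", "neg", "neu"]))
print("macro F1 (pos/neg only):", macro_f1(y_true, y_pred, ["pos", "neg"]))
print("micro F1 (pos/neg only):", micro_f1(y_true, y_pred, ["pos", "neg"]))
print("weighted F1:", weighted_f1(y_true, y_pred, ["pos", "neg", "neu"]))
```

Passing a restricted label list is the only difference between the all-class and the positive/negative-only variants in this sketch; the neutral class still affects the result through the false positives and false negatives it causes for the other two classes.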

Sentiment Datasets in Russian

Although Russian is one of the most common languages on the World Wide Web, it is generally not as well-resourced as English, especially in the field of sentiment analysis. Even though many studies address sentiment classification, only a few of them make their datasets publicly available to the research community.

Dataset	Classes	Average length	Max length	Train Samples	Test Samples	Overall Samples	Download Link
SentiRuEval-2016 (Loukachevitch and Rubtsova, 2016)	3	87.0928	172	18,035	5,560	23,595	Project page
SentiRuEval-2015 Subtask (Loukachevitch et al., 2015)	3	81.4986	172	8,580	7,738	16,318	Project page
RuTweetCorp (Rubtsova, 2013)	3	89.1725	189	n/a	n/a	334,836	Project page
LINIS Crowd (Koltsova et al., 2016)	5	n/a	n/a	n/a	n/a	n/a	Project page
RuSentiment (Rogers et al., 2018)	5	82.0279	800	28,218	2,967	31,185	Project page
Kaggle Russian News Dataset	3	3,911.8501	381,498	n/a	n/a	8,263	Kaggle page
RuReviews (Smetanin and Komarov, 2019)	3	130.0693	1,007	n/a	n/a	90,000	GitHub page
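
The class count and length columns of the table can be recomputed for any labelled corpus. The sketch below assumes that lengths are counted in characters, which the table does not state, and it uses a hypothetical in-memory list of `(text, label)` pairs rather than any of the datasets listed above.

```python
from statistics import mean


def corpus_stats(samples):
    """Summarise a list of (text, label) pairs.

    Lengths are measured in characters; this unit is an assumption.
    """
    lengths = [len(text) for text, _ in samples]
    labels = {label for _, label in samples}
    return {
        "classes": len(labels),
        "average_length": round(mean(lengths), 4),
        "max_length": max(lengths),
        "samples": len(samples),
    }


# Hypothetical toy corpus, for illustration only.
toy_samples = [
    ("Отличная работа", "positive"),
    ("Ужасно", "negative"),
    ("Обычная погода", "neutral"),
]

stats = corpus_stats(toy_samples)
print(stats)
```

Counting tokens instead of characters would only require replacing `len(text)` with the length of a tokenised text.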

Fine-Tuned Models

To download fine-tuned models for Russian, please follow the link https://yadi.sk/d/Xp5vLG_5xCQL-Q.

Citation

@article{Smetanin2020Deep,
  title = {Deep transfer learning baselines for sentiment analysis in Russian},
  author = {Sergey Smetanin and Mikhail Komarov},
  journal = {Information Processing \& Management},
  volume = {58},
  number = {3},
  pages = {102484},
  year = {2021},
  issn = {0306-4573},
  doi = {10.1016/j.ipm.2020.102484},
  url = {https://www.sciencedirect.com/science/article/pii/S0306457320309730}
}

License

See LICENSE.
