Speed Comparison
Koichi Akabe edited this page Jun 9, 2022 · 5 revisions
This page compares the analysis speed of Vaporetto with that of other tokenizers and morphological analyzers.
We compared the following software:
- KyTea (2020-04-03)
- Vaporetto (v0.4.0)
- MeCab (2020-09-14)
- Lindera (v0.13.2)
- sudachi.rs (v0.6.4-a1)
- rust-tinysegmenter (v0.1.1)
For Vaporetto and KyTea, we used the compact SVM model based on BCCWJ and UniDic, downloaded from the KyTea Models page. For MeCab, we used IPADic and UniDic. For Lindera, we used UniDic. For sudachi.rs, we used sudachi-dictionary-20210802-core, which is based on UniDic.
We tokenized I Am a Cat (by Soseki Natsume), which is available at Aozora Bunko, and measured the elapsed time 100 times for each tool.
The benchmark machine had the following specification:
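The repeated measurement described above can be sketched roughly as follows. This is a minimal illustration, not the harness actually used for this page: `measure_ms` and `mean_and_std` are hypothetical helper names, the workload is a stand-in whitespace split rather than any of the benchmarked tools, and the use of the population standard deviation is an assumption, since the page does not say which estimator was used.

```rust
use std::time::Instant;

/// Runs `f` `runs` times and returns each elapsed time in milliseconds.
/// (Hypothetical helper, not part of any benchmarked tool.)
fn measure_ms<F: FnMut()>(mut f: F, runs: usize) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_secs_f64() * 1000.0
        })
        .collect()
}

/// Returns (mean, population standard deviation) of the samples.
/// Assumption: population STD; the original measurement may differ.
fn mean_and_std(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

fn main() {
    // Stand-in input and workload: a naive whitespace split,
    // NOT one of the tokenizers compared on this page.
    let text = "吾輩 は 猫 で ある 。 ".repeat(10_000);
    let samples = measure_ms(
        || {
            let n = text.split_whitespace().count();
            // Keep the compiler from optimizing the work away.
            std::hint::black_box(n);
        },
        100,
    );
    let (mean, sd) = mean_and_std(&samples);
    println!("mean = {:.3} ms, std = {:.3} ms", mean, sd);
}
```

To time a real tokenizer, the closure body would be replaced with a call into that tool, with model loading kept outside the timed region if only analysis speed is of interest.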
- CPU: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
- Memory: 64GiB
- OS: CentOS Linux release 7.5.1804 (Core)
- Compilers:
  - Rust: 1.60.0
  - GCC: 11.2.0
Tool Name | Elapsed Time [ms] | STD [ms] |
---|---|---|
KyTea | 219.6 | 2.9 |
Vaporetto | 29.0 | 0.6 |
Vaporetto (charwise) | 25.3 | 0.4 |
rust-tinysegmenter | 272.7 | 6.1 |
MeCab (IPADic) | 102.9 | 1.8 |
MeCab (UniDic) | 255.1 | 3.1 |
Lindera | 397.1 | 7.1 |
sudachi.rs | 286.2 | 4.7 |
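
To make the table easier to read as relative speed, the elapsed times can be divided by a baseline. The sketch below uses the figures from the table with Vaporetto (29.0 ms) as the baseline; `speedup` is a hypothetical helper introduced only for this illustration.

```rust
/// Returns how many times longer `slow_ms` is than `fast_ms`.
/// (Hypothetical helper for illustration.)
fn speedup(slow_ms: f64, fast_ms: f64) -> f64 {
    slow_ms / fast_ms
}

fn main() {
    // Elapsed times in milliseconds, copied from the table above.
    let results = [
        ("KyTea", 219.6),
        ("Vaporetto (charwise)", 25.3),
        ("rust-tinysegmenter", 272.7),
        ("MeCab (IPADic)", 102.9),
        ("MeCab (UniDic)", 255.1),
        ("Lindera", 397.1),
        ("sudachi.rs", 286.2),
    ];
    let baseline_ms = 29.0; // Vaporetto

    for (name, ms) in results.iter() {
        // A ratio above 1 means the tool took longer than Vaporetto.
        println!("{}: {:.2}x Vaporetto's time", name, speedup(*ms, baseline_ms));
    }
}
```

For example, KyTea's 219.6 ms is roughly 7.6 times Vaporetto's 29.0 ms with the same model. These ratios apply only to this text, these dictionaries, and this machine.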