Thai Natural Language Processing in Rust, with Python bindings.
- newmm dictionary-based word tokenization at very high speed
- supports custom dictionaries
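To give a feel for what dictionary-based tokenization means, here is a minimal pure-Python sketch of greedy longest matching. It is a simplification for illustration only, not the newmm algorithm this package implements in Rust; the function name `longest_match_segment` is hypothetical.

```python
def longest_match_segment(text, dictionary):
    """Greedy longest-match segmentation (a simplification, not newmm itself)."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


words = {"สวัสดี", "ครับ"}
print(longest_match_segment("สวัสดีครับ", words))
```

A greedy matcher like this can pick a poor split when words overlap, which is one reason real tokenizers use more careful matching strategies.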
Install:
pip install pythainlp-rust-modules
Use in Python:
from oxidized_thainlp import load_dict, segment
load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
That's it!
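The following sketch shows one way to prepare a small custom dictionary and pass it to the two functions above. It assumes the dictionary file is UTF-8 plain text with one word per line, which this README does not state; the word list and the name "custom" are made up for illustration. The import is guarded so the script still runs when the module is not installed.

```python
import os
import tempfile

# Hypothetical custom word list for illustration.
custom_words = ["สวัสดี", "ครับ"]

# Assumption: the dictionary file is UTF-8 plain text, one word per line.
dict_path = os.path.join(tempfile.mkdtemp(), "custom_dict.txt")
with open(dict_path, "w", encoding="utf-8") as f:
    f.write("\n".join(custom_words) + "\n")

try:
    from oxidized_thainlp import load_dict, segment
except ImportError:
    # The extension module is not installed; only the file gets written.
    load_dict = segment = None

if segment is not None:
    load_dict(dict_path, "custom")  # register the file under the name "custom"
    print(segment("สวัสดีครับ", "custom"))
```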
Build requirements:
- Rust 2018 Edition
- Python 3.6 or newer
- Python development headers
  - Ubuntu: sudo apt-get install python3-dev
  - macOS: no action needed
- PyO3 - already included in Cargo.toml
- Maturin
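Build:
maturin build --release -i python
On Linux, to turn off manylinux tagging:
maturin build --release -i python --manylinux off
On Windows, if maturin.exe is not on PATH:
path\to\maturin.exe build --release -i python
If the Python interpreter is named python3:
maturin build --release -i python3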
This should generate a wheel file in the target/wheels/
directory, which can be installed with pip.
Note: Omitting "-i python" will let Maturin build for all Python versions detected.
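As a convenience, the short sketch below looks for the most recently built wheel and prints the pip command to install it. The helper names `newest_wheel` and `pip_install_command` are hypothetical, and the script assumes it is run from the project root so that target/wheels/ resolves correctly.

```python
import glob
import os
import sys


def newest_wheel(wheel_dir="target/wheels"):
    """Return the most recently modified .whl file in wheel_dir, or None."""
    wheels = glob.glob(os.path.join(wheel_dir, "*.whl"))
    return max(wheels, key=os.path.getmtime) if wheels else None


def pip_install_command(wheel_path):
    """Build the pip command for the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", wheel_path]


wheel = newest_wheel()
if wheel is None:
    print("No wheel found in target/wheels/; run maturin build first.")
else:
    print(" ".join(pip_install_command(wheel)))
```

Using `sys.executable -m pip` rather than a bare `pip` installs the wheel into the same interpreter that runs the script.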
Please report issues at https://github.com/PyThaiNLP/oxidized-thainlp