Thai Natural Language Processing in Rust, with Python-binding.

thanathip-wisesight/oxidized-thainlp

 
 

oxidized-thainlp

Features

  • Fast dictionary-based word tokenization using the newmm algorithm
  • Support for custom dictionaries
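
To give a rough idea of what dictionary-based tokenization means, here is a minimal Python sketch of greedy longest matching against a word set. It is only an illustration: it is not the library's newmm implementation, and the function name `longest_match_tokenize` is hypothetical, not part of this package.

```python
from typing import Iterable, List


def longest_match_tokenize(text: str, words: Iterable[str]) -> List[str]:
    """Greedy longest-match segmentation against a word set.

    Illustrative only: characters not covered by any dictionary word
    are emitted one at a time.
    """
    vocab = set(words)
    max_len = max((len(w) for w in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(longest_match_tokenize("สวัสดีครับ", ["สวัสดี", "ครับ"]))
```

The quality of the result depends entirely on the word list, which is why being able to load a custom dictionary matters.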

Usage

Install:

pip install pythainlp-rust-modules

Use in Python:

from oxidized_thainlp import load_dict, segment

load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")

That's all!
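
If you want to try a custom dictionary, the sketch below writes a word list to a file and passes it to `load_dict`. It assumes the dictionary file is plain UTF-8 text with one word per line, which this README does not state. The helper names `write_word_list` and `segment_with_custom_dict` are hypothetical and not part of the package.

```python
from pathlib import Path
from typing import Iterable


def write_word_list(words: Iterable[str], path: str) -> Path:
    """Write one word per line, UTF-8 encoded (assumed dictionary format)."""
    target = Path(path)
    cleaned = [w.strip() for w in words if w.strip()]
    target.write_text("".join(w + "\n" for w in cleaned), encoding="utf-8")
    return target


def segment_with_custom_dict(text: str, words: Iterable[str],
                             dict_path: str, dict_name: str = "custom"):
    """Write the word list, load it under dict_name, and segment text."""
    # Imported lazily so write_word_list works without the package installed.
    from oxidized_thainlp import load_dict, segment

    write_word_list(words, dict_path)
    load_dict(dict_path, dict_name)
    return segment(text, dict_name)
```

For example, `segment_with_custom_dict("สวัสดีครับ", ["สวัสดี", "ครับ"], "my_dict.txt")` would load the two words under the name "custom" and segment the text with them, provided the file format assumption holds.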

Build It Yourself

Build requirements

  • Rust 2018 Edition
  • Python 3.6 or newer
  • Python Development Headers
    • Ubuntu: sudo apt-get install python3-dev
    • macOS: No action needed
  • PyO3 - already included in Cargo.toml
  • Maturin

Build steps

Linux

maturin build --release -i python --manylinux off

or

maturin build --release -i python

Windows (PowerShell)

path\to\maturin.exe build --release -i python

macOS

maturin build --release -i python3

This should generate a wheel file in the target/wheels/ directory, which can be installed with pip.

Note: Omitting "-i python" lets Maturin build for all detected Python versions.
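
Since a build may leave several wheels in target/wheels/ (for example, one per detected Python version), a small helper can pick the most recently built one to install. This is a minimal sketch; the function name `find_latest_wheel` is hypothetical, and it assumes it is run from the repository root.

```python
from pathlib import Path
from typing import Optional


def find_latest_wheel(wheel_dir: str = "target/wheels") -> Optional[Path]:
    """Return the most recently modified .whl file in wheel_dir, if any."""
    directory = Path(wheel_dir)
    if not directory.is_dir():
        return None
    wheels = sorted(directory.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


wheel = find_latest_wheel()
if wheel is None:
    print("No wheel found; run maturin build first.")
else:
    print(f"Install with: python -m pip install {wheel}")
```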

Support

Please report issues at https://github.com/PyThaiNLP/oxidized-thainlp
