Releases · roshan-research/hazm

16 Jan 16:49

sir-kokabi

v0.10.0

270e897

Hazm 0.10.0

Added SpacyPOSTagger class, which uses the Hazm transformer-based deep learning model for POS tagging. @MortezaMahdaviMortazavi
Added SpacyChunker class, which uses the Hazm transformer-based deep learning model for chunking. @MortezaMahdaviMortazavi
Added SpacyDependencyParser class, which uses the Hazm transformer-based deep learning model for dependency parsing. @MortezaMahdaviMortazavi
Added 160,000 new words to improve normalizer and lemmatizer. @sir-kokabi
Added FaSpellReader to read FAspell corpus. @sir-kokabi
Added ArmanReader to read ArmanPersoNERCorpus. @sir-kokabi
Added PnSummaryReader to read pn-summary corpus. @sir-kokabi
Removed unnecessary old Stanford dependencies. @sir-kokabi

Download pretrained-models

Full Changelog: v0.9.4...v0.10.0

Contributors

MortezaMahdaviMortazavi and sir-kokabi

01 Oct 20:12

sir-kokabi

v0.9.4

af46be6

Hazm 0.9.4

Added join_abbreviations to keep abbreviations from being split during tokenization, using ParsiNorm's abbreviation lists. #216 @optimopium @sir-kokabi.
Added MizanReader to read Mizan corpus. @sir-kokabi.
Added NaabReader to read Naab corpus. @sir-kokabi.
Added NerReader to read NER corpus. @sir-kokabi.
Improved Normalizer by adding support for normalizing words with the suffix 'هایی'. @sir-kokabi.
Fixed #298: Incompatibility issues with numpy. @mhdi707 @sir-kokabi
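
As an illustration of what the join_abbreviations entry above describes, here is a minimal standard-library sketch, not Hazm's implementation. The `tokenize` function, the placeholder scheme, and the three-entry `ABBREVIATIONS` list are hypothetical; Hazm uses its own lists derived from ParsiNorm.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = ["ه.ق", "ه.ش", "ق.م"]

def tokenize(text, join_abbreviations=False):
    """Split text into word and punctuation tokens.

    With join_abbreviations=True, each known abbreviation is swapped for a
    placeholder before splitting and restored afterwards, so the dots inside
    it do not become separate tokens.
    """
    placeholders = {}
    if join_abbreviations:
        # Longest first, so a longer abbreviation wins over a shorter one it contains.
        for i, abbr in enumerate(sorted(ABBREVIATIONS, key=len, reverse=True)):
            key = f"ABBR{i}X"
            if abbr in text:
                text = text.replace(abbr, f" {key} ")
                placeholders[key] = abbr
    # Words (letters, digits, underscore) or single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [placeholders.get(tok, tok) for tok in tokens]
```

Without the flag, an abbreviation such as ه.ش comes back as three tokens (letter, dot, letter); with the flag it stays a single token.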

Download pretrained-models

Full Changelog: v0.9.3...v0.9.4

Contributors

optimopium, sir-kokabi, and mhdi707

19 Jul 15:25

sir-kokabi

v0.9.3

216ea56

Hazm 0.9.3

Fixed

Fix critical bug in Lemmatizer that caused incorrect lemmatization of certain words. @sir-kokabi.
Fix a bug that caused WikipediaReader to stop working as before (#287). @sir-kokabi.
Fix missing imports for WikipediaReader and PersianPlainTextReader (#286). @sir-kokabi.
Fix some issues in the demo to make it compatible with the latest version of Hazm. @sir-kokabi.
Fix a few issues related to tests and mkdocs build. @sir-kokabi.
Improve documentation. @sir-kokabi.
Improve dependency tree visualization on the demo page. @sir-kokabi.

Download pretrained-models

Full Changelog: v0.9.2...v0.9.3

Contributors

sir-kokabi

08 Jul 12:48

sir-kokabi

v0.9.2

54f1437

Hazm 0.9.2

Added

Add pretrained DependencyParser models. @E-Ghafour.
Add UniversalDadeganReader class to process and read the Universal Persian Dependency Treebank corpus. @E-Ghafour, @imani.
Add 400+ new words to improve Normalizer, Lemmatizer and Tokenizer. @sir-kokabi.

Fixed

Fix DependencyParser issue #282. @E-Ghafour, @imani.
Fix some test issues. @E-Ghafour.

Download pretrained-models

Full Changelog: v0.9...v0.9.2

Contributors

imani, E-Ghafour, and sir-kokabi

20 May 15:48

sir-kokabi

v0.9

548c4b1

Hazm 0.9

Added

Windows compatibility by using Python-crfsuite instead of Wapiti. @E-Ghafour.
Pretrained Chunker and POSTagger models with Python-crfsuite. @E-Ghafour.
New parameters in Normalizer for better text processing. @sir-kokabi.
Three regex patterns in Normalizer to fix ZWNJs and spacing issues. @sir-kokabi.
400 non-standard Unicode characters to be replaced in Normalizer. @sir-kokabi.
40,000+ new words to improve Lemmatizer and Tokenizer. @sir-kokabi.
train function for Word2vec and Sent2vec modules in Embedding. @E-Ghafour.
Implement keywordExtraction with the embedRank approach as a sample of Hazm usage. @E-Ghafour.
Support Universal tags in POSTagger. @E-Ghafour.
Support universal POS mapper in PeykareReader & DadeganReader (#239). @phsfr.
PersianPlainTextReader to process raw text datasets (#120). @mhbashari.
Support EZ tag in PeykareReader. @E-Ghafour.
Slash and backslash (/ \) support in Tokenizer (#102). @elahimanesh.
Conjugation class to handle verb conjugation. @sir-kokabi.
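
The ZWNJ and spacing entry above can be pictured with a small sketch. The three patterns below are illustrative assumptions, not the patterns Hazm's Normalizer uses, and `fix_zwnj_and_spacing` is a hypothetical name.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

def fix_zwnj_and_spacing(text):
    """Apply three illustrative cleanup patterns; not Hazm's actual patterns."""
    # 1. Collapse a run of consecutive ZWNJs into a single one.
    text = re.sub(ZWNJ + "{2,}", ZWNJ, text)
    # 2. Remove a ZWNJ next to whitespace or at either end of the text,
    #    where it has nothing to join.
    text = re.sub(ZWNJ + r"(?=\s|$)|(?<=\s)" + ZWNJ + "|^" + ZWNJ, "", text)
    # 3. Replace the space after the verbal prefixes "می" / "نمی" with a ZWNJ.
    #    Assumption: the prefix stands alone as a word; a real normalizer
    #    needs more context to avoid false positives.
    text = re.sub(r"(?<!\S)(ن?می) (?=\S)", r"\1" + ZWNJ, text)
    return text
```

Under these assumptions, the third pattern turns the spaced form می روم into the ZWNJ-joined form می‌روم, while a word that merely ends in می (such as اسلامی) is left alone because the prefix must start a word.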

Fixed

Improve the accuracy of POSTagger and Chunker. @E-Ghafour.
Improve InformalNormalizer #219. @riasati.
Fix pep8 issues. (#135). @hadifar.
Fix some test issues. @sir-kokabi @E-Ghafour.
Fix Stemmer issues with multiple suffixes. @sir-kokabi.
Fix various reported issues.

Changed

Drop Python 2 support and migrate all code to Python 3. @sir-kokabi.
Use data_maker function instead of patterns in SequenceTagger. @E-Ghafour.
Refactor IOBTagger and POSTagger to be compatible with data_maker. @E-Ghafour.
Change می روم to می‌روم in example (#203). @SMSadegh19.
Overhaul the project structure and GitHub repo. @sir-kokabi.

Download Pretrained models

Full Changelog: v0.8.2...v0.9

Contributors

mhbashari, hadifar, and 6 other contributors

29 Nov 12:03

imani

v0.8.2

cd59c1e

Hazm 0.8

Release notes:

Add WordEmbedding (a pre-trained fastText model is available for download)
Add SentenceEmbedding (a pre-trained model is available for download)
Add documentation webpage
Improve normalizer, informal normalizer, and tokenizer
Add Degarbayan and MirasText corpus reader

What's Changed

fixed MAGHSURAH Y bug in normalizer by @mavahedinia in #116
change list to set in stopwords_list method to remove duplicate stop … by @Azdy-dev in #175
Add Degarbayan interface by @maanijou in #176
fix endless loop in python3 by @mohamad-qodosi in #186
Update README.md by @edalatfard in #187
Fix self.words in WordTokenizer by @SinRas in #190
Fix some tokenization issues by @behnam-sa in #199
Modifying the extra space and newline removal patterns by @asdoost in #200
improvement of InformalNormalizer by @riasati in #214
add some rules to InformalNormalizer by @riasati in #215
Embedding by @E-Ghafour in #229
Embedding by @imani in #230
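
The list-to-set change in stopwords_list (#175) relies on a general Python property: a set keeps one copy of each value. A minimal sketch of the idea follows, using a hypothetical `load_stopwords` helper and made-up input lines rather than Hazm's own code or data file.

```python
def load_stopwords(lines):
    """Return the distinct stop words from an iterable of text lines.

    Blank lines are skipped and surrounding whitespace is stripped.
    A set drops duplicates automatically, but it does not preserve order.
    """
    return {line.strip() for line in lines if line.strip()}


# Made-up sample with one duplicate entry and one blank line.
sample_lines = ["و\n", "در\n", "و\n", "\n"]
as_list = [line.strip() for line in sample_lines if line.strip()]
as_set = load_stopwords(sample_lines)
```

Here `as_list` still holds the duplicate entry, while `as_set` holds each word once. Membership checks such as `"در" in as_set` are also constant time on average, which helps when filtering long token streams.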