Releases: roshan-research/hazm
Hazm 0.10.0
- Added SpacyPOSTagger class for utilizing the hazm deep learning transformer-based model in POS tagging. @MortezaMahdaviMortazavi
- Added SpacyChunker class for leveraging the hazm deep learning transformer-based model in chunking. @MortezaMahdaviMortazavi
- Added SpacyDependencyParser class for employing the hazm deep learning transformer-based model in dependency parsing. @MortezaMahdaviMortazavi
- Added 160,000 new words to improve normalizer and lemmatizer. @sir-kokabi
- Added FaSpellReader to read FAspell corpus. @sir-kokabi
- Added ArmanReader to read ArmanPersoNERCorpus. @sir-kokabi
- Added PnSummaryReader to read pn-summary corpus. @sir-kokabi
- Removed unnecessary old Stanford dependencies. @sir-kokabi
Full Changelog: v0.9.4...v0.10.0
Hazm 0.9.4
- Added join_abbreviations to skip tokenizing abbreviations, using ParsiNorm's abbreviation lists. #216 @optimopium @sir-kokabi.
- Added MizanReader to read Mizan corpus. @sir-kokabi.
- Added NaabReader to read Naab corpus. @sir-kokabi.
- Added NerReader to read NER corpus. @sir-kokabi.
- Improved Normalizer by adding support for normalizing words with the suffix 'هایی'. @sir-kokabi.
- Fixed #298: Incompatibility issues with numpy. @mhdi707 @sir-kokabi
Full Changelog: v0.9.3...v0.9.4
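To illustrate the idea behind join_abbreviations, here is a minimal standalone sketch of a tokenizer that can keep known abbreviations as single tokens. It does not use hazm and is not hazm's implementation: the `tokenize` function, its regexes, and the two-entry `ABBREVIATIONS` sample are assumptions made for illustration (hazm relies on ParsiNorm's abbreviation lists).

```python
import re

# Hypothetical sample list, for illustration only.
ABBREVIATIONS = ("ه.ش", "ق.م")

# Plain rule: runs of word characters, or single punctuation marks.
_PLAIN = r"\w+|[^\w\s]"

# Same rule, but known abbreviations are tried first and matched whole.
_ABBR = r"(?<!\w)(?:" + "|".join(map(re.escape, ABBREVIATIONS)) + r")(?!\w)"
_WITH_ABBR = _ABBR + "|" + _PLAIN


def tokenize(text, join_abbreviations=False):
    """Split text into tokens, optionally keeping abbreviations intact."""
    pattern = _WITH_ABBR if join_abbreviations else _PLAIN
    return re.findall(pattern, text)
```

With the flag off, the dots inside an abbreviation become separate tokens; with it on, the abbreviation survives as one token.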
Hazm 0.9.3
Fixed
- Fix critical bug in Lemmatizer that caused incorrect lemmatization of certain words. @sir-kokabi.
- Fix bug that caused WikipediaReader to not work as before #287. @sir-kokabi.
- Fix missing imports for WikipediaReader and PersianPlainTextReader #286. @sir-kokabi.
- Fix some issues in the demo to make it compatible with the latest version of Hazm. @sir-kokabi.
- Fix a few issues related to tests and mkdocs build. @sir-kokabi.
- Improve documentation. @sir-kokabi.
- Improve dependency tree visualization on the demo page. @sir-kokabi.
Full Changelog: v0.9.2...v0.9.3
Hazm 0.9.2
Added
- Add pretrained DependencyParser models. @E-Ghafour.
- Add UniversalDadeganReader class to process and read the Universal Persian Dependency Treebank corpus. @E-Ghafour, @imani.
- Add 400+ new words to improve Normalizer, Lemmatizer and Tokenizer. @sir-kokabi.
Fixed
- Fix DependencyParser issue #282. @E-Ghafour, @imani.
- Fix some test issues. @E-Ghafour.
Full Changelog: v0.9...v0.9.2
Hazm 0.9
Added
- Windows compatibility by using Python-crfsuite instead of Wapiti. @E-Ghafour.
- Pretrained Chunker and POSTagger models with Python-crfsuite. @E-Ghafour.
- New parameters in Normalizer for better text processing. @sir-kokabi.
- Three regex patterns in Normalizer to fix ZWNJs and spacing issues. @sir-kokabi.
- 400 non-standard Unicode characters to be replaced in Normalizer. @sir-kokabi.
- 40,000+ new words to improve Lemmatizer and Tokenizer. @sir-kokabi.
- train function for Word2vec and Sent2vec modules in Embedding. @E-Ghafour.
- Implement keywordExtraction with the embedRank approach as a sample of Hazm usage. @E-Ghafour.
- Support Universal tags in POSTagger. @E-Ghafour.
- Support universal POS mapper in PeykareReader & DadeganReader (#239). @phsfr.
- PersianPlainTextReader to process raw text datasets (#120). @mhbashari.
- Support EZ tag in PeykareReader. @E-Ghafour.
- Slash & back-slash (/ \) support in Tokenizer (#102). @elahimanesh.
- Conjugation class to handle verb conjugation. @sir-kokabi.
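As a rough picture of what regex-based ZWNJ and spacing cleanup can look like, the sketch below applies three illustrative rules in sequence. These rules, the `RULES` table, and the `fix_spacing` name are assumptions chosen for the example; they are not the patterns that were added to Normalizer.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

# Illustrative rules only (assumptions), not hazm's actual patterns.
RULES = [
    # 1. Collapse a run of ZWNJs into a single ZWNJ.
    (re.compile(ZWNJ + r"{2,}"), ZWNJ),
    # 2. Drop ZWNJs that sit directly next to spaces, keeping the spaces.
    (re.compile(ZWNJ + r"*( +)" + ZWNJ + r"*"), r"\1"),
    # 3. Collapse repeated spaces into one space.
    (re.compile(r" {2,}"), " "),
]


def fix_spacing(text):
    """Apply each cleanup rule in order and return the cleaned text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

The order matters in this sketch: ZWNJs are removed from around spaces before the spaces themselves are collapsed, so a stray ZWNJ between two spaces does not prevent them from merging.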
Fixed
- Improve the accuracy of POSTagger and Chunker. @E-Ghafour.
- Improve InformalNormalizer #219. @riasati.
- Fix PEP 8 issues (#135). @hadifar.
- Fix some test issues. @sir-kokabi @E-Ghafour.
- Fix Stemmer issues with multiple suffixes. @sir-kokabi.
- Fix various reported issues
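For context on the multiple-suffix case, a word such as a plural noun with a possessive ending carries several stacked suffixes, so a single stripping pass leaves some behind. The sketch below shows one simple way to handle that by stripping repeatedly until nothing more applies. It is a toy under stated assumptions: the `SUFFIXES` sample, the `min_length` guard, and the `stem` function are hypothetical and do not reproduce hazm's Stemmer.

```python
ZWNJ = "\u200c"  # zero-width non-joiner

# Hypothetical, deliberately tiny suffix sample for illustration.
SUFFIXES = ("ها", "ان", "ی", "م")


def stem(word, min_length=2):
    """Strip known suffixes repeatedly until no suffix applies."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            remaining = len(word) - len(suffix)
            if word.endswith(suffix) and remaining >= min_length:
                # Remove the suffix and any ZWNJ that joined it to the stem.
                word = word[: -len(suffix)].rstrip(ZWNJ)
                changed = True
                break  # restart from the first suffix on the shorter word
    return word
```

The `min_length` guard keeps very short words from being stripped down to nothing; a real stemmer would need a far more careful suffix inventory to avoid over-stemming.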
Changed
- Drop Python 2 support and migrate all code to Python 3. @sir-kokabi.
- Use data_maker function instead of patterns in SequenceTagger. @E-Ghafour.
- Refactor IOBTagger and POSTagger to be compatible with data_maker. @E-Ghafour.
- Change می روم to میروم in example (#203). @SMSadegh19.
- Overhaul the project structure and GitHub repo. @sir-kokabi.
Full Changelog: v0.8.2...v0.9
Hazm 0.8
Release notes:
- Add WordEmbedding (a pre-trained fastText model is available for download)
- Add SentenceEmbedding (a pre-trained model is available for download)
- Add Documentation webpage
- Improve normalizer, informal normalizer, and tokenizer
- Add Degarbayan and MirasText corpus readers
What's Changed
- fixed MAGHSURAH Y bug in normalizer by @mavahedinia in #116
- change list to set in stopwords_list method to remove duplicate stop … by @Azdy-dev in #175
- Add Degarbayan interface by @maanijou in #176
- fix endless loop in python3 by @mohamad-qodosi in #186
- Update README.md by @edalatfard in #187
- Fix self.words in WordTokenizer by @SinRas in #190
- Fix some tokenization issues by @behnam-sa in #199
- Modifying the extra space and newline removal patterns by @asdoost in #200
- improvement of InformalNormalizer by @riasati in #214
- add some rules to InformalNormalizer by @riasati in #215
- Embedding by @E-Ghafour in #229
- Embedding by @imani in #230
New Contributors
- @mavahedinia made their first contribution in #116
- @Azdy-dev made their first contribution in #175
- @maanijou made their first contribution in #176
- @mohamad-qodosi made their first contribution in #186
- @edalatfard made their first contribution in #187
- @SinRas made their first contribution in #190
- @behnam-sa made their first contribution in #199
- @asdoost made their first contribution in #200
- @riasati made their first contribution in #214
- @E-Ghafour made their first contribution in #229
- @imani made their first contribution in #230
Full Changelog: v0.7...v0.8.2