Releases: hankcs/HanLP
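v1.8.4 Routine maintenance
- Treat <> as delimiters fix https://bbs.hankcs.com/t/topic/4527
- Added a Segment configuration method that controls whether normalization is applied close #1714
- Fixed the scorer.boost bug in the score calculation of the text suggestion scorer fix: #1718
- bugfix: fixed the bintrie full segmentation loop exiting early by @carl10086 in #1775
- Custom dictionaries now support the .tsv format fix: #1785
- Fixed passing the custom dictionary path as a parameter fix: #1799
- Added enableFastBuild to DoubleArrayTrie by @qiangwang in #1805
- The data package remains compatible with data-for-1.7.5.zip (md5=1d9e1be4378b2dbc635858d9c3517aaa)
- The portable edition is also upgraded to v1.8.4: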
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.4</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!
New Contributors
- @carl10086 made their first contribution in #1775
- @qiangwang made their first contribution in #1805
Full Changelog: v1.8.3...v1.8.4
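The release notes publish an MD5 for data-for-1.7.5.zip, so a downloaded copy can be checked before use. The following is a minimal sketch using only the JDK; the class name `DataPackageChecksum` and the default file path are hypothetical and not part of HanLP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of HanLP.
class DataPackageChecksum {
    // MD5 published in the release notes for data-for-1.7.5.zip.
    static final String EXPECTED_MD5 = "1d9e1be4378b2dbc635858d9c3517aaa";

    // Streams the input through MD5 and returns the digest as lowercase hex.
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real path as the first argument.
        String path = args.length > 0 ? args[0] : "data-for-1.7.5.zip";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            String actual = md5Hex(in);
            System.out.println(EXPECTED_MD5.equals(actual)
                    ? "checksum OK"
                    : "checksum mismatch: " + actual);
        }
    }
}
```

The file is streamed in 8 KB chunks rather than read into memory at once, since the archive may be large.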
v1.8.3 Routine maintenance
- Fixed the interaction between dynamic custom dictionaries and CustomDictionaryForcing fix #1712
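- Adjusted the pinyin entry 莎=sha1,suo1 fix #1670
- The default frequency of out-of-vocabulary words is now determined dynamically from the total word frequency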
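- The next method of LongestSearcher in DoubleArrayTrie now supports null as a value by @tiandiweizun in #1674
- Updated the comments in DoubleArrayTrie.java by @TITC in #1699
- The data package remains compatible with data-for-1.7.5.zip (md5=1d9e1be4378b2dbc635858d9c3517aaa)
- The portable edition is also upgraded to v1.8.3: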
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.3</version>
</dependency>
Full Changelog: v1.8.2...v1.8.3
🎉 Thanks to all users who offered valuable suggestions in the issues!
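The pinyin entry adjusted in this release has the shape `character=reading1,reading2`, with a tone digit at the end of each reading. As a rough illustration of that shape only, the sketch below splits such a line into its key and readings; `PinyinEntryParser` is a hypothetical name, and this is not how HanLP itself loads its pinyin dictionary.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not HanLP's dictionary loader.
class PinyinEntryParser {
    // Splits "key=reading1,reading2" into the key and its readings, in file order.
    static Map.Entry<String, List<String>> parse(String line) {
        int eq = line.indexOf('=');
        if (eq <= 0 || eq == line.length() - 1) {
            throw new IllegalArgumentException("Malformed entry: " + line);
        }
        String key = line.substring(0, eq).trim();
        List<String> readings = Arrays.asList(line.substring(eq + 1).trim().split(","));
        return new AbstractMap.SimpleEntry<>(key, readings);
    }

    // Returns the trailing tone digit of a reading such as "sha1".
    static int tone(String reading) {
        return Character.digit(reading.charAt(reading.length() - 1), 10);
    }

    public static void main(String[] args) {
        // U+838E is the character from the release note, escaped to keep the source ASCII-only.
        Map.Entry<String, List<String>> entry = parse("\u838E=sha1,suo1");
        for (String reading : entry.getValue()) {
            System.out.println(reading + " has tone " + tone(reading));
        }
    }
}
```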
v2.1.0-beta 104 languages, 10 tasks, dual backends
We are proud to announce the beta release of HanLP 2.1, which now offers 10 joint tasks on 104 languages: tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, abstract meaning representation (AMR) parsing.
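v1.8.2 Routine maintenance and accuracy improvements
- Adjusted the formula, raising Viterbi segmentation accuracy from 94.49 to 94.69 https://bbs.hankcs.com/t/topic/136/61?u=hankcs
- Improved the HMM sampling function https://bbs.hankcs.com/t/topic/136/64?u=hankcs
- Support disabling automatic refresh of the dictionary cache (CustomDictionaryAutoRefreshCache=false) fix #1655
- Fixed the reload method of CoreDictionary
- Revised the bigram model
- Revised the Simplified-Traditional Chinese mapping table
- Corrected the final (韵母) of lve4 to ve fix #1644
- Fixed CustomDictionary.reload() fix #1635
- The data package remains compatible with data-for-1.7.5.zip (md5=1d9e1be4378b2dbc635858d9c3517aaa)
- The portable edition is also upgraded to v1.8.2: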
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.2</version>
</dependency>
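🎉 Thanks to all users who offered valuable suggestions in the issues!
v1.8.1 Routine maintenance and fixes
- Fixed convertToPinyinList fix #1634
- Fixed CharTable normalizing some characters incorrectly fix #1615
- The data package remains compatible with data-for-1.7.5.zip (md5=1d9e1be4378b2dbc635858d9c3517aaa)
- The portable edition is also upgraded to v1.8.1: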
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.1</version>
</dependency>
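🎉 Thanks to all users who offered valuable suggestions in the issues!
v1.8.0 Multi-instance support and supplementary characters
- Refactored CustomDictionary to support multiple instances #1339
- Support supplementary characters such as 𩽾𩾌 (ān kāng) fix #1564
- Fixed CoreStopWordDictionary.dictionary.clear() fix #1603
- The double-array trie now guards against blank keys, which previously made state transitions fail fix https://bbs.hankcs.com/t/dat/3196/8
- Added the hot-reload method CoreDictionary.reload() fix #1594
- Added KBeamArcEagerDependencyParser(String modelPath, String cwsModelPath, String posModelPath) fix #1585
- Fix Sentence.create on compound word consisting of single word
- HiddenMarkovModel now backs up its parameters on construction fix #1530
- The data package remains compatible with data-for-1.7.5.zip (md5=1d9e1be4378b2dbc635858d9c3517aaa)
- The portable edition is also upgraded to v1.8.0: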
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.0</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!
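Characters such as 𩽾𩾌 lie outside the Basic Multilingual Plane, so each one occupies two Java `char` values (a surrogate pair), and code that walks a string one `char` at a time splits them in half. The sketch below shows the general JDK technique of iterating by code point instead; it is an illustration under that assumption, not the change made in HanLP, and `SupplementaryCharDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not HanLP code.
class SupplementaryCharDemo {
    // Splits text into one string per Unicode code point, keeping surrogate pairs intact.
    static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            out.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp); // advances by 2 for supplementary characters
        }
        return out;
    }

    public static void main(String[] args) {
        // The two characters from the release note (assumed to be U+29F7E and U+29F8C),
        // written as surrogate pairs to keep the source ASCII-only.
        String word = "\uD867\uDF7E\uD867\uDF8C";
        System.out.println("char units:  " + word.length());
        System.out.println("code points: " + word.codePointCount(0, word.length()));
        System.out.println("split size:  " + codePoints(word).size());
    }
}
```

Here `length()` counts UTF-16 units while `codePointCount` counts characters as a reader sees them, which is the distinction a segmenter has to respect.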
v2.1.0-alpha 104 languages, 10 tasks, dual backends
We are proud to announce the release of HanLP 2.1, which now offers 10 joint tasks on 104 languages: tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, abstract meaning representation (AMR) parsing.
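v1.7.8 Routine maintenance
- CharType now uses IOAdapter fix #1480
- Added the files missing from the portable edition
- Added the custom dictionary entry “雄安” (Xiong'an)
- The data package remains compatible with data-for-1.7.5.zip (md5=1d9e1be4378b2dbc635858d9c3517aaa)
- The portable edition is also upgraded to v1.7.8: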
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.7.8</version>
</dependency>
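🎉 Thanks to all users who offered valuable suggestions in the issues!
v1.7.7 Routine maintenance and multiple improvements
- Improved atomic segmentation fix #1421
- Fixed the exception thrown when the number of clusters exceeds the number of documents fix #1397
- Replaced the static NERInstance.create with a constructor to make subclassing easier
- Removed the mapping 幺=么 fix #1427
- CRFModel now supports getting all tags
- Fixed AbstractClassifier.enableProbability fix #1423
- Exposed the internal members of CWSEvaluator.Result fix https://bbs.hankcs.com/t/topic/887
- Made the members of HMM public
- The data package remains compatible with data-for-1.7.5.zip (md5=1d9e1be4378b2dbc635858d9c3517aaa)
- The portable edition is also upgraded to v1.7.7: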
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.7.7</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!
v2.0.0-alpha.0 NLP for the next decade
HanLP 2.0 embraces state-of-the-art Natural Language Processing with Deep Learning and massive unlabeled corpora. Featured updates:
- Easy model building and serving with TensorFlow 2.0 and Keras.
- Multilingual Support.
- Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification via one unified interface.
Currently, HanLP 2.0 is in alpha stage with more killer features on the roadmap. For news and updates, join our forum.