Releases: hankcs/HanLP
v1.8.5 Routine Maintenance
What's Changed
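- Fix an inconsistency that could appear in the mini bigram model on the first segmentation after JRE initialization fix: #1851 (comment)
- Fix unexpected segmentation results in ViterbiSegment caused by the DoubleArrayTrie not being replaced when a custom dictionary is loaded by @wxy929629 in #1835
- fix: correct a calculation error in CWSEvaluator when comparing segmented sentences by @webSue in #1853
- Data package remains compatible with data-for-1.7.5.zip
md5=1d9e1be4378b2dbc635858d9c3517aaa
- Portable version upgraded in sync to v1.8.5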
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.5</version>
</dependency>
New Contributors
- @wxy929629 made their first contribution in #1835
Full Changelog: v1.8.4...v1.8.5
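The md5 value listed with the data package can be checked after download. The following is a minimal sketch using only the JDK; the class name, method names, and the default local file name are illustrative assumptions and are not part of HanLP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a HanLP class): verifies a downloaded data package.
class DataPackageChecksum {
    // MD5 published in the release notes for data-for-1.7.5.zip
    static final String EXPECTED_MD5 = "1d9e1be4378b2dbc635858d9c3517aaa";

    // Streams the file through MessageDigest and returns the lowercase hex digest.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // True when the file's digest equals the value from the release notes.
    static boolean matchesRelease(Path file) throws IOException, NoSuchAlgorithmException {
        return EXPECTED_MD5.equalsIgnoreCase(md5Hex(file));
    }

    public static void main(String[] args) throws Exception {
        // Assumed local file name; pass the real path as the first argument.
        Path zip = Paths.get(args.length > 0 ? args[0] : "data-for-1.7.5.zip");
        System.out.println(matchesRelease(zip) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

Run it with the path to the downloaded archive as the only argument; a mismatch usually means the download was truncated or a different data package was fetched.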
v2.1.0-beta.62 Routine Release
What's Changed
- Release mMiniLMv2L12 version of MTL on UD210
- Release a small MTL model trained on our new corpora
- Multi-process compatible loader
- Support new versions of tensorflow and numpy
- Add support for Python 3.10
- Implementation of "Graph Pre-training for AMR Parsing and Generation"
- Let PipeLine support copy() by @Vela-zz in #1861
Full Changelog: v2.1.0-beta.0...v2.1.0-beta.62
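v1.8.4 Routine Maintenance
- Treat <> as delimiters fix https://bbs.hankcs.com/t/topic/4527
- Add a configuration method to Segment for enabling or disabling normalization close #1714
- Fix the scorer.boost bug in the score calculation of the text recommendation scorers fix: #1718
- bugfix: fix a premature loop exit in the bintrie tree during full segmentation by @carl10086 in #1775
- Custom dictionaries now support the .tsv format fix: #1785
- Fix passing of the custom dictionary path parameter fix: #1799
- Add enableFastBuild to DoubleArrayTrie by @qiangwang in #1805
- Data package remains compatible with data-for-1.7.5.zip
md5=1d9e1be4378b2dbc635858d9c3517aaa
- Portable version upgraded in sync to v1.8.4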
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.4</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!
New Contributors
- @carl10086 made their first contribution in #1775
- @qiangwang made their first contribution in #1805
Full Changelog: v1.8.3...v1.8.4
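v1.8.3 Routine Maintenance
- Fix the interaction between the dynamic custom dictionary and CustomDictionaryForcing fix #1712
- Adjust 莎=sha1,suo1 fix #1670
- Dynamically determine the default frequency of out-of-vocabulary words based on the total word frequency
- The next method of LongestSearcher in DoubleArrayTrie now supports null as a value by @tiandiweizun in #1674
- Update the comments in DoubleArrayTrie.java by @TITC in #1699
- Data package remains compatible with data-for-1.7.5.zip
md5=1d9e1be4378b2dbc635858d9c3517aaa
- Portable version upgraded in sync to v1.8.3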
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.3</version>
</dependency>
Full Changelog: v1.8.2...v1.8.3
🎉 Thanks to all users who offered valuable suggestions in the issues!
v2.1.0-beta 104 languages, 10 tasks, dual backends
We are proud to announce the beta release of HanLP 2.1, which now offers 10 joint tasks on 104 languages: tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, abstract meaning representation (AMR) parsing.
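v1.8.2 Routine Maintenance and Accuracy Improvements
- Adjust the formula, raising Viterbi segmentation accuracy from 94.49 to 94.69 https://bbs.hankcs.com/t/topic/136/61?u=hankcs
- Improve the HMM sampling function https://bbs.hankcs.com/t/topic/136/64?u=hankcs
- Support disabling automatic refresh of the dictionary cache (CustomDictionaryAutoRefreshCache=false) fix #1655
- Fix the reload method of CoreDictionary
- Revise the bigram model
- Revise the Simplified-Traditional Chinese mapping table
- Correct the final (yunmu) of lve4 to ve fix #1644
- Fix CustomDictionary.reload() fix #1635
- Data package remains compatible with data-for-1.7.5.zip
md5=1d9e1be4378b2dbc635858d9c3517aaa
- Portable version upgraded in sync to v1.8.2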
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.2</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!
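The CustomDictionaryAutoRefreshCache=false switch above is a plain key=value setting. As a rough illustration of how such a flag can be read, here is a JDK-only sketch. It is not HanLP's actual loading code: the class and method names are hypothetical, and the default of true when the key is absent is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class (not part of HanLP).
class AutoRefreshFlagDemo {
    // Key name taken from the release notes.
    static final String KEY = "CustomDictionaryAutoRefreshCache";

    // Parses properties-style text and returns whether auto refresh is enabled.
    // Assumption: the feature is on unless the key is explicitly set to false.
    static boolean autoRefreshEnabled(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return Boolean.parseBoolean(props.getProperty(KEY, "true"));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(autoRefreshEnabled("CustomDictionaryAutoRefreshCache=false"));
        System.out.println(autoRefreshEnabled(""));
    }
}
```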
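v1.8.1 Routine Maintenance and Fixes
- Fix convertToPinyinList fix #1634
- Fix incorrect normalization of some characters in CharTable fix #1615
- Data package remains compatible with data-for-1.7.5.zip
md5=1d9e1be4378b2dbc635858d9c3517aaa
- Portable version upgraded in sync to v1.8.1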
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.1</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!
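v1.8.0 Multi-Instance Support and Supplementary Character Sets
- Refactor CustomDictionary to support multiple instances #1339
- Support supplementary characters such as 𩽾𩾌 (ān kāng) fix #1564
- Fix CoreStopWordDictionary.dictionary.clear() fix #1603
- Prevent the double-array trie from failing to transition states when a blank key is passed in fix https://bbs.hankcs.com/t/dat/3196/8
- Add the hot-reload method CoreDictionary.reload() fix #1594
- Add KBeamArcEagerDependencyParser(String modelPath, String cwsModelPath, String posModelPath) fix #1585
- Fix Sentence.create on compound word consisting of single word
- Back up the parameters when constructing HiddenMarkovModel fix #1530
- Data package remains compatible with data-for-1.7.5.zip
md5=1d9e1be4378b2dbc635858d9c3517aaa
- Portable version upgraded in sync to v1.8.0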
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.8.0</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!
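Supplementary characters such as 𩽾𩾌 lie outside the Basic Multilingual Plane, so each one occupies two UTF-16 chars in a Java String and must be handled by code point. The sketch below shows the difference using only the JDK. It is an illustration, not HanLP's implementation; the class and method names are hypothetical, and the code points U+29F7E and U+29F8C are my reading of the two characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class (not part of HanLP).
class CodePointDemo {
    // Splits text into user-visible characters by code point rather than by char,
    // so a surrogate pair stays together as one element.
    static List<String> toCharacters(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            result.add(new String(Character.toChars(codePoint)));
            i += Character.charCount(codePoint);
        }
        return result;
    }

    public static void main(String[] args) {
        // Built from code points (assumed U+29F7E, U+29F8C) to avoid source-encoding issues.
        String word = new String(Character.toChars(0x29F7E))
                + new String(Character.toChars(0x29F8C));
        System.out.println(word.length());                          // UTF-16 chars
        System.out.println(word.codePointCount(0, word.length()));  // code points
        System.out.println(toCharacters(word).size());
    }
}
```

Code that indexes such text with charAt would split each character into two halves, which is the kind of problem that char-based processing runs into with this range.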
v2.1.0-alpha 104 languages, 10 tasks, dual backends
We are proud to announce the release of HanLP 2.1, which now offers 10 joint tasks on 104 languages: tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, abstract meaning representation (AMR) parsing.
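v1.7.8 Routine Maintenance
- CharType now uses IOAdapter fix #1480
- Add the files missing from the portable version
- Add the custom dictionary entry “雄安”
- Data package remains compatible with data-for-1.7.5.zip
md5=1d9e1be4378b2dbc635858d9c3517aaa
- Portable version upgraded in sync to v1.7.8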
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.7.8</version>
</dependency>
🎉 Thanks to all users who offered valuable suggestions in the issues!