Fix ViterbiSegment segmenter not replacing the DoubleArrayTrie when loading custom dictionaries, causing unexpected segmentation results #1835

Merged: 1 commit, merged Aug 13, 2023

wxy929629 commented Aug 11, 2023

Fix ViterbiSegment segmenter not replacing the DoubleArrayTrie when loading custom dictionaries, which caused unexpected segmentation results.

Description

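When ViterbiSegment loads a custom dictionary, it does not correctly replace its DoubleArrayTrie, so entries from that dictionary that should be split out as terms are not.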
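To make the failure mode concrete, here is a minimal, hypothetical Java sketch of the bug pattern the description names; it is not HanLP's code and not the actual diff of this PR. It assumes a segmenter that keeps its dictionary in a field: loading custom entries builds a merged dictionary, and the buggy path never assigns it back, so lookups keep using the old one. A TreeSet stands in for the DoubleArrayTrie, and greedy forward maximum matching stands in for HanLP's Viterbi word lattice. The names TrieSwapDemo, ToySegmenter, loadCustomBuggy and loadCustomFixed are invented, and treating 丁字桥镇 as a custom dictionary entry is an assumption drawn from the test sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical illustration only. A TreeSet stands in for HanLP's DoubleArrayTrie,
 * and forward maximum matching stands in for the Viterbi lattice.
 */
public class TrieSwapDemo {

    static class ToySegmenter {
        private Set<String> trie;          // stand-in for the DoubleArrayTrie field
        private final int maxWordLength;   // longest word the matcher will try

        ToySegmenter(Set<String> coreWords, int maxWordLength) {
            this.trie = coreWords;
            this.maxWordLength = maxWordLength;
        }

        /** Buggy: builds a merged dictionary but never installs it. */
        void loadCustomBuggy(Set<String> customWords) {
            Set<String> merged = new TreeSet<>(trie);
            merged.addAll(customWords);
            // BUG: 'merged' is dropped here; seg() still consults the old trie
        }

        /** Fixed: replaces the field with the merged dictionary. */
        void loadCustomFixed(Set<String> customWords) {
            Set<String> merged = new TreeSet<>(trie);
            merged.addAll(customWords);
            this.trie = merged;
        }

        /** Greedy longest match; unmatched characters become one-character terms. */
        List<String> seg(String text) {
            List<String> terms = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                int end = Math.min(text.length(), i + maxWordLength);
                String match = text.substring(i, i + 1);
                for (int j = end; j > i + 1; j--) {
                    String candidate = text.substring(i, j);
                    if (trie.contains(candidate)) {
                        match = candidate;
                        break;
                    }
                }
                terms.add(match);
                i += match.length();
            }
            return terms;
        }
    }

    public static void main(String[] args) {
        Set<String> custom = Set.of("丁字桥镇");
        String text = "走不出丁字桥镇的";

        ToySegmenter buggy = new ToySegmenter(new TreeSet<>(List.of("走不出")), 4);
        buggy.loadCustomBuggy(custom);
        System.out.println("buggy: " + buggy.seg(text));

        ToySegmenter fixed = new ToySegmenter(new TreeSet<>(List.of("走不出")), 4);
        fixed.loadCustomFixed(custom);
        System.out.println("fixed: " + fixed.seg(text));
    }
}
```

Running main prints `buggy: [走不出, 丁, 字, 桥, 镇, 的]` and `fixed: [走不出, 丁字桥镇, 的]`: the custom entry only takes effect once the rebuilt structure replaces the old one.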

Fixes # (issue)

Type of Change

Please check any relevant options and delete the rest.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

com/hankcs/hanlp/seg/SegmentTest.java

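    // Imports used: com.hankcs.hanlp.HanLP, com.hankcs.hanlp.seg.Segment,
    // com.hankcs.hanlp.seg.Viterbi.ViterbiSegment, com.hankcs.hanlp.seg.common.Term, java.util.List
    public void testExtendViterbi() throws Exception
    {
        HanLP.Config.enableDebug(false);
        // Two custom dictionaries, given as absolute paths separated by ';'
        String path = System.getProperty("user.dir") + "/" + "data/dictionary/custom/CustomDictionary.txt;" +
            System.getProperty("user.dir") + "/" + "data/dictionary/custom/全国地名大全.txt";
        // Normalize Windows backslashes to '/'
        path = path.replace("\\", "/");
        String text = "一半天帕克斯曼是走不出丁字桥镇的";
        // Baseline: default segmenter with the custom dictionary turned off
        Segment segment = HanLP.newSegment().enableCustomDictionary(false);
        // ViterbiSegment that loads the custom dictionaries above (the loading path this PR fixes)
        Segment seg = new ViterbiSegment(path);
        System.out.println("Segmentation with dictionary disabled: " + segment.seg(text));
        System.out.println("Default segmentation: " + HanLP.segment(text));
        // Enable the custom dictionary and give its entries higher priority
        seg.enableCustomDictionaryForcing(true).enableCustomDictionary(true);
        List<Term> termList = seg.seg(text);
        System.out.println("Segmentation with custom dictionaries: " + termList);
    }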
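The test assembles its multi-dictionary path by hand. As a hedged sketch that assumes only what the test itself shows (absolute paths joined with ';' and backslashes normalized to '/'), a small helper could build the same string; CustomDictPath and join are hypothetical names, not part of HanLP.

```java
import java.util.StringJoiner;

public class CustomDictPath {
    /**
     * Hypothetical helper (not part of HanLP): joins dictionary files under baseDir
     * into one ';'-separated path and normalizes Windows backslashes to '/'.
     */
    static String join(String baseDir, String... relativePaths) {
        StringJoiner joiner = new StringJoiner(";");
        for (String rel : relativePaths) {
            joiner.add(baseDir + "/" + rel);
        }
        return joiner.toString().replace("\\", "/");
    }

    public static void main(String[] args) {
        String path = join(System.getProperty("user.dir"),
                "data/dictionary/custom/CustomDictionary.txt",
                "data/dictionary/custom/全国地名大全.txt");
        System.out.println(path);
    }
}
```

The resulting string can be passed to `new ViterbiSegment(path)` in the same way as in the test.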

Checklist

Check all items that apply.

  • ⚠️ Changes must be made on the dev branch instead of master
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have checked my code and corrected any misspellings

hankcs (Owner) commented Aug 13, 2023

Thanks for the PR!
