Jdom/plugin/thai #510
is_word_char = re.match(pattern, word) is not None
is_end_of_sentence = word in language.regexp_split_sentences
if is_end_of_sentence:
    is_word_char = False
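A minimal, runnable sketch of the check above. It assumes that `pattern` is a character class built from the language's word characters (here, the whole Thai block) and that `language.regexp_split_sentences` is a plain string of delimiter characters, so `in` is a substring test. `Language`, `word_characters`, and `is_word` are hypothetical stand-ins, not the project's actual API.

```python
import re
from dataclasses import dataclass


@dataclass
class Language:
    # Hypothetical stand-in for the project's language settings.
    word_characters: str = "\u0E00-\u0E7F"   # assumed: the entire Thai Unicode block
    regexp_split_sentences: str = ".!?ฯ"      # assumed: sentence-delimiter characters


def is_word(word: str, language: Language) -> bool:
    # Regex check: does the token start with a run of word characters?
    pattern = f"[{language.word_characters}]+"
    is_word_char = re.match(pattern, word) is not None
    # Override: sentence delimiters are never words, even if the regex matched.
    is_end_of_sentence = word in language.regexp_split_sentences
    if is_end_of_sentence:
        is_word_char = False
    return is_word_char


thai = Language()
print(is_word("สวัสดี", thai), is_word("ฯ", thai), is_word("hello", thai))
```

Because `in` is a substring test, a multi-character token such as `.!` would also count as a delimiter if it appears verbatim in the delimiter string; that is harmless for single-character punctuation like ฯ.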
Just wanted to leave a comment about this logic: in Thai, the "period" punctuation is ฯ, and in my tests the parser was treating it as a word. It didn't make sense for punctuation to be a word, so I set is_word_char to False whenever the text is a sentence delimiter.
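Why the regex alone lets ฯ through can be checked directly. Assuming the Thai word pattern covers the whole Thai Unicode block (U+0E00 to U+0E7F, an assumption here), ฯ (U+0E2F, THAI CHARACTER PAIYANNOI) falls inside that range, and Unicode classifies it as a letter (category `Lo`), not as punctuation. The sketch shows that both a block-range regex and a generic "all letters or marks" check accept it, which is why an explicit delimiter override is needed. `THAI_WORD`, `naive_is_word`, and `all_letters` are hypothetical names for illustration.

```python
import re
import unicodedata

# Assumed word pattern: any run of characters from the Thai block U+0E00 to U+0E7F.
THAI_WORD = re.compile(r"[\u0E00-\u0E7F]+")


def naive_is_word(token: str) -> bool:
    """Regex-only check, with no sentence-delimiter override."""
    return THAI_WORD.fullmatch(token) is not None


def all_letters(token: str) -> bool:
    """Unicode-category check: every character is a letter (L*) or mark (M*)."""
    return bool(token) and all(unicodedata.category(ch)[0] in "LM" for ch in token)


for tok in ["สวัสดี", "ฯ", "."]:
    print(tok, naive_is_word(tok), all_letters(tok), unicodedata.category(tok[-1]))
```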
Right, sometimes there may be funny cases like this where you have to hardcode it.
Looks great, thanks!
Released to PyPI, woot.
No description provided.