German stemmer doesn't match schlummert/schlummern or grüßend/gegrüßt/grüßen #139
Comments
Looks like I didn't explicitly repeat the advice from #91 here, but to achieve what you ask for we would need a way to remove these suffixes (or prefix in the case of ge-) that doesn't negatively affect words that happen to end in The website does actually already note
For example, you want
The last two are nouns so should be capitalised in text, but the current expectation is that input is lower-cased before being fed to the stemmer, so we can't use the capitalisation as a clue. Potentially that could be changed, but doing so would be somewhat disruptive for users of the stemmers, so it's not a simple change to make. It would also need to deal with words which aren't nouns being capitalised at the start of a sentence, in titles, etc. A solution doesn't have to be perfect; it just needs to not be harmful to other cases, so if there's a rule we can use to identify a significant number of cases where Removing |
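To illustrate the concern about a ge- rule harming unrelated words, here is a minimal sketch. It is not Snowball's German algorithm: `naive_strip_ge` is a hypothetical helper, the length threshold is arbitrary, and the sample words are my own picks rather than the examples from this thread.

```python
def naive_strip_ge(word: str) -> str:
    """Hypothetical rule: drop a leading 'ge' from any sufficiently long word.

    The input is lower-cased first, mirroring the current expectation that
    text is lower-cased before it reaches the stemmer.
    """
    word = word.lower()
    # Arbitrary length guard so very short words such as "geld" are left alone.
    if word.startswith("ge") and len(word) > 4:
        return word[2:]
    return word


if __name__ == "__main__":
    # "gegrüßt" is a participle, so stripping "ge" is what the reporter wants.
    # "gehen" is a verb and "Gesetz"/"Geduld" are nouns where "ge" is not a
    # participle prefix, so the same rule damages them.
    for w in ["gegrüßt", "gehen", "Gesetz", "Geduld"]:
        print(w, "->", naive_strip_ge(w))
```

Because the word is lower-cased before the rule runs, the capital letter that marks "Gesetz" and "Geduld" as nouns in running text is no longer available to guard the rule.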
Hello,
I'm using Snowball via Elasticsearch, which is based on Lucene. The Snowball German stemming is not matching some common forms:
Original Lucene bug was here: https://issues.apache.org/jira/browse/LUCENE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17217670#comment-17217670