SOLR-17346: Synchronise stopwords from snowball with those in lucene #2533
https://issues.apache.org/jira/browse/SOLR-17346
Description
Solr's default configset comes with a collection of sample stopwords from the snowball project. There is a similar set of stopword lists in the lucene repository; however, those have been updated to a more recent version of the snowball lists.
Specifically, the most recent French stopword list drops a number of words that are homonyms of other useful words, which shouldn't be skipped.
Solution
Copy the snowball-derived stopword files from lucene to solr.
I only copied files that were present in https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball, and only if the version of the file in solr was also from the snowball project (e.g. the English and Indonesian stopword files in solr aren't from snowball, so I didn't copy them from lucene even though they exist there).
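For reviewers who want to reproduce the selection, here is a minimal sketch of the two criteria above. It is not the procedure actually used for this PR. The function names, the caller-supplied `name_map` (lucene file name to solr file name), and the "file mentions snowball" heuristic are all assumptions made for illustration.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def is_from_snowball(path: Path, marker: str = "snowball") -> bool:
    """Heuristic (an assumption): treat a file as snowball-derived if it mentions the marker."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return marker in text.lower()


def sync_stopwords(lucene_dir: Path, solr_dir: Path, name_map: Dict[str, str]) -> List[str]:
    """Overwrite solr stopword files with their lucene counterparts when both criteria hold.

    name_map is a hypothetical, caller-supplied mapping from a lucene file name
    to the corresponding solr file name. Returns the solr file names overwritten.
    """
    copied = []
    for lucene_name, solr_name in sorted(name_map.items()):
        src = lucene_dir / lucene_name
        dst = solr_dir / solr_name
        # Criterion 1: the file must exist in lucene's snowball resources.
        if not src.is_file():
            continue
        # Criterion 2: the existing solr file must itself come from snowball.
        if not dst.is_file() or not is_from_snowball(dst):
            continue
        shutil.copyfile(src, dst)
        copied.append(solr_name)
    return copied
```

The marker check is only a stand-in for reading each file's header by hand; a file whose provenance is unclear should be reviewed manually rather than trusted to this heuristic.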
Tests
build solr with ./gradlew dev
start solr and create a new core
verify that the expected files were copied to the new core
verify that the core starts up
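The "expected files were copied" step can also be checked mechanically. The sketch below compares the stopword files in a source directory byte for byte against those in the new core's directory. The function name, the directory arguments, and the `stopwords_*.txt` glob pattern are assumptions for illustration, not part of this PR.

```python
import filecmp
from pathlib import Path
from typing import List


def find_mismatches(expected_dir: Path, core_dir: Path, pattern: str = "stopwords_*.txt") -> List[str]:
    """Return names of files in expected_dir that are missing from or differ in core_dir."""
    mismatched = []
    for expected in sorted(expected_dir.glob(pattern)):
        actual = core_dir / expected.name
        # shallow=False forces a content comparison instead of relying on os.stat() data.
        if not actual.is_file() or not filecmp.cmp(expected, actual, shallow=False):
            mismatched.append(expected.name)
    return mismatched
```

An empty result means every matching file in the source directory is present and identical in the core; any returned name points at a file to inspect by hand.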
Checklist
Please review the following and check all that apply:
Developed against the main branch.
Ran ./gradlew check.