Throw Error on deprecated nGram and edgeNGram custom filters #50376

Merged: 3 commits, Dec 20, 2019

cbuescher
Member

The camel-case nGram and edgeNGram filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However, these errors
are currently thrown only for pre-configured filters; adding them as custom
filters triggers neither the warning nor the error. This change adds the
appropriate exceptions for nGram and edgeNGram respectively.

Closes #50360

@cbuescher cbuescher added >bug :Search Relevance/Analysis How text is split into tokens v7.6.0 labels Dec 19, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Analysis)

@@ -242,7 +244,17 @@
         filters.put("dictionary_decompounder", requiresAnalysisSettings(DictionaryCompoundWordTokenFilterFactory::new));
         filters.put("dutch_stem", DutchStemTokenFilterFactory::new);
         filters.put("edge_ngram", EdgeNGramTokenFilterFactory::new);
-        filters.put("edgeNGram", EdgeNGramTokenFilterFactory::new);
+        filters.put("edgeNGram", (IndexSettings indexSettings, Environment environment, String name, Settings settings) -> {
+            if (indexSettings.getIndexVersionCreated().onOrAfter(org.elasticsearch.Version.V_7_6_0)) {
Member Author

Unfortunately, we can only throw an error for these cases starting with 7.6, since there might be existing indices created before 7.6 that we don't want to break on a minor version upgrade. Those should at least log a deprecation warning now.

Member

Do you think we should wait until 8 because this is technically breaking? I mean, using them relies on a bug, but still.

Member Author

We already removed the PreConfiguredTokenFilter registration for "nGram" and "edgeNGram" in 8 (using those should have thrown errors since 7.0), but I see we still register them in getTokenFilters(). Still, we clearly documented the deprecation and removal, and I think we haven't mentioned those variants in the docs for a long time now. I think disallowing new index creation using those in 7.6 should be okay, wdyt?
Unfortunately, for 8.0 this means we cannot remove the registration for these names just yet and need to forward-port this error/deprecation handling there as well.

Member

I just don't want someone who does a minor upgrade to have their daily indices not get created any more.

Member Author

Sounds like we should start logging the deprecation warning with 7.6 and throw an error on 8.0 for this, then. That would only apply to this type of custom filter use, which was left out by the original deprecation back in #30209.
I will update the PR and see if I can change the target branch so we add this on master and then backport to 7.x.
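
The policy agreed on in this thread (keep building the filter but log a deprecation warning for 7.x indices, and reject the camel-case name for indices created on 8.0 or later) can be sketched roughly as follows. This is a simplified, self-contained model and not the code from this PR: `DeprecatedFilterNameSketch`, `create`, `deprecated`, the integer major version, the string stand-ins for filter factories, and the message texts are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical, simplified model of version-gated filter registration.
 * None of these types or names are Elasticsearch classes.
 */
public class DeprecatedFilterNameSketch {

    // Assumption: only the major version of the index's creation version is modelled.
    static final int ERROR_FROM_MAJOR = 8;

    // Collected deprecation warnings; real code would use a deprecation logger instead.
    static final List<String> warnings = new ArrayList<>();

    // Maps a filter type name to a factory that receives the index's creation major version.
    // Plain strings stand in for the real token filter factories.
    static final Map<String, IntFunction<String>> filters = new LinkedHashMap<>();

    static {
        // Preferred snake-case names: always allowed.
        filters.put("ngram", major -> "NGramTokenFilter");
        filters.put("edge_ngram", major -> "EdgeNGramTokenFilter");
        // Deprecated camel-case names: same filters, gated on the creation version.
        filters.put("nGram", major -> deprecated("nGram", "ngram", major, "NGramTokenFilter"));
        filters.put("edgeNGram", major -> deprecated("edgeNGram", "edge_ngram", major, "EdgeNGramTokenFilter"));
    }

    // Throws for new (8+) indices, otherwise records a warning and returns the filter.
    static String deprecated(String oldName, String newName, int createdMajor, String filter) {
        if (createdMajor >= ERROR_FROM_MAJOR) {
            throw new IllegalArgumentException(
                "The [" + oldName + "] token filter name cannot be used in new indices. "
                    + "Use [" + newName + "] instead.");
        }
        warnings.add("The [" + oldName + "] token filter name is deprecated. Use [" + newName + "] instead.");
        return filter;
    }

    // Looks up a custom filter definition by its type name, as a custom filter would be resolved.
    static String create(String type, int createdMajor) {
        IntFunction<String> factory = filters.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown filter type [" + type + "]");
        }
        return factory.apply(createdMajor);
    }

    public static void main(String[] args) {
        // Index created on 7.x: the filter is still built, with a warning recorded.
        System.out.println(create("edgeNGram", 7));
        System.out.println(warnings);
        // Index created on 8.x: the camel-case name is rejected.
        try {
            create("edgeNGram", 8);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Gating on the version the index was created with, rather than on the node version, is what keeps existing indices working after an upgrade while still stopping new indices from using the deprecated names.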

Christoph Büscher added 2 commits December 20, 2019 11:53
@cbuescher cbuescher changed the base branch from 7.x to master December 20, 2019 15:13
cbuescher (Member Author) commented Dec 20, 2019

@nik9000 I moved the PR to target master; the deprecation part will be backported to 7.6.0.

Member

nik9000 left a comment

LGTM. Sorry for all the trouble.

@cbuescher
Member Author

@elasticmachine update branch

@cbuescher cbuescher merged commit c6f7166 into elastic:master Dec 20, 2019
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Dec 20, 2019
The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However, these errors
are currently thrown only for pre-configured filters; adding them as custom
filters triggers neither the warning nor the error. This change adds the
appropriate deprecation warnings for `nGram` and `edgeNGram` respectively on
version 7 indices.

Relates elastic#50360
cbuescher pushed a commit that referenced this pull request Dec 20, 2019
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
…#50376)

javanna added a commit to javanna/elasticsearch that referenced this pull request Sep 17, 2024
edgeNGram and NGram tokenizers and token filters were deprecated. They have not been supported in indices created from 8.0,
hence their support can entirely be removed from main.

The version related logic around the min grams can also be removed as it refers to 7.x which we no longer need to support.

Relates to elastic#50376, elastic#50862, elastic#43568
javanna added a commit that referenced this pull request Sep 18, 2024
javanna added a commit to javanna/elasticsearch that referenced this pull request Sep 18, 2024
…c#113009)

Linked issue: nGram and edgeNGram names don't throw an error with custom filters on ES 7+