
max_ngram_diff problem in _analyze api #56586

Closed
aninda052 opened this issue May 12, 2020 · 3 comments
Labels
>bug · :Search Relevance/Analysis (How text is split into tokens) · Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)

Comments


aninda052 commented May 12, 2020

Elasticsearch 7.6.2

I'm trying to test an analyzer using the _analyze API. In my filter I use 'ngram' with 'min_gram' = 3 and 'max_gram' = 8. Because "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to 1", I can't use ngram with my desired settings, and I cannot set "max_ngram_diff" in the _analyze API. Is there any way I can test my analyzer?

My analyzer:

{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 8,
      "token_chars": [ "letter", "digit" ]
    },
    {
      "type": "stop",
      "stopwords": "_english_",
      "ignore_case": true
    }
  ],
  "text": "test text"
}


aninda052 added the >bug and needs:triage (Requires assignment of a team area label) labels on May 12, 2020
albertzaharovits (Contributor) commented May 12, 2020

@aninda052 Thank you for raising this to our attention.

I've identified two issues from your description:

  1. There is a documentation problem with the analyzer request parameter: it can only take a string value, not an object. Hence the correctly formatted request for the above must be:
curl -X GET "localhost:9200/_analyze?pretty&error_trace=true" -H 'Content-Type: application/json' -d'
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 5,
    "token_chars": [
        "letter",
        "digit"
    ]
  },
  "filter": [
    "lowercase"
  ],
  "text" : "Quick Brown Foxes!"
}
'
  2. There is an obscure ngram "slack" limit (the index.max_ngram_diff setting) that cannot be changed for analyze requests without an index, which triggers exceptions such as:
"[The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [5]. This limit can be set by changing the [index.max_ngram_diff] index level setting.]; nested: IllegalArgumentException[The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [5]. This limit can be set by changing the [index.max_ngram_diff] index level setting.];at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:644)
at org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:572)
at org.elasticsearch.rest.BytesRestResponse.build(BytesRestResponse.java:138)
at org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:96)
at org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:91)
at org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:58)
at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:79)
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$1.handleException(TransportSingleShardAction.java:197)
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1130)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1239)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1213)
at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:60)
at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:56)
at org.elasticsearch.action.ActionRunnable.onFailure(ActionRunnable.java:88)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:680)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.IllegalArgumentException: The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [5]. This limit can be set by changing the [index.max_ngram_diff] index level setting.
at org.elasticsearch.analysis.common.NGramTokenizerFactory.<init>(NGramTokenizerFactory.java:119)
at org.elasticsearch.index.analysis.AnalysisRegistry.getComponentFactory(AnalysisRegistry.java:131)
at org.elasticsearch.index.analysis.AnalysisRegistry.buildCustomAnalyzer(AnalysisRegistry.java:229)
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.buildCustomAnalyzer(TransportAnalyzeAction.java:200)
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:131)
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:121)
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:73)
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:110)
at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)
at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)

Can someone from @elastic/es-search investigate further, please?

albertzaharovits added the :Search Relevance/Analysis (How text is split into tokens) label and removed the needs:triage label on May 12, 2020
elasticmachine added the Team:Search (Meta label for search team) label on May 12, 2020
jtibshirani (Contributor) commented:

I opened #56650 to address the docs issue that @albertzaharovits found.

As for the ngram diff issue, here's a suggested workaround:

  • Create a temporary test index with the setting index.max_ngram_diff increased to a larger number.
  • Run the analyze API against that index, as in GET /my-test-index/_analyze ... (a sketch follows below).
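
A minimal sketch of those two steps, reusing the filter settings from the original report (the index name my-test-index and the index.max_ngram_diff value of 10 are placeholders; token_chars is omitted because it is a parameter of the ngram tokenizer, not of the ngram token filter):

curl -X PUT "localhost:9200/my-test-index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.max_ngram_diff": 10
  }
}
'
curl -X GET "localhost:9200/my-test-index/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 8
    },
    {
      "type": "stop",
      "stopwords": "_english_",
      "ignore_case": true
    }
  ],
  "text": "test text"
}
'

The temporary index can be deleted afterwards with DELETE /my-test-index.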

To me this is a reasonable limitation of the analyze API, and doesn't seem like a bug. I don't think we would want to add the ability to pass index settings when testing out a transient custom analysis chain.

aninda052 (Author) commented:

Thanks for the suggestion, @jtibshirani :)

javanna added the Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch) label and removed the Team:Search label on Jul 12, 2024