Unusual CPU/Memory Usage while Highlighting #3128

Closed
lmenezes opened this issue Jun 3, 2013 · 13 comments

lmenezes (Contributor) commented Jun 3, 2013

Trying to run the following on ES 0.90 results in an OOM.
It's quite an unusual query, I agree, but the behavior still seems wrong.

curl -XPOST http://localhost:9200/test_hl -d '{ "index": { "number_of_shards": "1", "number_of_replicas": "0", "analysis": { "filter": { "wordDelimiter": { "type": "word_delimiter", "split_on_numerics": "false", "generate_word_parts": "true", "generate_number_parts": "true", "catenate_words": "true", "catenate_numbers": "true", "catenate_all": "false" } }, "analyzer": { "custom_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "wordDelimiter" ] } } } } }'

curl -XPUT http://localhost:9200/test_hl/profile/_mapping -d '{ "profile": { "dynamic": "strict", "properties": { "id": { "type": "integer", "index": "not_analyzed", "store": "yes" }, "content": { "type": "string", "index_analyzer": "custom_analyzer", "search_analyzer": "custom_analyzer", "store": "yes", "term_vector": "with_positions_offsets" } } } }'

curl -XPUT http://localhost:9200/test_hl/profile/2 -d '{"content": "Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature", "id": 2}'

curl -XGET http://localhost:9200/test_hl/profile/_search -d '{ "from": 0, "size": 10, "query": { "match": { "content": { "query": "Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature", "type": "phrase" } } }, "highlight": { "fields": { "content": { "fragment_size": 50, "number_of_fragments": 5 } } } }'
kroepke commented Jun 4, 2013

I've been looking at this problem: essentially all the time is spent in org.apache.lucene.search.vectorhighlight.FieldQuery(), expanding the flat queries into PhraseQueries for later matching against the hits.

Put another way: It's extremely easy to break the fast vector highlighter by asking it to highlight the exact field content, provided that the query terms and stored field terms are the same. In this case the implementation degenerates to a cartesian product of the terms, leading to memory exhaustion.

Moral of the story: beware of automatically generated search queries built from field contents when highlighting is also enabled.
Maybe a fail-safe mode could be added somewhere, stopping the iteration during construction of the FieldQuery object to prevent it from bringing down the cluster in seconds.
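
One quick way to confirm that the cost is in the highlighting phase rather than in query execution: running the same request with the highlight section removed should come back quickly (a sketch, reusing the reproduction request from above verbatim):

curl -XGET http://localhost:9200/test_hl/profile/_search -d '{ "from": 0, "size": 10, "query": { "match": { "content": { "query": "Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature", "type": "phrase" } } } }'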

ghost assigned s1monw Jun 5, 2013
s1monw (Contributor) commented Jun 5, 2013

This is in fact a bug in Lucene, or maybe just a really bad situation; I'm not sure it's a bug per se. The problem is that, due to the settings "generate_number_parts": "true" and "catenate_words": "true", the query becomes a MultiPhraseQuery (multiple tokens at the same position), and once it gets too long the highlighter ends up in a very expensive loop. Setting those to "false" makes it fast and seems to produce the right result too. I will look into this in more detail in Lucene land.
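
To see where the stacked tokens come from, the analyze API against the test_hl index created above shows what custom_analyzer produces for a single URL (a sketch; the exact tokens and positions depend on the word_delimiter flags):

curl -XGET 'http://localhost:9200/test_hl/_analyze?analyzer=custom_analyzer' -d 'http://www.facebook.com'

With catenate_words enabled, the subwords (http, www, facebook, com) are accompanied by a catenated form at an overlapping position, and every URL in the phrase contributes such a stack, which is what turns the query into a large MultiPhraseQuery.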

kroepke commented Jun 5, 2013

This definitely is a corner case, I agree. Unfortunately our system sometimes generates queries like this based on user actions, and they are tricky to filter out (although we will try to, of course).
This specific example generates 4095 queries internally when constructing the FieldQuery object, and so far I haven't found a way to make it bail out earlier. Needless to say, you run out of memory rather quickly when this happens.
I agree it's not really a bug, rather an unfortunate worst-case scenario for the highlighter, but it has very dire consequences once you hit it.
Let me know if you need anything more from us.

s1monw (Contributor) commented Jun 5, 2013

Yeah, I know it's tricky. I wonder if you can just not use a phrase query for highlighting here; it is pretty intense. If you use "generate_number_parts": "false" and "catenate_words": "false" you will not have the problem. You might want to apply this only at search time, as a search analyzer or so (sketched below)? I think we will go ahead and add a cutoff to make sure it doesn't blow up, and instead return a not-so-correct highlighting result ;). What do you think?
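
A sketch of that search-analyzer idea, reusing the reproduction from above; the names wordDelimiterSearch and custom_search_analyzer are made up for this example. Documents are still indexed with custom_analyzer, but phrase queries built at search time no longer produce stacked positions:

curl -XPOST http://localhost:9200/test_hl -d '{ "index": { "number_of_shards": "1", "number_of_replicas": "0", "analysis": { "filter": { "wordDelimiter": { "type": "word_delimiter", "split_on_numerics": "false", "generate_word_parts": "true", "generate_number_parts": "true", "catenate_words": "true", "catenate_numbers": "true", "catenate_all": "false" }, "wordDelimiterSearch": { "type": "word_delimiter", "split_on_numerics": "false", "generate_word_parts": "true", "generate_number_parts": "false", "catenate_words": "false", "catenate_numbers": "false", "catenate_all": "false" } }, "analyzer": { "custom_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "wordDelimiter" ] }, "custom_search_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "wordDelimiterSearch" ] } } } } }'

curl -XPUT http://localhost:9200/test_hl/profile/_mapping -d '{ "profile": { "dynamic": "strict", "properties": { "id": { "type": "integer", "index": "not_analyzed", "store": "yes" }, "content": { "type": "string", "index_analyzer": "custom_analyzer", "search_analyzer": "custom_search_analyzer", "store": "yes", "term_vector": "with_positions_offsets" } } } }'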

s1monw added a commit to s1monw/elasticsearch that referenced this issue Jun 6, 2013
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  elastic#3142 elastic#3128
s1monw (Contributor) commented Jun 6, 2013

I've got a workaround for this in this patch. I think it's really the only reasonable thing to do here. The highlights will be different when you hit the crazy query, but it should return quickly. Let me know what you think.

lmenezes (Contributor, Author) commented Jun 6, 2013

Fine by me. I wonder if it even makes sense to highlight this kind of query; I was more concerned with cluster stability than with "correct highlighting" behavior.
Will this make it into 0.90.2?

lmenezes closed this as completed Jun 6, 2013
synhershko (Contributor) commented:

@lmenezes It does: think about phrase queries where each word passes through a SynonymFilter; you still want to be able to highlight such queries. We were seeing similar issues. Thanks @s1monw for the fix, I'll test it early next week.
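
For illustration, a synonym filter is the classic legitimate source of multiple tokens at one position (a sketch; the index name test_syn, filter name mySynonyms, and analyzer name synonym_analyzer are made up here, following the settings format used in the reproduction above):

curl -XPOST http://localhost:9200/test_syn -d '{ "index": { "analysis": { "filter": { "mySynonyms": { "type": "synonym", "synonyms": [ "quick, fast" ] } }, "analyzer": { "synonym_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "mySynonyms" ] } } } } }'

curl -XGET 'http://localhost:9200/test_syn/_analyze?analyzer=synonym_analyzer' -d 'quick brown fox'

Here "quick" and "fast" share a position, so a phrase query against such a field becomes a MultiPhraseQuery that you still want highlighted correctly.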

lmenezes (Contributor, Author) commented Jun 6, 2013

@synhershko I just meant for the case where the query is actually the exact content of an indexed document field.

s1monw (Contributor) commented Jun 6, 2013

I think it makes sense to highlight MultiPhraseQueries; I am just concerned about crazy ones that are machine-generated. I am also all for preventing cluster stability problems and returning somewhat worse highlighting instead. I will push this soon.

synhershko (Contributor) commented:

Speaking of cluster stability problems, and sorry for hijacking the thread: the most pressing problem today is the lack of a timeout concept in Lucene. Once ES hands control over to any Lucene component, you are at risk of that thread running forever. This makes ES timeouts work on a "best effort basis", to quote Shay, and in practice timeouts are almost never enforced.

This is basically what happened with this FVH issue, but it can also happen with complex searches, and we are seeing nodes become unresponsive after a while in our cluster, presumably because of such issues.

If only Lucene could respect timeouts, that would be great.

/cc @bleskes

s1monw added a commit that referenced this issue Jun 6, 2013
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  #3142 #3128
s1monw (Contributor) commented Jun 6, 2013

> If only Lucene could respect timeouts, that would be great

When you think about it, how would you implement that? It's a really tough problem, and during searches Lucene does respect timeouts: each search with a timeout checks whether it has timed out after each collected document, so this is pretty accurate. I'm not sure what you are referring to here.
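
For reference, that per-document check is what the search-level timeout in the request body relies on. A sketch (the 100ms value is arbitrary) that returns "timed_out": true with partial hits when collection exceeds the budget, though it cannot interrupt work that happens outside document collection, such as the FieldQuery construction discussed in this issue:

curl -XGET http://localhost:9200/test_hl/profile/_search -d '{ "timeout": "100ms", "query": { "match": { "content": "test" } } }'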

@lmenezes I pushed the fix; it will be in 0.90.2.

synhershko (Contributor) commented:

Let me get back to you with more concrete details

clintongormley (Contributor) commented:

@synhershko Let's rather move the timeout conversation to issue #3129.

synhershko pushed a commit to synhershko/elasticsearch that referenced this issue Sep 2, 2013
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  elastic#3142 elastic#3128

Conflicts:
	src/test/java/org/elasticsearch/test/integration/search/highlight/HighlighterSearchTests.java
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  elastic#3142 elastic#3128