Unusual CPU/Memory Usage while Highlighting #3128

Closed
lmenezes opened this issue Jun 3, 2013 · 13 comments

lmenezes (Contributor) commented Jun 3, 2013

Trying to run the following on ES 0.90 results in an OOM.
It's quite an unusual query, I agree, but the behavior still seems wrong.

curl -XPOST http://localhost:9200/test_hl -d '{ "index": { "number_of_shards": "1", "number_of_replicas": "0", "analysis": { "filter": { "wordDelimiter": { "type": "word_delimiter", "split_on_numerics": "false", "generate_word_parts": "true", "generate_number_parts": "true", "catenate_words": "true", "catenate_numbers": "true", "catenate_all": "false" } }, "analyzer": { "custom_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "wordDelimiter" ] } } } } }'

curl -XPUT http://localhost:9200/test_hl/profile/_mapping -d '{ "profile": { "dynamic": "strict", "properties": { "id": { "type": "integer", "index": "not_analyzed", "store": "yes" }, "content": { "type": "string", "index_analyzer": "custom_analyzer", "search_analyzer": "custom_analyzer", "store": "yes", "term_vector": "with_positions_offsets" } } } }'

curl -XPUT http://localhost:9200/test_hl/profile/2 -d '{"content": "Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature", "id": 2}'

curl -XGET http://localhost:9200/test_hl/profile/_search -d '{ "from": 0, "size": 10, "query": { "match": { "content": { "query": "Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature", "type": "phrase" } } }, "highlight": { "fields": { "content": { "fragment_size": 50, "number_of_fragments": 5 } } } }'
kroepke commented Jun 4, 2013

I've been looking at this problem: essentially all the time is spent in org.apache.lucene.search.vectorhighlight.FieldQuery(), expanding the flat queries into PhraseQueries for later matching against the hits.

Put another way: It's extremely easy to break the fast vector highlighter by asking it to highlight the exact field content, provided that the query terms and stored field terms are the same. In this case the implementation degenerates to a cartesian product of the terms, leading to memory exhaustion.

Moral of the story: beware of automatically generated search queries built from field contents when highlighting is also enabled.
Maybe a fail-safe mode could be added somewhere, stopping the iteration during construction of the FieldQuery object to prevent it from bringing down the cluster in seconds.
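
One quick way to confirm that the cost is in the highlighting phase rather than in query execution: running the same request with the highlight section removed should come back quickly (a sketch, reusing the reproduction request from above verbatim):

curl -XGET http://localhost:9200/test_hl/profile/_search -d '{ "from": 0, "size": 10, "query": { "match": { "content": { "query": "Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature Test: http://www.facebook.com http://elasticsearch.org http://xing.com http://cnn.com http://quora.com http://twitter.com this is a test for highlighting feature", "type": "phrase" } } } }'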

ghost assigned s1monw Jun 5, 2013
s1monw (Contributor) commented Jun 5, 2013

This is in fact a bug in Lucene, or maybe just a really bad situation; I'm not sure it's a bug per se. The problem is that, due to the settings "generate_number_parts": "true" and "catenate_words": "true", the query becomes a MultiPhraseQuery (multiple tokens at the same position), and once it gets too long the highlighter ends up in a very expensive loop. Setting those to "false" makes it fast and seems to produce the right result too. I will look into this in more detail in Lucene land.
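
To see where the stacked tokens come from, the analyze API against the test_hl index created above shows what custom_analyzer produces for a single URL (a sketch; the exact tokens and positions depend on the word_delimiter flags):

curl -XGET 'http://localhost:9200/test_hl/_analyze?analyzer=custom_analyzer' -d 'http://www.facebook.com'

With catenate_words enabled, the subwords (http, www, facebook, com) are accompanied by a catenated form at an overlapping position, and every URL in the phrase contributes such a stack, which is what turns the query into a large MultiPhraseQuery.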

kroepke commented Jun 5, 2013

This definitely is a corner case, I agree. Unfortunately our system sometimes generates queries like this based on user actions, and they are tricky to filter out (although we will try to, of course).
This specific example generates 4095 queries internally when constructing the FieldQuery object, and so far I haven't found a way to make it bail out earlier. Needless to say, you run out of memory rather quickly when this happens.
I agree it's not really a bug, rather an unfortunate worst-case scenario for the highlighter, but it has very dire consequences once you hit it.
Let me know if you need anything more from us.

s1monw (Contributor) commented Jun 5, 2013

Yeah, I know it's tricky. I wonder if you can just not use a phrase query for highlighting here; it is pretty intense. If you use "generate_number_parts": "false" and "catenate_words": "false" you will not have the problem. You might want to apply this only at search time, as a search analyzer or so (sketched below)? I think we will go ahead and add a cutoff to make sure it doesn't blow up, and instead return a not-so-correct highlighting result ;). What do you think?
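
A sketch of that search-analyzer idea, reusing the reproduction from above; the names wordDelimiterSearch and custom_search_analyzer are made up for this example. Documents are still indexed with custom_analyzer, but phrase queries built at search time no longer produce stacked positions:

curl -XPOST http://localhost:9200/test_hl -d '{ "index": { "number_of_shards": "1", "number_of_replicas": "0", "analysis": { "filter": { "wordDelimiter": { "type": "word_delimiter", "split_on_numerics": "false", "generate_word_parts": "true", "generate_number_parts": "true", "catenate_words": "true", "catenate_numbers": "true", "catenate_all": "false" }, "wordDelimiterSearch": { "type": "word_delimiter", "split_on_numerics": "false", "generate_word_parts": "true", "generate_number_parts": "false", "catenate_words": "false", "catenate_numbers": "false", "catenate_all": "false" } }, "analyzer": { "custom_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "wordDelimiter" ] }, "custom_search_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "wordDelimiterSearch" ] } } } } }'

curl -XPUT http://localhost:9200/test_hl/profile/_mapping -d '{ "profile": { "dynamic": "strict", "properties": { "id": { "type": "integer", "index": "not_analyzed", "store": "yes" }, "content": { "type": "string", "index_analyzer": "custom_analyzer", "search_analyzer": "custom_search_analyzer", "store": "yes", "term_vector": "with_positions_offsets" } } } }'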

s1monw added a commit to s1monw/elasticsearch that referenced this issue Jun 6, 2013
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  elastic#3142 elastic#3128
s1monw (Contributor) commented Jun 6, 2013

I've got a workaround for this in this patch. I think it's really the only reasonable thing to do here. The highlights will be different when you hit the crazy query, but it should return quickly. Let me know what you think.

lmenezes (Contributor, Author) commented Jun 6, 2013

Fine by me. I wonder if it even makes sense to highlight this kind of query; I was more concerned with cluster stability than with "correct highlighting" behavior.
Will this make it into 0.90.2?

lmenezes closed this as completed Jun 6, 2013
synhershko (Contributor) commented:

@lmenezes It does: think about phrase queries where each word passes through a SynonymFilter; you still want to be able to highlight such queries. We were seeing similar issues. Thanks @s1monw for the fix, I'll test it early next week.
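
For illustration, a synonym filter is the classic legitimate source of multiple tokens at one position (a sketch; the index name test_syn, filter name mySynonyms, and analyzer name synonym_analyzer are made up here, following the settings format used in the reproduction above):

curl -XPOST http://localhost:9200/test_syn -d '{ "index": { "analysis": { "filter": { "mySynonyms": { "type": "synonym", "synonyms": [ "quick, fast" ] } }, "analyzer": { "synonym_analyzer": { "tokenizer": "whitespace", "filter": [ "lowercase", "mySynonyms" ] } } } } }'

curl -XGET 'http://localhost:9200/test_syn/_analyze?analyzer=synonym_analyzer' -d 'quick brown fox'

Here "quick" and "fast" share a position, so a phrase query against such a field becomes a MultiPhraseQuery that you still want highlighted correctly.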

lmenezes (Contributor, Author) commented Jun 6, 2013

@synhershko I just meant for the case where the query is actually the exact content of an indexed document field.

s1monw (Contributor) commented Jun 6, 2013

I think it makes sense to highlight MultiPhraseQueries; I am just concerned about crazy ones that are machine-generated. I am also all for preventing cluster stability problems and returning somewhat worse highlighting instead. I will push this soon.

synhershko (Contributor) commented:

Speaking of cluster stability problems, and sorry for hijacking the thread: the most pressing problem today is the lack of a timeout concept in Lucene. Once ES hands control over to any Lucene component, you are at risk of that thread running forever. This makes ES timeouts work on a "best effort basis", to quote Shay, and in practice timeouts are almost never enforced.

This is basically what happened with this FVH issue, but it can also happen with complex searches, and we are seeing nodes become unresponsive after a while in our cluster, presumably because of such issues.

If only Lucene could respect timeouts, that would be great.

/cc @bleskes

s1monw added a commit that referenced this issue Jun 6, 2013
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  #3142 #3128
s1monw (Contributor) commented Jun 6, 2013

> If only Lucene could respect timeouts, that would be great

When you think about it, how would you implement that? It's a really tough problem, and during searches Lucene does respect timeouts: each search with a timeout checks whether it has timed out after each collected document, so this is pretty accurate. I'm not sure what you are referring to here.
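
For reference, that per-document check is what the search-level timeout in the request body relies on. A sketch (the 100ms value is arbitrary) that returns "timed_out": true with partial hits when collection exceeds the budget, though it cannot interrupt work that happens outside document collection, such as the FieldQuery construction discussed in this issue:

curl -XGET http://localhost:9200/test_hl/profile/_search -d '{ "timeout": "100ms", "query": { "match": { "content": "test" } } }'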

@lmenezes I pushed the fix; it will be in 0.90.2.

synhershko (Contributor) commented:

Let me get back to you with more concrete details

clintongormley (Contributor) commented:

@synhershko Let's rather move the timeout conversation to issue #3129.

synhershko pushed a commit to synhershko/elasticsearch that referenced this issue Sep 2, 2013
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  elastic#3142 elastic#3128

Conflicts:
	src/test/java/org/elasticsearch/test/integration/search/highlight/HighlighterSearchTests.java
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Currently if MPQ is very large highlighing can take down a node
or cause high CPU / RAM consumption. If the query grows > 16 terms
we just extract the terms and do term by term highlighting.

Closes  elastic#3142 elastic#3128