
Search API: Exponential performance degradation and increased std. deviation when specifying "size" param greater than 999999 #5466

Closed
bobbyhubbard opened this issue Mar 19, 2014 · 7 comments

@bobbyhubbard

(Now, first of all, this scenario is ridiculous and quite unscientific in its method. Furthermore, this is not blocking us in any way, shape, or form, but we found it interesting so I thought I would report it anyway.)

One of our clients has a use case where they want all search hits to be returned without pagination. Typically, in this case, result sets max out at around 300 documents. Since they want all results, and there is no ALL option for size, the developer chose to use 999999999 (9 9's) as the size. While silly, this is still well within the limits of a Java Integer and was just meant to signify something like MAX_INT.

The result was a query that took on average between 2000 and 5000ms for 229 total hits. They reported this issue to us and we investigated. Now the interesting part: reducing the size parameter by a factor of 10 (removing one 9) showed a similar factor-of-10 reduction in response time. So 99999999 (8 9's) loads on average in 200 to 500ms. Reduce by another factor of 10 (7 9's) and it drops by almost another factor of 10: at 7 9's, load times are between 40 and 150ms. This is still a much higher std deviation than is typical for repeated runs of the exact same search... likely cached.

At 6 9's, the results are more in line with expectations and have a much smaller std deviation, at 20-30ms on average.

This test was done on an isolated 2-node cluster with no other activity on the system. The same query was executed for all tests, with the only difference being the size param.
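
A minimal reproduction along these lines (the host, index name, and match_all query body below are illustrative assumptions, not details from the report) would time the identical query at each size:

```python
# Times the same query with progressively larger "size" values against one node.
# HOST, INDEX, and the query body are placeholders for illustration only.
import time
import requests

HOST = "http://localhost:9200"
INDEX = "myindex"

def time_query(size, runs=5):
    """Run the identical query `runs` times and return per-run latencies in ms."""
    latencies = []
    for _ in range(runs):
        start = time.time()
        resp = requests.post(
            f"{HOST}/{INDEX}/_search",
            json={"query": {"match_all": {}}, "size": size},
        )
        resp.raise_for_status()
        latencies.append((time.time() - start) * 1000)
    return latencies

for size in (999999, 9999999, 99999999, 999999999):  # 6 to 9 nines
    runs = time_query(size)
    print(f"size={size}: avg={sum(runs) / len(runs):.0f}ms")
```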

@dadoonet (Member)

To extract all the data from Elasticsearch you should use the scan & scroll API.

Using size is not the way to go.
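
A rough sketch of that flow against the 1.x-era REST API (the host, index name, batch size, and match_all query are illustrative assumptions, not part of this thread):

```python
# Open a scan-type search context, then pull results in batches via scroll.
# HOST, INDEX, the 100-per-shard batch size, and the query are placeholders.
import requests

HOST = "http://localhost:9200"
INDEX = "myindex"

# Step 1: start the scan; this returns a scroll id but no hits yet.
resp = requests.post(
    f"{HOST}/{INDEX}/_search",
    params={"search_type": "scan", "scroll": "1m", "size": 100},
    json={"query": {"match_all": {}}},
).json()
scroll_id = resp["_scroll_id"]

# Step 2: keep fetching batches until an empty page comes back.
all_hits = []
while True:
    page = requests.post(
        f"{HOST}/_search/scroll",
        params={"scroll": "1m"},
        data=scroll_id,  # the 1.x API accepts the scroll id as the raw request body
    ).json()
    hits = page["hits"]["hits"]
    if not hits:
        break
    all_hits.extend(hits)
    scroll_id = page["_scroll_id"]
```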

Closing. Feel free to reopen if I misunderstood your use case.

@bobbyhubbard (Author)

It's a performance defect that someone could exploit.

My point is that some unsuspecting user typing a simple search query with a size > 999999 could have a significant performance impact on a cluster, as response time seems to increase exponentially until MAX_INT, at which point you get an index-out-of-bounds error from the JSON parser. Of course specifying a size that large is not optimal... but I'm not always the one writing the query.

I'm unable to reopen, but if I could, I would. :)

@nik9000 (Member) commented Mar 19, 2014

It might be worth having a cluster-wide max size that could be configured to reject requests like this. It opens up a can of worms, too; for example, you should probably exempt scan-type queries.
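
A limit along these lines did eventually ship as the index-level index.max_result_window setting (default 10,000), which rejects requests whose from + size exceeds it. A sketch of adjusting it over the REST API, with the host and index name as placeholder assumptions:

```python
# Raising (or lowering) the later index.max_result_window limit on one index;
# the host and index name below are placeholders for illustration.
import requests

requests.put(
    "http://localhost:9200/myindex/_settings",
    json={"index": {"max_result_window": 20000}},
)
```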

@dadoonet (Member)

I agree that we should perhaps come up with a reasonable default (500?) which can be configured. What do others think?

Reopening.

@dadoonet dadoonet reopened this Mar 19, 2014
@uboness (Contributor) commented Mar 20, 2014

It's not just about the JSON... There are many factors at play here (the priority queues that are responsible for the sorting and the size of the docs, to name a couple). I agree that this should ideally be handled gracefully, probably by introducing another circuit breaker - and the response should be handled like all the other circuit breakers we have. It's a bit tricky, though, to figure out the proper thresholds for this circuit breaker... It will require some thought.
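
As a loose analogy (the real code lives in Lucene's priority queue and differs in detail), the per-shard top-hits queue at the time was pre-sized to roughly from + size, so the allocation scales with the requested size rather than with the number of matching documents:

```python
# Loose Python analogy only (not Elasticsearch/Lucene internals): a top-hits
# priority queue whose backing array is allocated up front at the requested
# capacity, before a single document has been collected.
import sys

def presized_hit_queue(from_, size):
    capacity = from_ + size
    sentinel = (float("-inf"), -1)      # placeholder (score, doc_id) entries
    return [sentinel] * capacity        # cost scales with size, not with hits

queue = presized_hit_queue(0, 999_999)          # 6 nines: a few MB per shard
# presized_hit_queue(0, 999_999_999) would need ~1e9 slots per shard, which is
# the kind of sudden heap growth and GC pressure reported later in this thread.
print(sys.getsizeof(queue))
```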

Btw, I'm not sure that the error belongs in the 500 range, as this should not be perceived as a system error...

@skade (Contributor) commented Sep 23, 2014

I had a run-in with this bug today, where the maximum size was set to an "arbitrarily high value" (9999999) because the number of potential responses was small (<100) and it was meant to retrieve "all". That query degraded to the point where it brought down the whole cluster, node by node. It's a naive approach I see from time to time. In the end, the query retrieved less than 100 KB of payload.
The query was sent to the type-specific endpoint /<index>/<type>.

The response time grew with the number of documents in the whole index: it did not show up at all on small datasets (dev), only mildly on slightly larger ones (stage), and disastrously (the largest observed time was 18 minutes) on the live system. In prod it led to sudden allocations of huge chunks of heap that were immediately collected by stop-the-world collections lasting multiple seconds, causing nodes to drop out of the cluster and the search queue to explode. It seems like something causes such a query to visit more documents than necessary. I'll try to build a test case.

While this is misuse, I would expect Elasticsearch to handle such cases more gracefully. Also, the behaviour of "size" should be better documented.

@clintongormley (Contributor)

Closing in favour of #4026
