Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable filter editor suggestions #13376

Merged
merged 13 commits into from
Aug 25, 2017

Conversation

lukasolson
Copy link
Member

@lukasolson lukasolson commented Aug 7, 2017

Replaces #13286.
Fixes #12692.

This PR re-enables filter editor suggestions by default, but also adds additional parameters to the request to safeguard against long-running queries caused by the filter editor suggestions requests.

@lukasolson lukasolson self-assigned this Aug 7, 2017
@lukasolson
Copy link
Member Author

/cc @alexfrancoeur

@alexfrancoeur
Copy link

@lukasolson I see the advanced option for filterEditor:suggestValues is now set to true but I can't seem to find the missing additional parameters to modify the safeguard. Was that part of the PR or are we just providing a default and not allowing it to be configurable?

I have about a little over 3 million events using makelogs. If I do a unique count of _id it matches the total number of documents. So this filter is working through ~3m strings.

Initial request payload: {"field":"_id"}
Response: ["-0-bx10BmxeOLc-l1n-3","-0-bx10BmxeOLc-l1n63","-0-bx10BmxeOLc-l1oC3","-0-bx10BmxeOLc-l1oG3","-0-bx10BmxeOLc-l1oK3","-0-bx10BmxeOLc-l1oO3","-0-bx10BmxeOLc-l1oS3","-0-bx10BmxeOLc-l1oW3","-0-bx10BmxeOLc-l1oa3","-0-bx10BmxeOLc-l1oe3"]
Network time (mostly TTFB): 613.35ms

screen shot 2017-08-09 at 11 44 04 am

I know you discussed using terminate_after but am unsure of what it's set to. Looks like 10 here, am I right? We may need to increase a bit.

Probably unrelated, but I found a few bugs. Let me know if you'd prefer a new issue (my guess is yes, but I figured I'd try 😄 )

Special characters don't seem to be working correctly
screen shot 2017-08-09 at 10 52 29 am

No values in filter editor
screen shot 2017-08-09 at 10 53 08 am

Incorrect(?) filters in query DSL
screen shot 2017-08-09 at 10 53 24 am

Also do we need to surface .raw separately? It'd be nice to automatically provide suggestions regardless. If users don't want to use them, they can type in their own input.
screen shot 2017-08-09 at 10 52 44 am

Are we rendering the HTML? Look at headings.raw and how it's presented in the structured filter
screen shot 2017-08-09 at 11 36 11 am

@alexfrancoeur
Copy link

I tested this on cloud with 5.5.0 using a different unique ID but similar number of documents and similar string size. Two requests were made at around 16s each.

screen shot 2017-08-09 at 12 28 16 pm

Not the best apples to apples comparison but there are certainly noticeable performance improvements.

@lukasolson
Copy link
Member Author

I can't seem to find the missing additional parameters to modify the safeguard. Was that part of the PR or are we just providing a default and not allowing it to be configurable?

I've decided to stick with a simple enable/disable for now. If we get requests coming in to be able to tweak these options, we can look into it further, but I think this is simple yet effective.

I know you discussed using terminate_after but am unsure of what it's set to. Looks like 10 here, am I right? We may need to increase a bit.

Currently, it is being set to 100000. The number of results coming back in the request is 10 (which is the default). We can increase this if needed, but I think 10 should be fine for now, seeing as how you can type more and get more results.

Special characters don't seem to be working correctly.

Yeah, this is a bit tricky and I've noticed this before, but I can't find an issue for it. Feel free to open one.

Also do we need to surface .raw separately? It'd be nice to automatically provide suggestions regardless. If users don't want to use them, they can type in their own input.

We talked about this in the original PR, but decided to punt on it. Can you open an issue for this as well?

No values in filter editor
Incorrect(?) filters in query DSL

Can you elaborate more on how to reproduce those issues?

Thanks for the thorough feedback!

Copy link
Contributor

@Bargs Bargs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current code looks good.

A couple questions about new params:

I was wondering if you'd looked into Trevan's comment here #12692 (comment). It's interesting that the execution_hint hurt performance. We might want to get Adrien's thoughts on that before merging, if you haven't already pinged him about it.

I was also wondering how terminate_after might affect the results on high cardinality fields. So I reduced terminate_after to 1 with my makelogs dataset to simulate a similar ratio of terminate_after to hits that someone with millions of docs would have when using a default of 100,000. When I start typing in the prefix of an existing _id, I get no suggestions:

screen shot 2017-08-09 at 5 11 27 pm

That's a bit worrisome. If that's how it works, someone with a lot of docs is going to get empty suggestions pretty often. Adrien had mentioned that execution_hint: "map" would tell the regexp to evaluate lazily on the collected terms, so I thought maybe that was the problem. I removed that param, but still ran into the same problem. I think it'll be pretty confusing if a user gets no suggestions when he/she is typing in the start of a value that he/she can plainly see in the doc table.

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks as if the regex is executed after the samples from the buckets are fetched. If that was not the case, the performance would probably be pretty terrible. But that also has the effect of not giving expected results when the diversity of values is high. Setting the terminate_after to 1 completely throws of the results though, so I am not sure it is an appropriate simplification of the "simulation".

Have you investigated using timeout instead of terminate_after? It might make for a better user experience (and an easier-to-understand advanced setting) to just time out after something like 1s.

Additionally, I noticed the values are not being escaped before insertion into the regexp. This can confuse users and leads to hidden "bad request" errors when the user enters something that is not produced by the regex grammar.

I am wondering about the reasoning behind the usage of map as an execution_hint. Could someone explain that to me?

In any case displaying a loading indicator (e.g. a grayed out loading... in the dropdown) during the request and a hint that the list of suggestions has been truncated due to performance reasons could be helpful.

@lukasolson
Copy link
Member Author

@weltenwort @Bargs Should be ready for review again.

@lukasolson
Copy link
Member Author

@trevan Would you mind taking a look at this PR and seeing if it is an acceptable solution to #12692?

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a lot more responsive than before and the spinner is a nice touch!

One thing that doesn't quite work as expected is terms containing whitespace. As soon as I enter a character after a space (e.g. Mozilla/5.0 (), the suggestions come back empty. Escaping the space character as well seems to fix that for the query_string query.

Another thing is that suggestions for the _index and _type fields do not yield the expected suggestions. The former actually results in an error like [query_shard_exception] Can only use prefix queries on keyword and text fields - not on [_index] which is of type [_index] while the latter just does not return any results.

Since we are displaying suggestions only for aggregatable fields anyway, we could consider using the prefix term-level query, which would make the escaping unnecessary. It would still need special treatment for _id and potentially other meta fields though.

.replace(/\*/g, '\\*')
.replace(/\?/g, '\\?')
.replace(/\:/g, '\\:')
.replace(/\//g, '\\/');
Copy link
Member

@weltenwort weltenwort Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about shortening this a bit into one regexp, e.g.

query.replace(/[\\+\-=&|><!(){}[\]^"~*?:/ ]/g, (match) => `\\${match}`)

.replace(/\{/g, '\\{')
.replace(/\}/g, '\\}')
.replace(/\[/g, '\\[')
.replace(/\[/g, '\\]')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be /\]/g

@Bargs
Copy link
Contributor

Bargs commented Aug 23, 2017

we could consider using the prefix term-level query

I'd try out match_phrase_prefix as well https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html

I definitely think we should avoid the query string query unless it's absolutely necessary for some reason.

@Bargs
Copy link
Contributor

Bargs commented Aug 23, 2017

Just a thought for the future: if we knew when fields had both analyzed an non-analyzed versions, we could always search against the analyzed version and do terms against the non-analyzed. That's a neat advantage of using search instead of the "include" param on the terms agg since searching the analyzed version will return more matches.

suggestions: { terms }
suggestions: {
terms: { field }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jpountz, do you see any gotchas with how we're using timeout here? We played around with terminate_after as you suggested. It helped a lot, but picking a value that would work for all users felt like a shot in the dark. timeout is nice because it allows us to control the user experience more explicitly (we can say "suggestions should take no longer than a second"). But I wanted to check with you, is there any reason why we shouldn't rely on timeout in this way?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea! For the record, we made timeouts better recently: elastic/elasticsearch#25776.

@lukasolson
Copy link
Member Author

@weltenwort @Bargs @jpountz I've made the changes we talked about this morning. Please take another look!

Copy link

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just left minor suggestions.

terms: {
field,
include: `${getEscapedQuery(query)}.*`,
execution_hint: 'map'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's not obvious, I think it's worth adding a comment that the map execution hint helps ensure that the regexp is not evaluated eagerly against the terms dictionary. And then the terminate_after helps keep the number of buckets that need to be tracked at the shard level contained in case this is a high-cardinality field.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also since we do not care about the accuracy of the counts, we could set the shard_sizeto 10 explicitly in order to reduce the amount of information that needs to be transmitted from shards to the coordinating node

Copy link
Contributor

@Bargs Bargs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Pending @weltenwort's lgtm, let's try to get this merged tomorrow so it goes into beta2.

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, except that entering a < character without a matching > results in a bad request ([parsing_exception] [terms] failed to parse field [include]). I guess we should escape the characters used for the optional features as well (# @ & < > ~).

@weltenwort
Copy link
Member

Feels like we should cover the escaping by an api test in case ES adds additional characters that require escaping in the future.

@trevan
Copy link
Contributor

trevan commented Aug 24, 2017

@lukasolson, you changed the query since you asked me to test. Do you want me to test with the current query or wait some more?

@lukasolson
Copy link
Member Author

@trevan Should be good to take a look now.

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending passing tests

@lukasolson lukasolson merged commit 19ac99a into elastic:master Aug 25, 2017
lukasolson added a commit that referenced this pull request Aug 25, 2017
* Re-enable filter editor suggestions

* Use search instead of include

* Escape query

* Show spinner

* Use include rather than search

* Add additional regex and explanation for parameters

* Add suggestions API test

* Make sure test actually runs

* Use send instead of query

* Fix suggestions API test
lukasolson added a commit that referenced this pull request Aug 25, 2017
* Re-enable filter editor suggestions

* Use search instead of include

* Escape query

* Show spinner

* Use include rather than search

* Add additional regex and explanation for parameters

* Add suggestions API test

* Make sure test actually runs

* Use send instead of query

* Fix suggestions API test
@lukasolson
Copy link
Member Author

6.x (6.1.0): 1625f64
6.0 (6.0.0): 043477e

@lukasolson lukasolson deleted the filterSuggestions branch March 27, 2018 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants