-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-enable filter editor suggestions #13376
Conversation
/cc @alexfrancoeur |
@lukasolson I see the advanced option for I have about a little over 3 million events using makelogs. If I do a unique count of Initial request payload: I know you discussed using Probably unrelated, but I found a few bugs. Let me know if you'd prefer a new issue (my guess is yes, but I figured I'd try 😄 ) Special characters don't seem to be working correctly Incorrect(?) filters in query DSL Also do we need to surface Are we rendering the HTML? Look at headings.raw and how it's presented in the structured filter |
I've decided to stick with a simple enable/disable for now. If we get requests coming in to be able to tweak these options, we can look into it further, but I think this is simple yet effective.
Currently, it is being set to 100000. The number of results coming back in the request is 10 (which is the default). We can increase this if needed, but I think 10 should be fine for now, seeing as how you can type more and get more results.
Yeah, this is a bit tricky and I've noticed this before, but I can't find an issue for it. Feel free to open one.
We talked about this in the original PR, but decided to punt on it. Can you open an issue for this as well?
Can you elaborate more on how to reproduce those issues? Thanks for the thorough feedback! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current code looks good.
A couple questions about new params:
I was wondering if you'd looked into Trevan's comment here #12692 (comment). It's interesting that the execution_hint hurt performance. We might want to get Adrien's thoughts on that before merging, if you haven't already pinged him about it.
I was also wondering how terminate_after
might affect the results on high cardinality fields. So I reduced terminate_after
to 1 with my makelogs dataset to simulate a similar ratio of terminate_after to hits that someone with millions of docs would have when using a default of 100,000. When I start typing in the prefix of an existing _id
, I get no suggestions:
That's a bit worrisome. If that's how it works, someone with a lot of docs is going to get empty suggestions pretty often. Adrien had mentioned that execution_hint: "map"
would tell the regexp to evaluate lazily on the collected terms, so I thought maybe that was the problem. I removed that param, but still ran into the same problem. I think it'll be pretty confusing if a user gets no suggestions when he/she is typing in the start of a value that he/she can plainly see in the doc table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks as if the regex is executed after the samples from the buckets are fetched. If that was not the case, the performance would probably be pretty terrible. But that also has the effect of not giving expected results when the diversity of values is high. Setting the terminate_after
to 1 completely throws of the results though, so I am not sure it is an appropriate simplification of the "simulation".
Have you investigated using timeout
instead of terminate_after
? It might make for a better user experience (and an easier-to-understand advanced setting) to just time out after something like 1s.
Additionally, I noticed the values are not being escaped before insertion into the regexp. This can confuse users and leads to hidden "bad request" errors when the user enters something that is not produced by the regex grammar.
I am wondering about the reasoning behind the usage of map
as an execution_hint
. Could someone explain that to me?
In any case displaying a loading indicator (e.g. a grayed out loading...
in the dropdown) during the request and a hint that the list of suggestions has been truncated due to performance reasons could be helpful.
@weltenwort @Bargs Should be ready for review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a lot more responsive than before and the spinner is a nice touch!
One thing that doesn't quite work as expected is terms containing whitespace. As soon as I enter a character after a space (e.g. Mozilla/5.0 (
), the suggestions come back empty. Escaping the space character as well seems to fix that for the query_string
query.
Another thing is that suggestions for the _index
and _type
fields do not yield the expected suggestions. The former actually results in an error like [query_shard_exception] Can only use prefix queries on keyword and text fields - not on [_index] which is of type [_index]
while the latter just does not return any results.
Since we are displaying suggestions only for aggregatable fields anyway, we could consider using the prefix
term-level query, which would make the escaping unnecessary. It would still need special treatment for _id
and potentially other meta fields though.
.replace(/\*/g, '\\*') | ||
.replace(/\?/g, '\\?') | ||
.replace(/\:/g, '\\:') | ||
.replace(/\//g, '\\/'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about shortening this a bit into one regexp, e.g.
query.replace(/[\\+\-=&|><!(){}[\]^"~*?:/ ]/g, (match) => `\\${match}`)
.replace(/\{/g, '\\{') | ||
.replace(/\}/g, '\\}') | ||
.replace(/\[/g, '\\[') | ||
.replace(/\[/g, '\\]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be /\]/g
I'd try out match_phrase_prefix as well https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html I definitely think we should avoid the query string query unless it's absolutely necessary for some reason. |
Just a thought for the future: if we knew when fields had both analyzed an non-analyzed versions, we could always search against the analyzed version and do terms against the non-analyzed. That's a neat advantage of using search instead of the "include" param on the terms agg since searching the analyzed version will return more matches. |
suggestions: { terms } | ||
suggestions: { | ||
terms: { field } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jpountz, do you see any gotchas with how we're using timeout
here? We played around with terminate_after
as you suggested. It helped a lot, but picking a value that would work for all users felt like a shot in the dark. timeout
is nice because it allows us to control the user experience more explicitly (we can say "suggestions should take no longer than a second"). But I wanted to check with you, is there any reason why we shouldn't rely on timeout
in this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good idea! For the record, we made timeouts better recently: elastic/elasticsearch#25776.
@weltenwort @Bargs @jpountz I've made the changes we talked about this morning. Please take another look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I just left minor suggestions.
terms: { | ||
field, | ||
include: `${getEscapedQuery(query)}.*`, | ||
execution_hint: 'map' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's not obvious, I think it's worth adding a comment that the map
execution hint helps ensure that the regexp is not evaluated eagerly against the terms dictionary. And then the terminate_after
helps keep the number of buckets that need to be tracked at the shard level contained in case this is a high-cardinality field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also since we do not care about the accuracy of the counts, we could set the shard_size
to 10 explicitly in order to reduce the amount of information that needs to be transmitted from shards to the coordinating node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Pending @weltenwort's lgtm, let's try to get this merged tomorrow so it goes into beta2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, except that entering a <
character without a matching >
results in a bad request ([parsing_exception] [terms] failed to parse field [include]
). I guess we should escape the characters used for the optional features as well (# @ & < > ~
).
Feels like we should cover the escaping by an api test in case ES adds additional characters that require escaping in the future. |
@lukasolson, you changed the query since you asked me to test. Do you want me to test with the current query or wait some more? |
@trevan Should be good to take a look now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending passing tests
* Re-enable filter editor suggestions * Use search instead of include * Escape query * Show spinner * Use include rather than search * Add additional regex and explanation for parameters * Add suggestions API test * Make sure test actually runs * Use send instead of query * Fix suggestions API test
* Re-enable filter editor suggestions * Use search instead of include * Escape query * Show spinner * Use include rather than search * Add additional regex and explanation for parameters * Add suggestions API test * Make sure test actually runs * Use send instead of query * Fix suggestions API test
Replaces #13286.
Fixes #12692.
This PR re-enables filter editor suggestions by default, but also adds additional parameters to the request to safeguard against long-running queries caused by the filter editor suggestions requests.