[Unified Search] Value autocomplete without field name specified #193608

Open
flash1293 opened this issue Sep 20, 2024 · 8 comments
Labels
enhancement New value added to drive a business result Feature:Unified search Unified search related tasks Project:OneDiscover Enrich Discover with contextual awareness Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. Team:obs-ux-logs Observability Logs User Experience Team Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@flash1293
Contributor

The value-autocomplete functionality in the unified search bar for KQL is super helpful to search for more complex values like host names and similar:
Image

However, a big downside is that it requires the user to know the field name to get autocomplete - if they only know the prefix of the value they are searching for, it is difficult to search for it effectively. In this case, the flow currently looks like this:

  • Do a fieldless search for the prefix with *
  • Look through the results to see which fields match and find the one they meant to search for
  • Refine the search accordingly

This manual process can be automated in the following way:
In case the user types in a string in the KQL bar without specifying a field name, search for

"query_string": {
  "query": "<typed string>*"
}

and return the first 100 documents. Within Kibana, search through all fields which have the typed string as a prefix. Use the matched fields as suggestions, suggesting both field name and value to search for. Add them to the regular suggestions in the dropdown, below the regular field suggestions:
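The Kibana-side matching step described above could look roughly like the following. This is a minimal sketch, not existing Kibana code: the `FieldValueSuggestion` type and the `suggestFromDocs` function are hypothetical names, the documents are assumed to be the `_source` objects of the returned hits, and matching is assumed to be a case-insensitive prefix check on string values only.

```typescript
// Hypothetical suggestion shape; not an existing Kibana type.
interface FieldValueSuggestion {
  field: string;
  value: string;
}

type Doc = Record<string, unknown>;

// Collect [dotted field path, leaf value] pairs from a possibly nested document.
function collectLeaves(node: unknown, path: string, out: Array<[string, unknown]>): void {
  if (Array.isArray(node)) {
    for (const item of node) collectLeaves(item, path, out);
  } else if (node !== null && typeof node === 'object') {
    for (const key of Object.keys(node as Doc)) {
      collectLeaves((node as Doc)[key], path ? `${path}.${key}` : key, out);
    }
  } else {
    out.push([path, node]);
  }
}

// Return de-duplicated field/value pairs whose string value starts with the typed prefix.
function suggestFromDocs(docs: Doc[], typed: string, limit = 10): FieldValueSuggestion[] {
  const prefix = typed.toLowerCase();
  const seen = new Set<string>();
  const suggestions: FieldValueSuggestion[] = [];
  for (const doc of docs) {
    const leaves: Array<[string, unknown]> = [];
    collectLeaves(doc, '', leaves);
    for (const [field, value] of leaves) {
      if (typeof value !== 'string') continue;
      if (!value.toLowerCase().startsWith(prefix)) continue;
      const key = `${field}\u0000${value}`;
      if (seen.has(key)) continue;
      seen.add(key);
      suggestions.push({ field, value });
      if (suggestions.length >= limit) return suggestions;
    }
  }
  return suggestions;
}
```

Each returned pair could then be rendered as a `field: "value"` suggestion below the regular field suggestions. Note that a prefix check on the whole value will miss documents that matched on a token in the middle of an analyzed text field.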

CleanShot.2024-09-17.at.15.49.05.mp4

Considerations

Suggestions need to return quickly - while searching all fields across a small number of documents is more expensive than looking up regular autocomplete values for a single specified field, response times were still acceptable in tests with medium-sized clusters. Two measures can help:

  • Show regular field suggestions immediately while the request is running in the background, then extend the suggestions once the data comes in - this makes it a net-add that's not making any existing functionality harder to use
  • Make sure the query returns quickly even with partial results - in tests with realistic datasets, even a small number of matching documents already brought a lot of value
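The two measures above could be combined along these lines. This is only a sketch under assumptions: `suggestProgressively`, the `Suggestion` type, the `render` callback and the 500 ms default are all hypothetical and do not correspond to existing unified search code.

```typescript
// Hypothetical suggestion shape for illustration only.
type Suggestion = { text: string; kind: 'field' | 'value' };

// Render field suggestions right away, then extend the list if value
// suggestions arrive before the time budget runs out.
async function suggestProgressively(
  fieldSuggestions: Suggestion[],
  fetchValueSuggestions: () => Promise<Suggestion[]>,
  render: (suggestions: Suggestion[]) => void,
  timeoutMs = 500
): Promise<void> {
  // Existing behavior stays untouched: field suggestions show immediately.
  render(fieldSuggestions);

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Suggestion[]>((resolve) => {
    timer = setTimeout(() => resolve([]), timeoutMs);
  });

  try {
    // Whichever settles first wins; a timeout counts as "no extra suggestions".
    const values = await Promise.race([fetchValueSuggestions(), timeout]);
    if (values.length > 0) render([...fieldSuggestions, ...values]);
  } catch {
    // A failed background request must not break the regular suggestions.
  } finally {
    clearTimeout(timer);
  }
}
```

Because the first `render` call happens before any network round trip, the extra request can only add suggestions and never delays the ones shown today.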

As these requests would be sent quite often while the user is typing, testing is needed to gauge the additional load this would put on the cluster. Using strategies like throttling and debouncing, it should be possible to fine-tune the feature to balance churn and user experience.
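As an illustration of the debouncing idea, a small helper like the one below would send a request only once the user pauses typing. The `debounce` helper and the wait time used in the usage note are assumptions for this sketch, not tuned values or existing Kibana utilities.

```typescript
// Delay calls to `fn` until `waitMs` have passed without a newer call.
// Only the arguments of the most recent call are used.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}
```

A caller could wrap the background request, for example `const requestValueSuggestions = debounce(sendRequest, 300)` with a hypothetical `sendRequest`, so that typing `k`, `ki`, `kib` in quick succession results in a single request for `kib`.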

@flash1293 flash1293 added the Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. label Sep 20, 2024
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@flash1293 flash1293 added the enhancement New value added to drive a business result label Sep 20, 2024
@LucaWintergerst
Contributor

Ideally we'd also get back a somewhat random set of documents, not necessarily the first 100, as they might have very similar values.
We should look into random scoring, but only if we can apply it to just a subset of the documents.

I.e. look at 1000 docs, randomly sort them, then return 100 of them.
Otherwise we might get a lot of very similar documents that don't have as many unique values.
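One possible way to express "look at 1000 docs, randomly sort them, return 100" is a `rescore` phase that replaces the score of the top window of hits with a `random_score`. The sketch below only builds the request body; `buildRandomSampleRequest` is a hypothetical helper, the approach has not been tested against a cluster here, and `window_size` applies per shard, so the number of documents inspected depends on the shard count.

```typescript
// Build a search body that matches the typed prefix, then randomly reorders
// the top `windowSize` hits per shard and returns `size` of them.
// Assumption: the typed string is used as-is, without escaping
// query_string reserved characters.
function buildRandomSampleRequest(typed: string, windowSize = 1000, size = 100) {
  return {
    size,
    query: {
      query_string: { query: `${typed}*` },
    },
    rescore: {
      window_size: windowSize,
      query: {
        // Score every document in the window with a random value.
        rescore_query: {
          function_score: { random_score: {}, boost_mode: 'replace' },
        },
        // Ignore the original score so the final order is purely random.
        query_weight: 0,
        rescore_query_weight: 1,
      },
    },
  };
}
```

Restricting the random scoring to the rescore window keeps it from touching every matching document, which is the constraint raised above.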

@kertal kertal added the Project:OneDiscover Enrich Discover with contextual awareness label Sep 23, 2024
@davismcphee
Contributor

@lukasolson Since you likely have the most autocomplete experience on our team, just wondering if you have any thoughts on this one? There's more context in the latest One Discover sync recording around 25:00.

@lukasolson
Member

Right now, the default method for value suggestions is the _terms_enum API (you can toggle between this and a terms aggregation in the _search API in the advanced settings). This API was developed in tandem with the Elasticsearch team to offer us the performance we wanted in addition to being able to filter by things like time range, data tier, etc.
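For comparison with the fieldless proposal, a `_terms_enum` request for a known field looks roughly like the body built below. This is a sketch, not the actual Kibana implementation: `buildTermsEnumRequest` is a hypothetical helper, and the size, timeout and `@timestamp` field are illustrative assumptions.

```typescript
interface TermsEnumRequest {
  path: string;
  body: {
    field: string;
    string: string;
    size: number;
    timeout: string;
    case_insensitive: boolean;
    index_filter?: Record<string, unknown>;
  };
}

// Build a _terms_enum request for terms of `field` starting with `typed`.
// The API requires a field name, which is exactly the limitation this issue is about.
function buildTermsEnumRequest(
  index: string,
  field: string,
  typed: string,
  timeRange?: { from: string; to: string }
): TermsEnumRequest {
  const request: TermsEnumRequest = {
    path: `/${index}/_terms_enum`,
    body: {
      field,
      string: typed,
      size: 10, // illustrative value
      timeout: '1s', // illustrative value
      case_insensitive: true,
    },
  };
  if (timeRange) {
    // index_filter lets Elasticsearch skip shards that cannot match the time range.
    request.body.index_filter = {
      range: { '@timestamp': { gte: timeRange.from, lte: timeRange.to } },
    };
  }
  return request;
}
```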

Before switching to this API we had many reports of very large clusters being taken down completely solely because of autocomplete - granted we were doing aggregations that required visiting each shard. This might not be the case if we're just doing a simple query and returning the first n documents, but we need to be sure to do extensive testing with large clusters (see #46054, #58749, #12692).

Also, what if the first n documents don't contain the specific value you're looking for? This is another common report for value suggestions when we hit the terminate_after or timeout and the value isn't included in the list we show. It seems like this would be much worse when we don't even provide the field.

In general I think there are a lot of tweaks to get right in this sort of implementation. Given that we are focusing on ES|QL I would be very hesitant to prioritize this work.

@kertal kertal added Feature:Unified search Unified search related tasks Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Oct 8, 2024
@elasticmachine
Contributor

Pinging @elastic/kibana-visualizations (Team:Visualizations)

@flash1293
Contributor Author

Thanks for the context @lukasolson - I agree that it won't be trivial to get this experience right.

> but we need to be sure to do extensive testing with large clusters

Absolutely, as this will be called all the time in the background, we need to make sure it's fast and not too taxing on the cluster. As you mention, as we only search and don't do aggregations it seems to be possible to shape it in a way that it brings value and still fits the constraints.

> This is another common report for value suggestions when we hit the terminate_after or timeout and the value isn't included in the list we show

I think this was mostly about cases where we show the value suggestions as a list to pick a value from - I agree that the quality of the data won't be good enough for that, but as a suggestion in autocomplete where we don't show any suggestion right now I think it would still bring a lot of value.

> Given that we are focusing on ES|QL I would be very hesitant to prioritize this work.

The same kind of feature could also be valuable for ES|QL - it would just be called in a different context.

@lukasolson
Member

A couple of follow-up questions - is this something only desired for observability, or are you suggesting it be generic & included in the search bar by default?

Would we show some sort of message that the suggestions are not all-inclusive to avoid confusion for when the desired value isn't in the matching 100 documents?
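If such a message were shown, it could be driven by flags that the search response already carries. The sketch below assumes a response shaped like a standard `_search` result (`timed_out`, `terminated_early`, `hits.total`); the `partialResultsHint` function and the message wording are hypothetical.

```typescript
// Minimal subset of a _search response used for this illustration.
interface SearchResponseLike {
  timed_out: boolean;
  terminated_early?: boolean;
  hits: {
    total?: { value: number; relation: 'eq' | 'gte' };
    hits: unknown[];
  };
}

// Return a hint when the suggestions are based on an incomplete view of the data,
// or undefined when every matching document was inspected.
function partialResultsHint(response: SearchResponseLike): string | undefined {
  const inspected = response.hits.hits.length;
  const total = response.hits.total;
  const moreMatches =
    total !== undefined && (total.relation === 'gte' || total.value > inspected);

  if (response.timed_out || response.terminated_early === true || moreMatches) {
    return `Suggestions are based on ${inspected} sampled documents and may be incomplete.`;
  }
  return undefined;
}
```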

@kertal kertal added the Team:obs-ux-logs Observability Logs User Experience Team label Nov 5, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

6 participants