Consider support for an optional 'fallback query'. #51840
Comments
Pinging @elastic/es-search (:Search/Search)
Some questions around this:
Presumably this rewrite is on the coordinating node once all results are returned rather than at a shard level as part of the initial search.
I assumed the rewrite would essentially be re-running the query as effectively
I tried to clarify the example + description. As @cbuescher mentions, the idea is that the hits from
I am wondering how we can ensure this, as scores from `fallback_query` could potentially be higher than scores from the original query. Or are we confident that scores from `fallback_query` are always lower? Potentially something relevant: we plan to implement a compound query that allows combining scores from different queries, and one of the combining strategies is
I'm not suggesting that the original scores be changed. This part of the feature description may prove controversial, or may not be worth the complexity. We could certainly consider a simpler alternative like 'only run the fallback if there are too few results, and just return hits from the fallback query'. (However, this alternative gives less of an advantage over just running multiple search requests.)
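The simpler alternative above is easy to sketch client-side. In this sketch, `search` stands in for any function that runs a query and returns a hit list (e.g. a thin wrapper around an Elasticsearch client); `search_with_fallback` and `min_hits` are illustrative names, not anything real:

```python
def search_with_fallback(search, primary_query, fallback_query, min_hits=1):
    """Run primary_query; if it returns fewer than min_hits hits,
    run fallback_query instead and return only its hits (the
    'simpler alternative' above: no merging of result sets)."""
    hits = search(primary_query)
    if len(hits) >= min_hits:
        return hits
    return search(fallback_query)
```

This mirrors what many clients already do with two requests; the feature request is essentially about moving that round trip server-side.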
I think rather than one fallback query, it would be nice to be able to have an array of fallback queries. If there were only one, people would still end up issuing their own fallback queries when the first two didn't return enough results. If `fallback_query` did take an array, another thought is that `query` could also support an array of queries, although maybe that would be too fundamental a change, or confusing.
I've just accidentally found this issue when searching for "elasticsearch fallback query", as I just implemented this in my application (well, still a prototype) and thought about putting my 2 cents here. My use case is similar to what @jtibshirani described: I trigger the second, "loosened" fallback query when the initial query brings no results. I could run the fallback query to "backfill" the results from the initial query if I get fewer results than requested, but:
That said, I can imagine some people wanting to run the fallback query (or multiple queries) if they get fewer results than requested. Speaking of why it might be beneficial to add this functionality to Elasticsearch:
There are probably many cons of adding this functionality to ES that you are more aware of than I am, but I thought it might be worth sharing the pros with you. As for possible implementation, here are a few ideas:
That's tricky, because one would need to make sure that even the lowest score from the first (initial) query is higher than the largest score from the second (fallback) one.
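One client-side way around the score-overlap problem is not to compare raw scores at all, but to shift the primary hits' scores above the fallback maximum before merging. A minimal sketch, with hits as `(doc_id, score)` tuples; `merge_with_offset` is a made-up helper, not an Elasticsearch API:

```python
def merge_with_offset(primary_hits, fallback_hits):
    """Merge two hit lists so every primary hit outranks every
    fallback hit, regardless of the original score ranges.
    Lucene scores are positive, so adding the fallback maximum
    to each primary score preserves primary-internal order while
    lifting all primary hits above all fallback hits."""
    if not fallback_hits:
        return list(primary_hits)
    offset = max(score for _, score in fallback_hits)
    shifted = [(doc, score + offset) for doc, score in primary_hits]
    return sorted(shifted + list(fallback_hits),
                  key=lambda hit: hit[1], reverse=True)
```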
It may be worth considering the logic for the 2 query adjustment strategies:
This issue is focused on strategy 1) and switching query choices based on a straight doc count (measuring recall). Rather than focusing purely on optimising recall, we could consider some controls that help maintain precision with the introduction of sloppier queries. Picking a sweet spot along the
One approach might be to send "sloppy" and "strict" queries as two separate searches in an msearch. The strict query is the one we pin our hopes on, e.g. a phrase query, and the sloppy query is the fallback we rely on if the strict query has no matches. The trick to avoiding wasted compute on the sloppy query is to introduce a new query type.
This new query type would early-terminate the whole request if there was even one document matching the nested "if" query. To fail the responses from other shards too, the `allow_partial_search_results` parameter should be set to false. This approach would allow one client request and also minimise the execution overhead of running the strict and sloppy queries. The client would use whichever response had the non-zero matches. While it may be seen as a disadvantage to have to define any aggregations twice (once in the sloppy search, once in the strict search), it is also an opportunity. A strict query might be certain of the subject matter and therefore want to use
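As a sketch only: the guard query type proposed above does not exist in Elasticsearch, so the name `abort_on_match` below (and its `if`/`then` structure) is invented purely to illustrate the shape such an msearch body might take. Query bodies are written as Python dicts, as an elasticsearch-py client would send them:

```python
def strict_and_sloppy_msearch(strict_query, sloppy_query, index):
    """Build an msearch body (alternating header/body dicts) that
    runs the strict query and a guarded sloppy query in parallel."""
    guarded = {
        "abort_on_match": {        # hypothetical query type, not real ES
            "if": strict_query,    # abort the whole search on any match
            "then": sloppy_query,  # otherwise run the sloppy variant
        }
    }
    return [
        {"index": index},
        {"query": strict_query},
        # allow_partial_search_results assumed settable per msearch line,
        # so an abort on one shard fails the whole sloppy search
        {"index": index, "allow_partial_search_results": False},
        {"query": guarded},
    ]
```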
Interesting idea.
An idea with different aggregations seems neat.
But then, we would need to execute
It wouldn't run until completion though - it only needs to match 1 document and we abort the whole query, avoiding any further disk reads and collection. Throwing exceptions isn't particularly performant so we might need to avoid that way of exiting the search process early.
The challenge with that is that users will want to provide a global total, but internally we would need a shard-local limit to early-terminate any local collection. Maybe we should tie this into the can_match phase by offering the option of an explicit "can_match" part of the
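For what it's worth, Elasticsearch does already expose a shard-local early-termination knob, `terminate_after`, which is close to the shard-local limit described here (though it caps collection rather than failing the request). A cheap per-shard "does anything match?" probe might look like this; `strictness_probe` is just an illustrative name:

```python
def strictness_probe(query):
    """Build a minimal existence-check body: terminate_after stops
    collecting on each shard after one matching document, size=0
    skips fetching hits, and track_total_hits=1 stops counting
    accurately beyond the first match."""
    return {
        "query": query,
        "terminate_after": 1,   # shard-local early termination
        "size": 0,              # no hits needed, just the count
        "track_total_hits": 1,  # stop exact counting after one match
    }
```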
That sounds reasonable: we'd need a condition (probably a minimum doc_count) that has to be passed before the next approach is tried. The overall search latency might be worse, though, if there are multiple levels of fallback that need to be worked through in sequence. With my suggestion, based on the existing msearch, searches are parallelized, so the strict and sloppy variations would run concurrently.
I created a proof-of-concept change to msearch to support query relaxation. A new
+1 I'm not convinced either. Furthermore, I think that the actual decision tree that users will want will often be much more complex than just falling back to a query that can be known in advance. For instance, I'm sure many users would like to run the query against a spell checker and return hits for the corrected query if it has more hits than the original query. This sort of logic is best left to the client side, in my opinion.
Pinging @elastic/es-search (Team:Search)
Any news about this? Our use case involves the use of fuzzy search; specifically, there is an option to only use fuzzy if non-fuzzy searches produce 0 results (in which case we issue a second query with fuzzy enabled). Being able to do this in a single query would be neat! |
@nemphys no movement on this at the moment. The best way to do this still is via two search requests. |
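For the fuzzy use case above, the two request bodies are easy to derive from one another. A hedged sketch building both (`fuzzy_fallback_bodies` is an invented helper; `"fuzziness": "AUTO"` is a real match-query option):

```python
def fuzzy_fallback_bodies(field, text):
    """Build the exact search body and its fuzzy fallback: identical
    match queries except that the fallback enables fuzziness."""
    exact = {"query": {"match": {field: {"query": text}}}}
    fuzzy = {"query": {"match": {field: {"query": text,
                                         "fuzziness": "AUTO"}}}}
    return exact, fuzzy
```

The client issues the exact search first and only sends the fuzzy body when the first response comes back with zero hits.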
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
In some cases, it can be helpful to fall back to a looser or 'expanded' query when the original query matches too few results. This strategy avoids showing a search page with no or very few results.
While there are already ways that users can incorporate the results of an expanded query, we could consider adding direct support:
The `fallback_query` would only be run if the original query returned zero (or too few) results. The hits from `fallback_query` would always be listed after those of the original `query`.

The advantages over issuing a single query that contains both the original and the fallback (boosted less highly than the original):
The advantages over issuing multiple search requests:
`query` only returns a small number of results. In this case the fallback results would always appear after the original small result set.

Although the idea of a fallback query has come up in Discuss forums and our own conversations around query expansion, I'm really not convinced that it would provide enough value over the alternative of issuing multiple search requests. We already closed the similar issue #6491, for example. I'm raising this issue to document our thinking and gather feedback from others.
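For concreteness, a request under this proposal might have looked roughly like the dict below (entirely hypothetical: `fallback_query` and a `min_hits`-style threshold are not real Elasticsearch options and were never implemented):

```python
# Hypothetical request body for the proposed feature; fallback_query
# and min_hits are NOT real Elasticsearch options.
proposed_request = {
    "query": {"match_phrase": {"title": "quick brown fox"}},
    "fallback_query": {"match": {"title": "quick brown fox"}},
    "min_hits": 3,  # run the fallback if fewer than 3 hits
}
```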