Add support for returning documents with completion suggester #19536
        }
        return groupedSuggestions;
    }

    public static List<Suggestion<? extends Entry<? extends Option>>> reduce(Map<String, List<Suggest.Suggestion>> groupedSuggestions) {
We should extract the reduce logic from Suggest; the Suggest abstraction is not the right place for it. Maybe somewhere in SearchPhaseController? I think this will simplify Suggest and help with nuking the crazy generics. Maybe we should simplify this in a subsequent PR?
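The reviewer's suggestion to move reduction out of Suggest can be sketched without the nested generics. This is a minimal, hypothetical illustration: `SimpleOption` and `SuggestReducer` are invented names, not Elasticsearch classes, and the ordering (score descending, ties broken by text) is an assumption standing in for `Suggest.COMPARATOR`. It groups every shard's options by suggestion name, then keeps the top `size` options per name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, generics-free stand-in for a suggestion option (not the real Suggest option type).
final class SimpleOption {
    final String text;
    final float score;

    SimpleOption(String text, float score) {
        this.text = text;
        this.score = score;
    }
}

// Hypothetical helper living outside Suggest (for example next to SearchPhaseController).
final class SuggestReducer {
    // Assumed ordering: higher score first, ties broken alphabetically by text.
    static final Comparator<SimpleOption> BY_SCORE_THEN_TEXT =
        Comparator.comparingDouble((SimpleOption o) -> o.score).reversed()
            .thenComparing(o -> o.text);

    /**
     * Input: one map per shard, suggestion name -> that shard's options.
     * Output: suggestion name -> the globally best `size` options.
     */
    static Map<String, List<SimpleOption>> reduce(List<Map<String, List<SimpleOption>>> shardResults, int size) {
        // Step 1: group the options of all shards by suggestion name.
        Map<String, List<SimpleOption>> grouped = new LinkedHashMap<>();
        for (Map<String, List<SimpleOption>> shard : shardResults) {
            for (Map.Entry<String, List<SimpleOption>> e : shard.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        // Step 2: sort each group and copy out the top `size` options.
        Map<String, List<SimpleOption>> reduced = new LinkedHashMap<>();
        for (Map.Entry<String, List<SimpleOption>> e : grouped.entrySet()) {
            List<SimpleOption> options = e.getValue();
            options.sort(BY_SCORE_THEN_TEXT);
            reduced.put(e.getKey(), new ArrayList<>(options.subList(0, Math.min(size, options.size()))));
        }
        return reduced;
    }
}
```

Keeping the option type concrete, rather than `Suggestion<? extends Entry<? extends Option>>`, is what keeps the helper this small; a real version would presumably still need merge logic specific to each suggestion type.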
++
Force-pushed from f390633 to e41f54f.
@areek thanks so much! It took me a while to figure out what you did with the named suggestions etc. I do think there is an easier way to do what you wanted to do. I think we can, instead of use a … I love the tests, man, thanks for that!
Can we improve the abstractions here? For example, the relatively straightforward …
Force-pushed from 8484eb3 to 748a036.
@@ -74,7 +76,7 @@
     protected final AtomicArray<FirstResult> firstResults;
     private volatile AtomicArray<ShardSearchFailure> shardFailures;
     private final Object shardFailuresMutex = new Object();
-    protected volatile ScoreDoc[] sortedShardList;
+    protected volatile SortedDocs sortedShardList;
Can we rename this to sortedShardDocs?
@@ -35,6 +40,8 @@
 import java.util.Map;
 import java.util.Set;

+import static org.elasticsearch.search.suggest.Suggest.COMPARATOR;
The generics here are awful. I regret adding them so much. We should really clean this up; it makes things so complicated for no good reason.
++, I will try to clean up the generics in Suggest in subsequent PRs.
This looks awesome, @areek! It's simplified so much from the original patch. All the extra classes and abstractions are gone! Good job! I added some comments about sharing more code etc. I also think we should have more tests on the CompletionSuggest end, like simple unit tests of the filter methods etc.
@clintongormley can you please check if we need to work more on the docs here?
Force-pushed from e71e0ac to 1d29080.
Thanks @s1monw for the review! I have added some documentation about how fetching works in …
Force-pushed from 1d29080 to b83b495.
LGTM. I think CI failed, so I guess you want to merge master in again and rerun it. Good job!
Force-pushed from b83b495 to 260cc3d.
This commit enables the completion suggester to return documents associated with suggestions. The document source is now returned with every suggestion and respects source filtering options. When a suggest request spans more than one shard, it is executed in two phases, the last of which fetches the relevant documents from the shards. Because of this document-fetch overhead, executing a suggest request against a single shard is more performant than executing one that spans multiple shards.
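The two-phase flow described above follows the familiar query-then-fetch pattern. Below is a minimal, hypothetical sketch of that pattern; `ShardHit`, `TwoPhaseSuggest` and the in-memory shard maps are assumptions for illustration, not Elasticsearch code. Phase one merges each shard's top hits into a global top-N on the coordinating node with a bounded min-heap; phase two loads the stored source only for the hits that survived.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical per-shard hit: which shard, which shard-local doc, and its suggestion score.
final class ShardHit {
    final int shardId;
    final int docId;
    final float score;

    ShardHit(int shardId, int docId, float score) {
        this.shardId = shardId;
        this.docId = docId;
        this.score = score;
    }
}

final class TwoPhaseSuggest {
    /** Phase 1 (coordinating node): merge every shard's top hits into the global top-n. */
    static List<ShardHit> reduceTopN(List<List<ShardHit>> perShardHits, int n) {
        // Min-heap on score, so the weakest of the current top-n is evicted first.
        PriorityQueue<ShardHit> heap = new PriorityQueue<>(Comparator.comparingDouble((ShardHit h) -> h.score));
        for (List<ShardHit> hits : perShardHits) {
            for (ShardHit hit : hits) {
                heap.offer(hit);
                if (heap.size() > n) {
                    heap.poll();
                }
            }
        }
        List<ShardHit> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble((ShardHit h) -> h.score).reversed());
        return top;
    }

    /** Phase 2 (fetch): load the stored source only for hits in the global top-n. */
    static List<String> fetch(List<ShardHit> top, List<Map<Integer, String>> shardSources) {
        List<String> docs = new ArrayList<>();
        for (ShardHit hit : top) {
            docs.add(shardSources.get(hit.shardId).get(hit.docId));
        }
        return docs;
    }
}
```

For example, with shard 0 returning hits scored 9 and 2, shard 1 returning a hit scored 5, and n = 2, only the documents behind scores 9 and 5 are fetched; the hit scored 2 never has its source loaded.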
Force-pushed from 260cc3d to fee013c.
@clintongormley if you have any feedback regarding the docs, let me know; I will do a separate PR for it.
The payload option was introduced with the new completion suggester implementation in v5 as a stop-gap solution for returning additional metadata with suggestions. Now we can return the associated documents with suggestions (elastic#19536) through the fetch phase, using the stored field (_source). The additional fetch phase ensures that we only fetch the _source for the global top-N suggestions instead of fetching the _source of the top results for each shard.
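The saving described above can be made concrete with a small, hypothetical planner. It assumes each shard must return its own top `size` candidates for the global ranking to be correct, so shipping `_source` from every shard would load `numShards * size` documents, whereas fetching only the global winners loads at most `size`. `FetchPlanner` and the `{shardId, docId}` pair encoding are illustrative assumptions, not Elasticsearch's fetch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of how many _source documents each strategy loads.
final class FetchPlanner {
    /** Groups the global top-N hits, encoded as {shardId, docId}, into one fetch request per shard. */
    static Map<Integer, List<Integer>> docIdsToFetch(int[][] globalTopN) {
        Map<Integer, List<Integer>> perShard = new TreeMap<>();
        for (int[] hit : globalTopN) {
            // hit[0] = shard id, hit[1] = shard-local doc id (assumed encoding)
            perShard.computeIfAbsent(hit[0], k -> new ArrayList<>()).add(hit[1]);
        }
        return perShard;
    }

    /** Sources loaded if every shard shipped the _source of its own top-k options. */
    static int perShardSourceLoads(int numShards, int k) {
        return numShards * k;
    }

    /** Sources loaded when only the global top-N winners are fetched. */
    static int globalSourceLoads(Map<Integer, List<Integer>> plan) {
        int total = 0;
        for (List<Integer> ids : plan.values()) {
            total += ids.size();
        }
        return total;
    }
}
```

With 5 shards and a suggestion size of 3, the per-shard approach would load 5 × 3 = 15 sources, while a global top 3 spread over shards 0 and 2 produces two fetch requests that load 3 sources in total.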
I'm using ES 1.7 now, and the completion suggester is currently one of my largest memory bottlenecks, so thank you; I'm definitely looking forward to this! Is there documentation yet that describes this new feature? In my index, to save space, I do not have a _store, so if I want to use this feature, my understanding is that:
Do I have this generally correct?
* removed deleted file from csproj
* deprecate _ttl/_timestamp and remove them from our tests as per elastic/elasticsearch#18980 so that migrated 2.x indices do not have their code altered (just yet)
* explicit 5.x spec generation
* fix failing nodes test because is removed as per elastic/elasticsearch#19218
* fixed failing integration tests due to lang no longer defaulting to groovy elastic/elasticsearch#19960
* fields => stored fields, updated failing cathelp tests due to endpoint changing
* suggest response is now generic and gets _source returned in accordance with elastic/elasticsearch#19536
* histogram key double not long
* source filtering include and exclude are now plural
* script fields tests did not explicitly specify groovy
* search's StoredFields still sent
* get task api tests wreaked havoc on the readonly tests
* scripted metric did not specify lang
* set script.max_compilations_per_minute on node
* fix top hits not setting groovy explicitly
* multi search now responds with 404 properly
* multitermvector tests making sure it took more than 0 is no longer reliable; beta1 is too fast :)
* foreach put pipeline processors is no longer an array as per elastic/elasticsearch#19402
* revert field=>stored_fields rename on update request
* remove property name with dot failure assertion integration test, no longer valid since elastic/elasticsearch#19899
* use existing elasticsearch node in test framework could still spawn a new java process
* revert field=>stored_fields rename on update request
* get pipeline api is now dictionary based as per elastic/elasticsearch#19685
* xpack beta1 related fixes
* reindex tests not setting all waithandles and taking 3 minutes for no good reason
* missing fieldsecurity class
* fix post integration test failures
* unit test failures
* add back run as tests now that we send the right header in the beta1 world
* make sure code is generated off master after mass picking commits of 5.x branch
Hello, compared to the old completion suggester, do we still need to optimize indices to prevent duplicate results when we update some documents? As mentioned here: https://discuss.elastic.co/t/autocompletion-suggester-duplicate-record-in-suggestion-return/16950/4
Anyway, awesome work on this new suggester. By any chance, have you run any benchmarks comparing suggestion speed in 2.2 vs. 5.0?
Hey @trompx,
No, we don't need to, as the new completion suggester is near real-time and is expected to take deleted documents into account (even if a deleted document hasn't been merged away yet).
Thanks :) The benchmark for the new implementation can be found here.
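One way deleted documents can be excluded before they are merged away is sketched below. It assumes, beyond what the reply states, that the suggester consults a per-segment live-docs bitset while collecting candidates (Lucene keeps such bits per segment, where a set bit means the document is still live). `Candidate` and `LiveDocsFilter` are hypothetical names. Updating a document leaves its old version deleted but unmerged, so skipping non-live docs is what prevents the duplicate results the question asks about.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical suggestion candidate: surface form plus the segment-local doc id it came from.
final class Candidate {
    final String text;
    final int docId;

    Candidate(String text, int docId) {
        this.text = text;
        this.docId = docId;
    }
}

final class LiveDocsFilter {
    /**
     * Collects up to `size` candidates (already in score order), skipping deleted documents.
     * `liveDocs` plays the role of a segment's live-docs bits: bit set = document is live.
     * (Lucene reports "no deletions" as a null live-docs object; that case is omitted here.)
     */
    static List<String> topLive(List<Candidate> candidatesInScoreOrder, BitSet liveDocs, int size) {
        List<String> out = new ArrayList<>();
        for (Candidate c : candidatesInScoreOrder) {
            if (out.size() == size) {
                break;
            }
            if (liveDocs.get(c.docId)) {
                out.add(c.text);
            }
        }
        return out;
    }
}
```

In a usage example where doc 0 is the old, deleted version of an updated document and doc 2 is its replacement, the candidates `nirvana` (doc 0), `nevermind` (doc 1) and `nirvana` (doc 2) with live bits {1, 2} yield `nevermind` and `nirvana` once each.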
This is a follow-up to #13576 (comment), enriching completion suggestions with documents.
Example completion suggest response:
NOTE: now that we support returning documents with suggestions, we can remove the payload option.
Relates #10746