
Add support for returning documents with completion suggester #19536

Merged

Conversation

Contributor

@areek areek commented Jul 21, 2016

This is a followup to #13576 (comment), enriching completion suggestions
with documents.

This commit enables completion suggester to return documents
associated with suggestions. Now the document source is returned
with every suggestion, which respects source filtering options.

In case of suggest queries spanning more than one shard, the
suggest is executed in two phases, where the last phase fetches
the relevant documents from shards, implying executing suggest
requests against a single shard is more performant due to the
document fetch overhead when the suggest spans multiple shards.
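The two-phase flow described above can be sketched in Python (a toy model with a fake shard class and hypothetical method names; this is not the actual Elasticsearch implementation):

```python
# Toy model of the two-phase suggest described above. FakeShard and all
# method names are hypothetical; this is not Elasticsearch code.

class FakeShard:
    """Stands in for one shard: doc_id -> (suggestion_text, score, source)."""

    def __init__(self, docs):
        self.docs = docs

    def suggest(self, prefix, size):
        # Local top-N (doc_id, score) pairs; no _source is loaded here.
        hits = [(doc_id, score)
                for doc_id, (text, score, _src) in self.docs.items()
                if text.lower().startswith(prefix.lower())]
        hits.sort(key=lambda h: h[1], reverse=True)
        return hits[:size]

    def fetch_source(self, doc_id):
        return self.docs[doc_id][2]


def suggest_two_phase(shards, prefix, size):
    # Phase 1: gather each shard's local top-N without fetching _source.
    candidates = []
    for shard_id, shard in enumerate(shards):
        for doc_id, score in shard.suggest(prefix, size):
            candidates.append((score, shard_id, doc_id))

    # Reduce to the global top-N across all shards.
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[:size]

    # Phase 2: fetch _source only for the global winners. With a single
    # shard this extra round-trip collapses away, which is why
    # single-shard suggest requests are cheaper.
    return [shards[sid].fetch_source(did) for _score, sid, did in top]
```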

Example completion suggest response:

{
  "song-suggest": [
    {
      "text": "nev",
      "offset": 0,
      "length": 3,
      "options": [ {
          "text": "Nevermind",
          "_index": "music",
          "_type": "song",
          "_id": "52",
          "_score": 52,
          "_source": {
              "song": "Nevermind",
              "artist": "Nirvana"
          }
        }
      ]
    }
  ]
}
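
For context, a request that could produce a response of this shape might look like the following sketch (the completion field name `suggest` is an assumption, and the exact endpoint and syntax should be checked against the completion suggester docs; in 5.x this could go to e.g. `POST music/_suggest`):

```json
{
  "song-suggest": {
    "prefix": "nev",
    "completion": {
      "field": "suggest",
      "size": 5
    }
  }
}
```

Since `_source` is now returned with each option, the usual source filtering options apply to trim the returned fields.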

NOTE: now that we support returning documents with suggestions, we can remove the payload option

relates #10746

}
return groupedSuggestions;
}

public static List<Suggestion<? extends Entry<? extends Option>>> reduce(Map<String, List<Suggest.Suggestion>> groupedSuggestions) {
Contributor Author

We should extract the reduce logic from suggest; the suggest abstraction is not the right place for it, maybe somewhere in SearchPhaseController? I think this will simplify Suggest and help with nuking the crazy generics. Maybe we should simplify this in a subsequent PR?

Contributor

++

@areek areek added the review label Jul 21, 2016
@areek areek force-pushed the enhancement/completion_suggester_documents branch from f390633 to e41f54f Compare July 21, 2016 13:53
@s1monw
Contributor

s1monw commented Jul 21, 2016

@areek thanks so much! It took me a while to figure out what you did with the named suggestions etc. I do think there is an easier way to do what you wanted to do. Instead of using a name-to-ScoreDoc[] map, we can just use offsets into the ScoreDoc[], which would allow us to keep most of the code as is and just resolve the actual documents once we finish up, after the fact. I.e. the query would implicitly use offset 0, and each suggestion would get an offset assigned at the beginning, since we know the size of each suggestion and the query ahead of time.
The code that actually fetches the docs can still simply be a single array, so that code can stay! We might need to add some offset/length parameters to stuff like #fillDocIdsToLoad, but I think that is less intrusive than our current path?

I love the tests man thanks for that
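
The offset scheme sketched in the comment above (one flat ScoreDoc[] with the query implicitly at offset 0, and each suggestion handed an offset/length window) could look roughly like this in Python; all names are hypothetical:

```python
# Sketch of the offsets-into-one-array idea from the comment above.
# All names are hypothetical; this is not the actual Elasticsearch code.

def layout_score_docs(query_docs, suggestions):
    """suggestions is a list of (name, docs) pairs whose sizes are known
    up front. Returns one flat array plus an (offset, length) window per
    consumer; the query implicitly occupies offset 0."""
    flat = list(query_docs)
    windows = {"query": (0, len(query_docs))}
    for name, docs in suggestions:
        windows[name] = (len(flat), len(docs))
        flat.extend(docs)
    return flat, windows


def docs_for(flat, windows, name):
    # Consumers (e.g. a fillDocIdsToLoad-style helper) only need the
    # offset and length instead of a per-name map of doc id lists.
    offset, length = windows[name]
    return flat[offset:offset + length]
```

The fetch code can then keep operating on the single flat array, passing offset/length pairs around instead of a name-to-array map.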

@rmuir
Contributor

rmuir commented Jul 22, 2016

Can we improve the abstractions here? For example, the relatively straightforward ScoreDoc[] and IntArrayList get hidden behind less-obvious layers named SortedDocs and DocIdsToLoad, and neither of these have any javadocs. It makes it hard to tell what is going on with the code.

@areek areek added the WIP label Jul 22, 2016
@areek
Contributor Author

areek commented Jul 22, 2016

@rmuir I am working on removing the DocIdsToLoad abstraction and changing SortedDocs to store offsets instead of a map of IntArrayList, as per @s1monw's comment. Will make sure to add javadocs to clarify. Thanks for the feedback :)

@areek areek force-pushed the enhancement/completion_suggester_documents branch 10 times, most recently from 8484eb3 to 748a036 Compare July 25, 2016 22:08
@areek
Contributor Author

areek commented Jul 25, 2016

@s1monw @rmuir I updated the PR to represent the search and suggest score doc arrays via an offsets array in the SortedDocs abstraction. Would love to get feedback on the current approach!

@@ -74,7 +76,7 @@
protected final AtomicArray<FirstResult> firstResults;
private volatile AtomicArray<ShardSearchFailure> shardFailures;
private final Object shardFailuresMutex = new Object();
-protected volatile ScoreDoc[] sortedShardList;
+protected volatile SortedDocs sortedShardList;
Contributor

Can we rename this to sortedShardDocs?

@@ -35,6 +40,8 @@
import java.util.Map;
import java.util.Set;

import static org.elasticsearch.search.suggest.Suggest.COMPARATOR;
Contributor

The generics here are awful. I regret so much that I added them. We should really clean this up; it makes stuff so complicated for no good reason.

Contributor Author

++, I will try to clean up the generics in Suggest in subsequent PRs

@s1monw
Contributor

s1monw commented Aug 4, 2016

this looks awesome @areek, it's simplified so much from the original patch. All the extra classes and abstractions are gone! Good job! I added some comments about sharing more code etc. I also think we should have more tests on the CompletionSuggester end, like simple unit tests of the filter methods etc.
What I am really missing is a place where we document how the fetching works, and that the order of the docIDs is crucial. I think we should add it where we create the docId array? I think we are super close here, can you remove the WIP label?

@s1monw
Contributor

s1monw commented Aug 4, 2016

@clintongormley can you please check if we need to work more on docs here?

@areek areek removed the WIP label Aug 4, 2016
@areek areek force-pushed the enhancement/completion_suggester_documents branch 4 times, most recently from e71e0ac to 1d29080 Compare August 5, 2016 06:40
@areek
Contributor Author

areek commented Aug 5, 2016

Thanks @s1monw for the review! I have added some documentation about how fetching works in SearchPhaseController#sortedDocs and what is expected by SearchPhaseController#merge, added simple unittests for Suggest and CompletionSuggester (more tests can be added later?).

@areek areek force-pushed the enhancement/completion_suggester_documents branch from 1d29080 to b83b495 Compare August 5, 2016 06:54
@s1monw
Contributor

s1monw commented Aug 5, 2016

LGTM - I think CI failed so I guess you want to merge master in again and run it again! good job.

@areek areek force-pushed the enhancement/completion_suggester_documents branch from b83b495 to 260cc3d Compare August 5, 2016 18:56
@areek areek force-pushed the enhancement/completion_suggester_documents branch from 260cc3d to fee013c Compare August 5, 2016 21:52
@areek areek merged commit 469eb25 into elastic:master Aug 5, 2016
@areek
Contributor Author

areek commented Aug 5, 2016

@clintongormley if you have any feedback regarding the docs, let me know; I will do a separate PR for it.

areek added a commit to areek/elasticsearch that referenced this pull request Aug 8, 2016
The payload option was introduced with the new completion
suggester implementation in v5, as a stopgap solution
to return additional metadata with suggestions.

Now we can return associated documents with suggestions
(elastic#19536) through the fetch phase, using the stored
_source field. The additional fetch phase ensures that we
only fetch the _source for the global top-N suggestions,
instead of fetching the _source of the top results for each shard.
@speedplane
Contributor

I'm using ES 1.7 now and the completion suggester is currently one of my largest memory bottlenecks, so thank you, I'm definitely looking forward to this! Is there documentation yet that describes this new feature?

In my index, to save space, I do not have a _store, so if I want to use this feature, my understanding is that:

  1. I'll have to create a new index that has _store turned on.
  2. During indexing, I put the payloads into this new index. Their ID can be keyed off of their value.
  3. Also when indexing, I point the completion suggester to the documents in the new index.

Do I have this generally correct?

Mpdreamz added a commit to elastic/elasticsearch-net that referenced this pull request Sep 27, 2016
Mpdreamz added a commit to elastic/elasticsearch-net that referenced this pull request Sep 27, 2016
deprecate _ttl/_timestamp and remove them from our tests as per elastic/elasticsearch#18980 so that migrated 2.x indices do not have their code altered (just yet)

explicit 5.x spec generation

fix failing nodes test because  is removed as per elastic/elasticsearch#19218

fixed failing integration tests due to lang no longer defaulting to groovy elastic/elasticsearch#19960

fields => stored fields, updated failing cathelp tests due to endpoint changing

suggest response is now generic and gets _source returned in accordance with elastic/elasticsearch#19536

histogram key double not long

source filtering include and exclude are now plural

script fields tests did not explicitly specify groovy

search's StoredFields still sent

get task api tests wreaked havoc on the readonly tests

scripted metric did not specify lang

set script.max_compilations_per_minute on node

fix top hits not setting groovy explicitly

multi search now responds with 404 properly

multitermvector tests making sure it took more than 0 is no longer reliable, beta1 is too fast :)

foreach put pipeline processors is no longer an array as per elastic/elasticsearch#19402

revert field=>stored_fields rename on update request

remove property name with dot failure assertion integration test, no longer valid since elastic/elasticsearch#19899

use existing elasticsearch node in test framework could still spawn a new java process

revert field=>stored_fields rename on update request

get pipeline api is now dictionary based as per elastic/elasticsearch#19685

xpack beta1 related fixes

reindex tests not setting all waithandles and taking 3 minutes for no good reason

missing fieldsecurity class

fix post integration test failures unit test failures

add back run as tests now that we send the right header in the beta1 world
Mpdreamz added a commit to elastic/elasticsearch-net that referenced this pull request Oct 10, 2016
* removed deleted file from csproj

* make sure code is generated of master after mass picking commits of 5.x branch
@trompx

trompx commented Nov 3, 2016

Hello,

Compared to the old completion suggester, do we still need to optimize indices to prevent duplicate results when we update some documents?

As quoted here: https://discuss.elastic.co/t/autocompletion-suggester-duplicate-record-in-suggestion-return/16950/4

The main reason, why you see the suggestion twice is, that even though a
document is deleted and cannot be found anymore, the suggest data
structures are only cleaned up during merges/optimizations. Running
optimize should fix this.

Anyway awesome work on this new suggester. By any chance have you run any benchmark to compare suggestion speed 2.2 vs 5.0?

@areek
Contributor Author

areek commented Nov 3, 2016

Hey @trompx,

Compared to the old completion suggester, do we still need to optimize indices to prevent duplicate results when we update some documents?

No, we don't need to, as the new completion suggester is near-real-time and is expected to take deleted documents into account (even if the deleted document hasn't been merged away).

Anyway awesome work on this new suggester. By any chance have you run any benchmark to compare suggestion speed 2.2 vs 5.0?

Thanks :), the benchmark for the new implementation can be found here
