Autocomplete suggestion no longer removes duplicate entries as in ES 2.3 !! #22912
Comments
The "Duplicate filtering" feature outlined in the following article no longer exists in ES 5. I'll stick with ES 2.3 until a resolution for this missing feature is found.
@seme1 this change was a requested feature. The design of the new completion suggester is significantly different to what existed before - a suggestion maps to a single document now. This isn't going to change. I'm afraid your only option is to remain on 2.3 until you've had time to rework your application. |
How to "rework my application" ? I am relying on elasticsearch to give users suggestions on what terms/phrases to type when they search. I don't want the auto complete suggestion to give results pointing to individual documents, but rather to the most common/important terms/phrases. |
This was never the intention of the completion suggester - it has never favoured the most common words etc. It is a prefix suggester: it looks at the prefix you've typed in and finds completions from a finite list of strings with exactly that prefix. If you want to use the suggester to suggest search terms which may be duplicated across documents, then you need to index those search terms in a separate index and handle the deduplication yourself. |
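For anyone wondering what "index those search terms separately and handle the deduplication yourself" could look like, here is a minimal in-memory sketch in plain Python. It is not Elasticsearch code: the `tags` field, the sample documents, and the ranking by document frequency are all assumptions made for illustration, and matching is case-sensitive.

```python
from bisect import bisect_left
from collections import Counter


def build_term_index(documents, field="tags"):
    """Collect every term once and count how many documents carry it."""
    counts = Counter()
    for doc in documents:
        # set() so a term repeated inside one document is counted once
        counts.update(set(doc.get(field, [])))
    return sorted(counts), counts


def complete(prefix, terms, counts, size=5):
    """Return up to `size` distinct terms that start with `prefix`."""
    # In a sorted list, all terms sharing a prefix sit next to each other,
    # starting at the insertion point of the prefix itself.
    start = bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    # Most frequent first; alphabetical order breaks ties.
    matches.sort(key=lambda t: (-counts[t], t))
    return matches[:size]


# Hypothetical documents, each carrying a list of tags.
docs = [
    {"tags": ["lead", "learning"]},
    {"tags": ["lead", "leather"]},
    {"tags": ["lucene", "lead"]},
]
terms, counts = build_term_index(docs)
suggestions = complete("le", terms, counts)
print(suggestions)
```

The deduplication happens once at build time, so lookups stay cheap. The cost, as raised elsewhere in this thread, is keeping the term list in sync with deletions.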
Sorry, this is still a disaster! Is there no way to add deduplication back? This prevents some customers (and myself) from updating to Elasticsearch 5. Collecting all suggest items (deduplicated) into a separate index is not going to work, as it again prevents easy updates.
If anybody is interested: I wrote a plugin with a hack that restores the old Completion Suggester. The trick was to add a new field type "legacy_completion" that behaves identically to the 2.x version of this suggester. It is a bit of a hack, but it works. I will post a gist soon to show what it does; it is included in my own plugin, so I have to extract it first. Nevertheless, as all this works fine, why not keep the old completion suggester alive for people who are not interested in "document" suggestions but want Google-like term autocompletion, and still want to have everything in one index? The old suggester is perfect for that, and the fact that deletions are not taken into account is not a problem at all (for this use case). In addition, the deduplication works fine and is fast enough! Payloads are not required for this use case. I'd suggest adding a field type like my plugin's "legacy_completion", but without hacks. I'd suggest removing payloads for this use case, too.
Indeed the new suggester (called the document suggester in Lucene) is document based and does not have any ability to remove dups today. There was some discussion early on about duplicates: #22912 (comment) but I don't think it led to any duplicate removal being added. @areek can you confirm? I suppose we (or users) could add a while loop to query for suggestions, and then iterate (with a larger N) if too many of the top N on the first try were duplicates. |
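The "while loop with a larger N" idea could be sketched client-side roughly as follows. This is only an illustration: `fetch` is a hypothetical callable standing in for whatever issues the suggest request and returns suggestion texts in ranked order, and the doubling step and the `max_size` cap are arbitrary assumptions.

```python
def unique_suggestions(fetch, prefix, wanted=5, max_size=160):
    """Re-query with a growing size until `wanted` distinct texts are found."""
    size = wanted
    while True:
        hits = fetch(prefix, size)          # hypothetical: texts, best first
        unique = list(dict.fromkeys(hits))  # drop repeats, keep ranking order
        exhausted = len(hits) < size        # the suggester has nothing more
        if len(unique) >= wanted or exhausted or size >= max_size:
            return unique[:wanted]
        size = min(size * 2, max_size)      # too many duplicates: ask for more


# Stand-in for a real suggester call, for illustration only.
RANKED = ["lead", "lead", "lead", "leather", "lead", "learning", "leather"]


def fake_fetch(prefix, size):
    return [t for t in RANKED if t.startswith(prefix)][:size]


result = unique_suggestions(fake_fetch, "le", wanted=3)
print(result)
```

Each retry is a full round trip, so this only pays off when duplicates are rare. With heavily duplicated terms the loop may hit `max_size` before it finds enough distinct suggestions.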
I'd be happy to keep the old suggester available with own field type, like my plugin is doing. I will post the plugin code tomorrow. It is quite simple, but a hack that breaks easily if internals change or the old suggester code is completely removed in ES 6. That is what I would like to prevent. |
I think we can also add duplicate handling to this suggester as an option in Lucene? I'll open a Lucene issue to explore it; I think it may not be so difficult. |
OK, I opened https://issues.apache.org/jira/browse/LUCENE-7686 to add optional deduplication to the document suggester, and iterated to a working patch I think. |
Hi Mike,
My suggestion would be to use the new suggester's dictionary, but allow it to be run without returning document suggestions: just find the terms/phrases in the dictionary that match and return them as suggestions. The score could be document frequency. That would serve both worlds. The alternative would be to keep the old suggester alive (as said before). To me this would work like the Solr "terms component" that is still missing in ES.
@uschindler The way I have things work now to deal with the deletions is that every night I simply delete the suggestion index, recreate it and then re-index all the documents. |
Thanks @uschindler for a nice summary of suggester use cases and pros/cons of the document based vs term based suggesters. Confusingly, there is also I like your idea to use the new suggester's dictionary (its FST): this can make dedup w/ the new suggester very low cost, because the FST has effectively already dedup'd. I'll try to rework my patch to do this ... then we don't need the deduplicating collector. |
Thanks @uschindler for the explanation. I have exactly the same problem. In my case the suggestion can be anything one could actually search for (tags, keywords, names, titles, cities, ...). Each document has those fields, so it's normal that there will be some duplicates. The old suggester was really helpful deduplicating the results for easy access. Now having duplicates I am aggregating them myself, but this only works because I don't have that much data. As the data increases it would take too long to give suggestions. |
OK https://issues.apache.org/jira/browse/LUCENE-7686 is now fixed for Lucene 6.5.0; once we upgrade then we can expose the option in ES. |
Thanks Mike, looks great. We just need some DSL changes to allow passing "dedup" to the suggester. I am still thinking about a good solution for the previous "outputs" (if you have multiple suggestions per document, it is not easy to correlate the correct "output" with the suggestion). But with some JSON tricks this might be easy to solve.
What you describe here would better fit in the |
That's exactly my problem! And the example to suggest author names of books is the exact use-case I (and many others) seem to have. |
There is another related issue I'm facing. Is it possible to save more than one suggestion and display only one based on a certain attribute I pass to the suggestion engine ? For example, the author names may be written differently in different languages. I can have the author names saved in different languages within the suggester engine. Based on the current app interface language, I want the suggested author name to match that of the user display language. |
I have a similar problem. The old suggester allowed you to attach the "output" to the suggest input term. With the new one, you can no longer correlate the "input" term with the "output" in the form of the "_source" document (you would need something like a "highlighter" to do this!). Of course, you can use different suggest fields in parallel (one for each language), but it's then still hard to produce the right suggestions if you do "term normalization", because you only get documents back. In my case, the autocompleter also suggests terms based on abbreviations (e.g., the user starts to type an abbreviation like "Pb" and the suggester autocompletes "Lead"; maybe a bad example, but it should illustrate the point). With the outputs of the previous completion suggester this was possible, but it no longer is with the new one. It is impossible to guess from the autocompleted term which output from "_source" you would choose, especially if you have many suggested terms per document (authors example). IMHO, the whole thing should be solved as mentioned by @jimczi: use the phrase suggester, but allow it to be used as an "autocompleter" instead of "did you mean". As a side effect, the old completion suggester offered that, including the "weights" (because it defaulted to Suggest-TF if no weights were given).
I would really like to keep the old completion suggester with a different field type (which is possible, otherwise ES could not read old indexes). And my plugin verifies it: I can still use the old suggester with a new field type "legacy_completion". HACK ALARM: I think I should publish it for download! |
@uschindler I'm relying on the AWS ElasticSearch service, so implementing the hack is not currently an option. Besides, I have abandoned the upgrade to ES 5.1, and I'm still using ES 2.3. @jimczi
Hello @uschindler , Could you share your plugin so I can use it? Thanks, |
@TheFireCookie: I was just waiting for somebody to ask! I will post it to my GitHub account in the next few days. I just have to add license headers and extract it from my "Eierlegendewollmilchsau"-Plugin. Uwe
Hi @TheFireCookie, https://github.com/uschindler/es-legacy-completion-plugin |
+1 Would love to see this fixed in an upcoming release. |
Hi, no updates on this issue? Is it something that will eventually get addressed? Is there a more current discussion on this topic?
Maybe I'm missing something but this completely breaks (my) idea of the completion suggester. Very common scenario; store an array of 'tags' on a document and use the completion suggester to provide autocomplete when entering/searching tags on a UI. If a single document has multiple tags that are similar and match the query, only one result will be returned. Example: If I'm missing something, please let me know. |
Any updates from the ES team on this issue ?? |
Any news? The new Lucene version mentioned above is already merged. |
The Lucene option mentioned above will be available in 6.1: This doesn't change the design of the completion suggester which will remain document based. |
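For reference, the option is exposed in the 6.1 completion suggester as `skip_duplicates`. A request body using it might be built like the sketch below. The suggestion name `tag-suggest` and the field name `suggest` are placeholders, not taken from this thread; the field has to be mapped with type `completion` in your own index.

```python
import json


def completion_request(prefix, field="suggest", size=5):
    """Build a _search body asking the completion suggester to drop duplicate texts."""
    return {
        "suggest": {
            "tag-suggest": {              # placeholder name for this suggestion
                "prefix": prefix,
                "completion": {
                    "field": field,       # placeholder: your completion-typed field
                    "size": size,
                    "skip_duplicates": True,
                },
            }
        }
    }


# JSON body to send to the index's _search endpoint.
body = json.dumps(completion_request("le"), indent=2)
print(body)
```

Suggestions are still tied to documents; the option only filters out repeated suggestion texts from the returned list.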
Nevertheless, my initial request in this issue was the following: let the "new" completion suggester live as it is and keep it document based. My proposal was to make "the old behaviour" available as a separate field type. My plugin at https://github.com/uschindler/es-legacy-completion-plugin does exactly this. It lets you define a field with type "legacy-completion", which is then indexed using the old codec. But it relies on the codec available inside Elasticsearch (the plugin is just a "hack" to make the old index format accessible to users; the query side works automatically, because the existence of the codec automatically triggers the old suggester code). Having 2 separate field types with 2 separate handlers would be much better. Of course, if the old codec goes away in Elasticsearch 6, this is fatal for the plugin. Are there any plans to keep the old legacy completion codec available? E.g., I am unable to migrate to Elasticsearch 6 (I have not yet tried). Based on the usage/download statistics of the plugin, a lot of people now use it in their ES 5.x installations. It has already been forked and adapted to several ES versions from the 5.x series. So my only wish is: keep the old codec available, so indexes using it can still be used and created. To allow this, add another field type to explicitly use the "legacy suggester". The current version of the legacy completion is perfectly fine for many use cases where you cannot use a separate index, e.g. if you are tagging your documents and want to deliver the tags as autocompletion. E.g., I don't care about deleted documents, and many others don't either.
I don't understand why you could not use a separate index. Just create a It's also not true to think that the |
I'm having a problem with the Completion Suggester. I indexed all my necessary fields in the usual way, following https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html. So I created an index called autoidx and had to manually update all my IDs with
@riemannzeta1191 could you please ask your question on the discuss forum. This issue is closed and should not be used to answer general questions about the completion suggester. |
Alright, that didn't work. Looks like `output` was removed in ES 5.0, which made suggesters more document-oriented. We'll use `_source` to retrieve the titles instead. My only worry is that we might get dupe results if multiple name variants match for a given doc. More info: https://www.elastic.co/guide/en/elasticsearch/reference/5.3/breaking_50_suggester.html elastic/elasticsearch#22912 If this turns out to be an issue, ES 6.1 added `skip_duplicates`: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-suggesters-completion.html#skip_duplicates
…s have the same photo shoot. Default behaviour changed since Elastic 1. elastic/elasticsearch#22912
@uschindler @jimczi I went through this issue and, like many others, I am looking for an autocomplete solution that suggests terms/phrases from a few fields in my index (which is very big). The problem with deduplicating the fields offline and putting them in a separate index is that we might have to reindex all the documents every n days (to handle deletions and avoid stale terms), which turns out to be too expensive when the index is very big. Another approach I can think of is to keep a count of the unique terms, which proves to be an expensive operation as well in a distributed indexing system. It would be ideal if we could handle both duplicates and deletions in an optimal manner.
I am curious to know whether we can make the phrase/term suggester work for autocomplete (as mentioned in the comment above)? Can it be done with some code change? Does anyone have any pointers on how to approach that? I can try it out and create a PR. |
I relied on the autocomplete suggester in 2.3 to remove duplicate entries and provide unique suggestions of words/phrases. After upgrading to ES 5, I realized that the suggester is now document-oriented. This changes the logic of how it can be used with already-built systems. Also, it seems to me that a feature was removed from ES 2.3. In other words, this is not the upgrade I was expecting!!
It also seems that I'm not the only one suffering from this!!
#21676
http://stackoverflow.com/questions/41744712/word-oriented-completion-suggester-elasticsearch-5-x