Deprecate fielddata on _id/_uid #25240
We discussed this in FixitFriday and agreed to cut over… Consistent sorting across different point-in-time snapshots was more controversial. Here are some opinions:
I don't see how this is any different from the way it used to be, using the doc ID. It is very trappy to offer a random score with the ability to set a seed when the seed does not actually guarantee anything. Why not expose the ability to choose which field is used, and make it required when a seed is set? Then the default can be a truly random score (using the doc ID), and if a seed and a field are both set, those are used.
I like that idea.
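To make the concern about seeds concrete, here is a minimal Python sketch (not Elasticsearch code) of a seeded random score. It assumes the score is derived by hashing the seed together with some per-document key; `seeded_score`, the helper functions, and the sample documents are hypothetical. Keyed on internal doc IDs, the same seed yields different scores once documents are renumbered (as can happen after merges); keyed on a stable field value, it does not.

```python
import hashlib


def seeded_score(seed: int, key: str) -> float:
    """Hypothetical seeded random score in [0, 1), derived from (seed, key)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    # Keep the top 53 bits so the division below is exact and stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


# Hypothetical documents as (stable field value, internal doc id) pairs.
# The doc ids are renumbered, as can happen when segments are merged.
before_merge = [("a", 0), ("b", 1), ("c", 2)]
after_merge = [("a", 2), ("b", 0), ("c", 1)]


def scores_by_doc_id(docs, seed):
    """Score each document by hashing the seed with its internal doc id."""
    return {value: seeded_score(seed, str(doc_id)) for value, doc_id in docs}


def scores_by_field(docs, seed):
    """Score each document by hashing the seed with its stable field value."""
    return {value: seeded_score(seed, value) for value, _ in docs}


seed = 42
# Same seed, but keyed on doc ids: the scores move after the renumbering.
print(scores_by_doc_id(before_merge, seed) == scores_by_doc_id(after_merge, seed))  # False
# Same seed, keyed on the field value: the scores are reproducible.
print(scores_by_field(before_merge, seed) == scores_by_field(after_merge, seed))  # True
```

This is why requiring a field alongside the seed makes the seed meaningful: reproducibility comes from the stability of the per-document key, not from the seed alone.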
…ion. (#25594) We currently use fielddata on the `_id` field, which is trappy, especially as we do it implicitly. This changes the `random_score` function to use doc ids when no seed is provided and to suggest a field when a seed is provided. For now, the change only emits a deprecation warning when no field is supplied, but this should be replaced by a strict check in 7.0. Closes #25240
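The option handling described in this commit can be sketched roughly as follows. This is a hypothetical Python model, not the actual Elasticsearch code: the function `resolve_random_source`, the `strict` flag (standing in for the stricter check planned for 7.0), and the field name `user_id` are all assumptions, as is the fallback to `_id`, which is inferred from the change "only" emitting a warning.

```python
import warnings
from typing import Optional


def resolve_random_source(seed: Optional[int], field: Optional[str],
                          strict: bool = False) -> str:
    """Hypothetical: pick the per-document input a random_score-like function hashes."""
    if seed is None:
        # No seed: a truly random score based on internal doc ids, no fielddata needed.
        return "doc_id"
    if field is None:
        if strict:
            # Models the strict check planned for 7.0: a seed requires a field.
            raise ValueError("a field is required when a seed is set")
        # Transitional behaviour: warn, then keep using the implicit _id input.
        warnings.warn("set a field when using a seed", DeprecationWarning)
        return "_id"
    # Seed and field both given: hash the seed with that field's value.
    return field


print(resolve_random_source(None, None))       # doc_id
print(resolve_random_source(42, "user_id"))    # user_id
```

Keeping the warning path separate from the strict path makes the later cut-over a one-line change in this sketch.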
…, `_shard`]. Sorting on `_id` requires fielddata to be loaded into memory, which does not scale on large clusters. Adding doc values to the `_id` field proved controversial, so instead we are removing use cases for fielddata on the `_uid` and `_id` fields. Relates elastic#25240
Some features like `random_score` use fielddata on `_id`, and our documentation sometimes recommends sorting on `_id` to get a stable sort across pages (e.g. https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-search-after.html). However, fielddata on such unique fields has a huge memory cost when you have significant amounts of data, so we should move away from it.

We probably have two options here: either add doc values to `_id` (#11887), which proved controversial due to the overhead it adds to the index, or switch to `_doc`, which has the drawback of not being comparable across shards or after merges.
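Why a unique tiebreaker matters for a stable sort across pages can be shown with a small Python simulation of `search_after`-style paging (not Elasticsearch's implementation). The function names and sample documents are hypothetical, and the id string stands in for a unique field such as `_id`. Each page resumes strictly after the last hit's sort values, so when the sort key has ties, the remaining tied documents are skipped.

```python
from typing import Callable, List, Optional, Tuple

Doc = Tuple[int, str]  # (timestamp, unique id); hypothetical documents


def page_after(docs: List[Doc], after: Optional[tuple], size: int,
               key: Callable[[Doc], tuple]) -> List[Doc]:
    """Return the next page: docs whose sort key is strictly after `after`."""
    ordered = sorted(docs, key=key)
    if after is not None:
        ordered = [d for d in ordered if key(d) > after]
    return ordered[:size]


def collect_all(docs: List[Doc], size: int,
                key: Callable[[Doc], tuple]) -> List[Doc]:
    """Page through `docs`, feeding the last hit's sort values back as `after`."""
    seen: List[Doc] = []
    after: Optional[tuple] = None
    while True:
        page = page_after(docs, after, size, key)
        if not page:
            return seen
        seen.extend(page)
        after = key(page[-1])


def by_timestamp(doc: Doc) -> tuple:
    """Sort on the timestamp only, which has ties."""
    return (doc[0],)


def with_tiebreaker(doc: Doc) -> tuple:
    """Sort on the timestamp, then on the unique id as a tiebreaker."""
    return (doc[0], doc[1])


docs = [(1, "a"), (2, "b"), (2, "c"), (2, "d"), (3, "e")]
print(len(collect_all(docs, 2, by_timestamp)))     # 3: "c" and "d" are skipped
print(len(collect_all(docs, 2, with_tiebreaker)))  # 5: every doc exactly once
```

In a real cluster the tiebreaker must also be comparable across shards and stable over time, which is why `_doc` does not qualify, while `_id` does but currently needs fielddata.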