Deprecate fielddata on `_id`/`_uid` #25240

jpountz · 2017-06-15T07:18:38Z

Some features like random_score use fielddata on _id, and our documentation sometimes recommends sorting on _id in order to have a stable sort across pages (e.g. https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-search-after.html). However, fielddata on such unique fields has a huge cost if you have significant amounts of data, so we should move away from it.
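Here is why a unique tie-breaker matters for `search_after`. The following is a minimal in-memory sketch in plain Python, not the Elasticsearch client. The documents, field names and the `page_after`/`paginate` helpers are hypothetical. With `search_after`, the client sends back only the sort values of the last hit it saw. When those values are not unique, the next page cannot tell which of the tied documents were already returned, so some get skipped. Appending a unique field such as an id to the sort key avoids this, which is why the documentation suggested `_id`.

```python
# Hypothetical documents: timestamps repeat, ids are unique.
DOCS = [
    {"timestamp": 1, "id": "a"},
    {"timestamp": 2, "id": "b"},
    {"timestamp": 2, "id": "c"},
    {"timestamp": 2, "id": "d"},
    {"timestamp": 3, "id": "e"},
]


def page_after(docs, sort_key, after, size):
    """Return the next `size` docs whose sort key is strictly greater than `after`.

    Mimics search_after semantics: only the last hit's sort values are known,
    not a cursor position.
    """
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [d for d in ordered if sort_key(d) > after]
    return ordered[:size]


def paginate(docs, sort_key, size):
    """Collect ids page by page until an empty page comes back."""
    seen, after = [], None
    while True:
        page = page_after(docs, sort_key, after, size)
        if not page:
            return seen
        seen.extend(d["id"] for d in page)
        after = sort_key(page[-1])


def by_time(doc):
    # Non-unique sort key.
    return (doc["timestamp"],)


def by_time_then_id(doc):
    # Same sort, plus a unique id as tie-breaker.
    return (doc["timestamp"], doc["id"])


print(paginate(DOCS, by_time, size=2))          # ['a', 'b', 'e']: c and d are skipped
print(paginate(DOCS, by_time_then_id, size=2))  # ['a', 'b', 'c', 'd', 'e']
```

The tie-breaker only needs to be unique and comparable across pages. The issue is about what loading `_id` into fielddata costs to provide that uniqueness.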

We probably have two options here. One is to add doc values to _id (#11887), but this proved controversial due to the overhead it adds to the index. The other is to switch to _doc, which has the drawback of not being comparable across shards or after merges.


jpountz · 2017-06-16T13:35:25Z

We discussed this in FixitFriday and agreed to cut over random_score to use sequence numbers for random score generation. The only drawback compared to today is that documents would get a different score after they are updated, even if the same seed is used.
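Below is a minimal sketch of seeding a random score with a per-document value like the sequence number. It assumes the score comes from hashing the seed together with that value. The `random_score` helper and its use of `hashlib.blake2b` are illustrative only and are not Elasticsearch's actual hashing. The sketch shows both properties under discussion. The same seed and the same value always yield the same score. An update assigns a new sequence number, so the score changes even though the seed stays the same.

```python
import hashlib


def random_score(seed: int, value: int) -> float:
    """Deterministic pseudo-random score in [0, 1) from a seed and a per-doc value.

    Illustrative only: hashes the (seed, value) pair and keeps the top 53 bits
    so the result fits exactly in a float below 1.0.
    """
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") >> 11) / 2**53


seq_no_before_update = 41
seq_no_after_update = 97  # hypothetical: the update was assigned a new seq_no

same_a = random_score(seed=10, value=seq_no_before_update)
same_b = random_score(seed=10, value=seq_no_before_update)
updated = random_score(seed=10, value=seq_no_after_update)

print(same_a == same_b)  # True: same seed and same seq_no give the same score
print(same_a, updated)   # different scores, barring a hash collision
```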

Consistent sorting on different point-in-time snapshots was more controversial. Here are some opinions:

- There is nothing we can do in that case; we should set expectations and just remove the recommendation to use the _uid as a tie-breaker.
- We should recommend using _seq_no as a tie-breaker for that purpose rather than _uid. It would not be stable in case of updates, but it is the best we can provide that works in practice.
- We should automatically append _seq_no to every sort internally. Same reasoning as the previous idea, but without exposing _seq_no, which is quite an expert/internal feature.
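The third option amounts to silently extending every user-supplied sort with an internal tie-breaker. Here is a minimal sketch of that idea on plain Python dicts. The `with_tiebreaker` and `sort_docs` helpers and the sample documents are hypothetical, and the sketch assumes `_seq_no` values are distinct among the documents compared. The sketch also shows the caveat raised above: updating a document gives it a new sequence number, so its position among tied documents can change.

```python
def with_tiebreaker(sort_fields, tiebreaker="_seq_no"):
    """Append an internal tie-breaker to a user-supplied sort, unless already present."""
    if tiebreaker in sort_fields:
        return list(sort_fields)
    return list(sort_fields) + [tiebreaker]


def sort_docs(docs, sort_fields):
    """Sort docs ascending on each field in turn."""
    return sorted(docs, key=lambda d: tuple(d[f] for f in sort_fields))


docs = [
    {"_id": "x", "price": 5, "_seq_no": 3},
    {"_id": "y", "price": 5, "_seq_no": 1},
    {"_id": "z", "price": 7, "_seq_no": 2},
]

fields = with_tiebreaker(["price"])  # the user only asked for ["price"]
print([d["_id"] for d in sort_docs(docs, fields)])  # ['y', 'x', 'z']

# Updating "y" gives it a new, higher sequence number, so it moves behind "x".
docs[1]["_seq_no"] = 4
print([d["_id"] for d in sort_docs(docs, fields)])  # ['x', 'y', 'z']
```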

rjernst · 2017-06-22T21:01:27Z

> The only drawback compared to today is that documents would get a different score after they are updated

I don't see how this is any different from the way it used to be, using the docid. It is very trappy to offer a random score with the capability of setting a seed when the seed does not actually guarantee anything.

Why not expose the ability to set which field is used, and make it required when using a seed? Then the default can be a truly random score (using the docid), but if a seed and a field are set, that field is used.

jpountz · 2017-07-04T08:55:43Z

I like that idea.


…ion. (#25594) We currently use fielddata on the `_id` field, which is trappy, especially as we do it implicitly. This changes the `random_score` function to use doc ids when no seed is provided and to suggest a field when a seed is provided. For now the change only emits a deprecation warning when no field is supplied, but this should be replaced by a strict check in 7.0. Closes #25240
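The rule in this commit message can be expressed as a small decision function. The sketch below is hypothetical and is not the Elasticsearch code. `resolve_random_source`, `RandomScoreConfigError`, the warning text and the `"legacy_default"` placeholder are all invented for illustration. `strict=False` models the deprecation-warning phase and `strict=True` models the strict check planned for 7.0. What the lenient path actually falls back to is not modelled here.

```python
import warnings
from typing import Optional


class RandomScoreConfigError(ValueError):
    """Hypothetical error raised in strict mode when a seed has no field."""


def resolve_random_source(seed: Optional[int], field: Optional[str], strict: bool) -> str:
    """Decide what a random_score function would hash (hypothetical helper)."""
    if seed is None:
        # No reproducibility requested: Lucene doc ids are enough.
        return "doc_id"
    if field is not None:
        # Reproducible scores come from the user-chosen field.
        return field
    message = "a [field] should be provided when a [seed] is set"  # illustrative text
    if strict:
        raise RandomScoreConfigError(message)
    warnings.warn(message, DeprecationWarning)
    return "legacy_default"  # placeholder for the pre-existing behaviour


print(resolve_random_source(None, None, strict=True))       # doc_id
print(resolve_random_source(42, "_seq_no", strict=True))   # _seq_no
```

Under this reading, a deprecation phase keeps old requests working while flagging them. The strict mode turns the same condition into an error once users have had time to add a field.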

…, `_shard`]. `_id` requires fielddata to be loaded into memory for sorting, which does not scale on large clusters. Adding doc values to the `_id` field proved controversial, so instead we are removing use cases for fielddata on the `_uid` or `_id` fields. Relates elastic#25240
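Sequence numbers are assigned per shard, so `_seq_no` alone cannot distinguish two documents that live on different shards. Pairing it with `_shard` makes the tie-breaking key unique. Below is a minimal sketch of that reasoning. The hit list and the `duplicate_keys` helper are hypothetical and only count collisions among in-memory dicts.

```python
from collections import Counter

# Hypothetical hits from a two-shard index: each shard numbers its own operations from 0.
hits = [
    {"_id": "a", "_shard": 0, "_seq_no": 0},
    {"_id": "b", "_shard": 0, "_seq_no": 1},
    {"_id": "c", "_shard": 1, "_seq_no": 0},
    {"_id": "d", "_shard": 1, "_seq_no": 1},
]


def duplicate_keys(docs, fields):
    """Return the sort keys (tuples of field values) shared by more than one doc."""
    counts = Counter(tuple(d[f] for f in fields) for d in docs)
    return sorted(key for key, n in counts.items() if n > 1)


print(duplicate_keys(hits, ["_seq_no"]))            # [(0,), (1,)]: ties across shards
print(duplicate_keys(hits, ["_seq_no", "_shard"]))  # []: the pair is unique
```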

jpountz added blocker discuss v6.0.0 labels Jun 15, 2017

jpountz mentioned this issue Jun 15, 2017

[context view] using the _uid field as a tiebreaker consumes a lot of fielddata memory elastic/kibana#11925

Closed

colings86 added the :Fielddata label Jun 23, 2017

rjernst mentioned this issue Jul 6, 2017

Add a simple random sampling query #25561

Closed

jpountz mentioned this issue Jul 7, 2017

Index ids in binary form. #25352

Merged

jpountz mentioned this issue Jul 7, 2017

Require a field when a seed is provided to the random_score function. #25594

Merged

jpountz closed this as completed in #25594 Jul 19, 2017

jpountz mentioned this issue Jul 19, 2017

Change the recommended tie-breaking fields from [_id] to [_seq_no, _shard]. #25797

Closed

clintongormley added v6.0.0-beta1 and removed v6.0.0 labels Jul 25, 2017

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Fielddata labels Feb 14, 2018
