Consider storing _id through doc values. #60778
Comments
Pinging @elastic/es-search (:Search/Search) |
There may be connections with #48699. If we dropped ids from indices it would be problematic to rely on them for tie-breaking. On the other hand, removing ids from an index would be more efficient if they were stored in a doc-value field as it would avoid having to decompress and then compress again all stored fields. |
Some notes from our team discussion:
I'll leave this open for a bit longer, but will close if there's no more interest or feedback. |
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
I just had to make a change in enterprise search related to this. We're working on changing how we store documents and I was hoping to not have to copy |
@davishmcclurg this is helpful feedback. For better context, could you describe some use cases where users want to sort on the |
Internally, we sort on ID (using |
Just adding a necrocomment: from what I remember, this has already been discussed, and the reasoning behind not adding doc_values was that the cardinality is as high as the number of documents, while doc_values expect something more selective. This may be completely wrong by now, since that discussion was years ago and my memory isn't 100% reliable. |
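To illustrate the cardinality concern raised in the comment above, here is a minimal Python sketch of dictionary (ordinal) encoding, the general idea behind sorted doc values. It is a toy model under stated assumptions, not Lucene's implementation: `encode_sorted` and the sample data are hypothetical names made up for this example.

```python
def encode_sorted(values):
    """Dictionary-encode values: sorted unique terms plus one ordinal per document."""
    terms = sorted(set(values))
    ordinal_of = {term: i for i, term in enumerate(terms)}
    ordinals = [ordinal_of[value] for value in values]
    return terms, ordinals


# A low-cardinality field: many documents share the same value.
statuses = ["open", "closed", "open", "open", "closed", "open"]
# An id-like field: every document has a unique value.
ids = [f"id-{i}" for i in range(6)]

status_terms, status_ords = encode_sorted(statuses)
id_terms, id_ords = encode_sorted(ids)

# The dictionary shrinks for the selective field, but for unique ids it
# holds one term per document, so the encoding deduplicates nothing.
print(len(status_terms), len(id_terms))
```

With unique values the term dictionary is as large as the data itself, which is the concern the comment recalls. The proposal in this issue refers to binary doc values, which hold a value per document rather than relying on a shared dictionary.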
I talked to @jtibshirani and she's ok closing this as something that we're not going to have time to do soon. So I'll do that. But! TSDB (#74660) is not going to store |
I am sad to see this being closed. My use cases (in the same system) are exactly the ones described above:
|
One option could be that one could optionally enable doc_values on |
I wanted to revisit the idea of storing `_id` as a doc value field. To avoid duplicating data, we would also stop storing `_id` as a stored field. During the fetch phase, `_id` would be retrieved from doc values instead of stored fields as it is now.
We previously discussed this in #11887, but the trade-offs may be different now that we have compression for binary doc values. @jpountz recently ran an experiment that showed switching `_id` from a stored to binary doc value field didn't increase index size.
Some advantages to having doc values for `_id`:
- Sorting and aggregating on `_id` would work without loading on-heap 'fielddata'.
- Some searches retrieve `_id`, but do not load detailed data like `_source`. These searches could be more efficient, since we would no longer decompress all stored fields (including `_source`) just to retrieve `_id`.
One question I have is whether sorting on `_id` would still be useful after we introduce search contexts, with a built-in tiebreaker (#56828). And is there a use case for aggregating on `_id`?
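
The fetch-phase argument above can be sketched with a toy model in Python. This is an illustration under loud assumptions, not how Lucene or Elasticsearch lay out data: JSON plus `zlib` stands in for compressed stored-field blocks, a plain list stands in for a doc-values column, and `BLOCK_SIZE` and all function names are hypothetical.

```python
import json
import zlib

# Toy documents; the field names mirror Elasticsearch's _id and _source.
docs = [
    {"_id": f"doc-{i}", "_source": {"title": f"Title {i}", "body": "lorem ipsum " * 50}}
    for i in range(100)
]

BLOCK_SIZE = 10  # hypothetical number of documents per compressed block


def build_stored_blocks(docs):
    """Row-oriented layout: each block compresses _id and _source together."""
    blocks = []
    for start in range(0, len(docs), BLOCK_SIZE):
        chunk = docs[start:start + BLOCK_SIZE]
        blocks.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
    return blocks


def build_id_column(docs):
    """Column-oriented layout: one entry per document, holding only _id."""
    return [doc["_id"].encode("utf-8") for doc in docs]


def fetch_id_from_stored(blocks, doc_num):
    """Return (_id, bytes decompressed) for a stored-field style lookup."""
    raw = zlib.decompress(blocks[doc_num // BLOCK_SIZE])
    chunk = json.loads(raw)
    return chunk[doc_num % BLOCK_SIZE]["_id"], len(raw)


def fetch_id_from_column(column, doc_num):
    """Return (_id, bytes read) for a doc-values style lookup."""
    value = column[doc_num]
    return value.decode("utf-8"), len(value)


blocks = build_stored_blocks(docs)
column = build_id_column(docs)

stored_id, stored_bytes = fetch_id_from_stored(blocks, 42)
column_id, column_bytes = fetch_id_from_column(column, 42)
print(stored_id, stored_bytes)
print(column_id, column_bytes)
```

Both lookups return the same id, but the row-oriented path has to decompress a whole block, including every neighbouring `_source`, before it can read one small value. The column-oriented path touches only the id bytes.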