Rethink how to index nested documents when types are gone #24362
I think this is a fair tradeoff, especially since it would still allow adding more nested fields, so it would only affect indices where no nested fields have been defined.
Discussed in Fixit-Friday: we will benchmark the overhead of a field that always has the same value. If it does not matter, we can add a metadata field that identifies nested documents; otherwise, we can either:
I looked at this yesterday and, after a closer look, I think we can get the best of both worlds without indexing or adding anything to the documents. Today we index …
// In the case of nested docs, let's fill nested docs with seqNo=1 and
// primaryTerm=0 so that Lucene doesn't write a Bitset for documents
// that don't have the field. This is consistent with the default value
// for efficiency.
This is a good reason for at least … @bleskes, I wonder what you think: is this a feasible solution?
This change stops indexing the `_primary_term` field for nested documents to allow fast retrieval of parent documents. Today we create a doc values field for children to ensure we have a dense data structure on disk. Yet, since we only use the primary term to tie-break when we see the same sequence number during indexing, having a dense data structure is less important. We can use this to improve nested document performance and its memory footprint. Relates to #24362
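To illustrate why leaving `_primary_term` off nested documents can make parent lookups cheap, here is a minimal pure-Java model, not Lucene or Elasticsearch code. It assumes the block-join layout in which a parent's nested documents are indexed immediately before it, and it models "this document has a `_primary_term` doc value" as a non-null field on a hypothetical `Doc` record. Once only parents carry the value, the set of documents that have it is exactly the parent bitset, and each parent's children are the doc IDs between the previous parent and itself.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical model of one segment; all names here are illustrative. */
public final class SparsePrimaryTermSketch {

    /** primaryTerm == null stands in for "no _primary_term doc value" (a nested child). */
    public record Doc(String id, Long primaryTerm) {}

    /** Parent filter: the doc IDs that carry a _primary_term value. */
    public static BitSet parentDocs(List<Doc> segment) {
        BitSet parents = new BitSet(segment.size());
        for (int docId = 0; docId < segment.size(); docId++) {
            if (segment.get(docId).primaryTerm() != null) {
                parents.set(docId);
            }
        }
        return parents;
    }

    /** Children of a parent occupy the doc IDs after the previous parent, up to the parent itself. */
    public static int firstChild(BitSet parents, int parentDocId) {
        // previousSetBit returns -1 when there is no earlier parent, so the block starts at doc 0.
        return parents.previousSetBit(parentDocId - 1) + 1;
    }

    public static void main(String[] args) {
        List<Doc> segment = List.of(
            new Doc("p1#child0", null),
            new Doc("p1#child1", null),
            new Doc("p1", 1L),
            new Doc("p2", 1L),          // parent without nested children
            new Doc("p3#child0", null),
            new Doc("p3", 2L));
        BitSet parents = parentDocs(segment);
        System.out.println(parents);                 // {2, 3, 5}
        System.out.println(firstChild(parents, 5));  // 4
    }
}
```

In a real index the same idea would be expressed as "documents that have a value for this field", so no extra marker field needs to be written; the trade-off is that the field's doc values become sparse.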
@elastic/es-search-aggs
We discussed it; this issue combined two things:
We agreed to rename
Currently, nested documents repurpose the `_type` field to store their nested paths. This commit adds a dedicated `_nested_path` field instead, which decouples this information from types and will allow the `_type` field to be removed entirely further down the line. To preserve backwards compatibility, references to this field are mediated via methods that take an index settings object, and indices created before 8.0 still use the `_type` field. Relates to #41059. Closes #24362.
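A minimal sketch of "references to this field are mediated via methods that take an index settings object": callers never hard-code a field name and instead ask one method, which picks `_type` for older indices and `_nested_path` otherwise. The `IndexSettings` record, its `createdMajorVersion` component, and `nestedPathFieldName` are hypothetical stand-ins, not the actual Elasticsearch classes.

```java
/** Hypothetical sketch; not Elasticsearch's real IndexSettings or mapper API. */
public final class NestedPathFieldSketch {

    /** Stand-in for an index settings object; only the created major version is modeled. */
    public record IndexSettings(int createdMajorVersion) {}

    static final String LEGACY_FIELD = "_type";
    static final String NESTED_PATH_FIELD = "_nested_path";

    /** Single place that decides which field holds the nested path. */
    public static String nestedPathFieldName(IndexSettings settings) {
        // Indices created before 8.0 keep using _type for backwards compatibility.
        return settings.createdMajorVersion() < 8 ? LEGACY_FIELD : NESTED_PATH_FIELD;
    }

    public static void main(String[] args) {
        System.out.println(nestedPathFieldName(new IndexSettings(7))); // _type
        System.out.println(nestedPathFieldName(new IndexSettings(8))); // _nested_path
    }
}
```

Routing every lookup through one method means that dropping `_type` later only requires changing this decision, not every caller.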
Currently, nested documents reuse the `_type` field to tell root documents apart from their children: the value of the `_type` field starts with `__` for children. However, with types going away, we would probably like to stop indexing the `_type` field, so we would need to find other ways to identify root and child documents.

For efficiency reasons, we need a fast query that identifies root documents. One way to do that would be to add a special field/value pair to root documents. However, we would probably only want to do that when there are nested mappings, which means we would have to reject adding `nested` objects to existing mappings.
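The two approaches can be contrasted in a small pure-Java sketch. `isRootByType` encodes the current rule described above (children have a `_type` value starting with `__`). The proposal is modeled with a hypothetical marker field, `_root_marker`, since the issue only speaks of "a special field/value pair"; the marker is assumed to be written only when the mapping has nested fields, which is what forces the mapping-update check in `validateMappingUpdate`. All names are illustrative, not Elasticsearch code.

```java
import java.util.Map;

/** Hypothetical sketch of root-document identification; names are illustrative. */
public final class RootDocSketch {

    /** Current scheme: nested children store a _type value starting with "__". */
    public static boolean isRootByType(String typeValue) {
        return !typeValue.startsWith("__");
    }

    /** Hypothetical marker field for the proposed scheme. */
    public static final String ROOT_MARKER_FIELD = "_root_marker";

    /** Proposed scheme: without nested mappings every doc is a root, so no marker is indexed. */
    public static boolean isRoot(Map<String, String> doc, boolean mappingHasNested) {
        return !mappingHasNested || doc.containsKey(ROOT_MARKER_FIELD);
    }

    /**
     * Documents indexed while the mapping had no nested fields lack the marker,
     * so adding the first nested object later must be rejected.
     */
    public static void validateMappingUpdate(boolean mappingHasNested, boolean updateAddsNested) {
        if (updateAddsNested && !mappingHasNested) {
            throw new IllegalArgumentException(
                "cannot add nested objects to a mapping that has no nested fields");
        }
    }

    public static void main(String[] args) {
        System.out.println(isRootByType("tweet"));                    // true
        System.out.println(isRootByType("__comments"));               // false
        System.out.println(isRoot(Map.of("title", "x"), false));      // true
        System.out.println(isRoot(Map.of("title", "x"), true));       // false
        validateMappingUpdate(true, true); // allowed: nested fields already exist
        try {
            validateMappingUpdate(false, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");                           // rejected
        }
    }
}
```

Adding further nested fields to a mapping that already has some stays allowed, matching the tradeoff that only indices with no nested fields defined are affected.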