Rethink how to index nested documents when types are gone #24362

jpountz · 2017-04-27T10:27:55Z

Currently, nested documents reuse the _type field to distinguish root documents from children: the value of the _type field starts with __ for children. However, with types going away, we would probably like to stop indexing the _type field, so we need another way to tell root documents and children apart.
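To make the existing convention concrete, here is a minimal sketch. It assumes only what is stated above (the _type value of a nested document starts with __). The class and method names are hypothetical and are not Elasticsearch APIs.

```java
// Hypothetical model of the legacy convention; this is not Elasticsearch code.
final class LegacyNestedTypeCheck {
    // Prefix that marks the _type value of a nested (child) document.
    static final String NESTED_PREFIX = "__";

    // A document is treated as a root document when its _type lacks the prefix.
    static boolean isRoot(String typeValue) {
        return !typeValue.startsWith(NESTED_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isRoot("blog"));       // a root document
        System.out.println(isRoot("__comments")); // a nested document
    }
}
```

Once the _type field is no longer indexed, this check has nothing to operate on, which is why a different marker is needed.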

For efficiency reasons, we need a fast query that identifies root documents. One way to do that would be to add a special field/value pair to root documents. However, we would probably only want to do that when there are nested mappings, which means we would have to reject adding nested objects to existing mappings.

martijnvg · 2017-04-28T08:01:05Z

I think this is a fair tradeoff, especially since it would still allow adding further nested fields: it would only affect indices where no nested fields have been defined.

jpountz · 2017-05-05T12:48:49Z

Discussed in Fixit-Friday: we will benchmark the overhead of a field that always has the same value. If it does not matter, we can add a metadata field that identifies nested documents; otherwise we can either:

enforce nested mappings to be configured at index-creation time and only add the metadata field for root docs when there are nested mappings
or add an option to disable the field that identifies root documents so that users who do not need the feature can get some performance back

s1monw · 2017-11-15T10:04:53Z

I looked at this yesterday, and after a closer look I think we can get the best of both worlds without indexing or adding anything to the documents. Today we index _version, _seq_no, and _primary_term for every nested document. This is unnecessary and can be omitted, and once one of these fields is no longer indexed for nested documents, we can use a DocValuesFieldExistsQuery on it to identify root documents. The values of these fields are all dummy values anyway, so they carry no real value for the user. Yet when we look at the place where these dummy values are added, we see that there is a reason for them:

https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/index/mapper/SeqNoFieldMapper.java#L247

// In the case of nested docs, let's fill nested docs with seqNo=1 and
// primaryTerm=0 so that Lucene doesn't write a Bitset for documents
// that don't have the field. This is consistent with the default value
// for efficiency.

This is a good reason for at least _seq_no and _version, but given that we only rarely access _primary_term, it might be OK to drop the value for nested docs. This would allow us to find parents very easily without paying any extra cost at indexing time.

@bleskes what do you think, is this a feasible solution?

This change stops indexing the `_primary_term` field for nested documents to allow fast retrieval of parent documents. Today we create a doc values field for children to ensure we have a dense data structure on disk. However, since the primary term is only used as a tie-breaker when the same sequence number is seen during indexing, having a dense data structure is less important. We can use this to improve the performance and memory footprint of nested documents. Relates to #24362
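The idea can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: it uses no Lucene or Elasticsearch classes, the names `ParentByFieldExistence`, `rootDocs`, and `rootOf` are hypothetical, and it assumes the usual block layout in which nested documents are stored immediately before their root document. A null entry stands for a document with no `_primary_term` value.

```java
import java.util.BitSet;

// Hypothetical in-memory model; it does not use Lucene or Elasticsearch classes.
final class ParentByFieldExistence {
    // primaryTerms[i] is null when document i has no _primary_term value,
    // which in this model means it is a nested document.
    static BitSet rootDocs(Long[] primaryTerms) {
        BitSet roots = new BitSet(primaryTerms.length);
        for (int doc = 0; doc < primaryTerms.length; doc++) {
            if (primaryTerms[doc] != null) {
                roots.set(doc); // the field exists, so this is a root document
            }
        }
        return roots;
    }

    // Children precede their root within a block, so the root of any
    // document is the next set bit at or after its own position.
    static int rootOf(BitSet roots, int doc) {
        return roots.nextSetBit(doc);
    }

    public static void main(String[] args) {
        // Two blocks: docs 0-1 are children of doc 2; doc 3 is a child of doc 4.
        Long[] primaryTerms = { null, null, 1L, null, 1L };
        BitSet roots = rootDocs(primaryTerms);
        System.out.println(roots);            // prints {2, 4}
        System.out.println(rootOf(roots, 0)); // prints 2
        System.out.println(rootOf(roots, 3)); // prints 4
    }
}
```

In this model the "field exists" check plays the role that a field-existence query over `_primary_term` would play in the index: no extra marker has to be written for root documents.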

javanna · 2018-03-16T10:41:52Z

@elastic/es-search-aggs

jpountz · 2018-09-04T13:36:04Z

We discussed this. The issue combines two things:

the need to identify root documents efficiently, which @s1monw addressed as explained above
the removal of the _type field, which still needs to be addressed

We agreed to rename _type to _nested to address the second point.

Currently, nested documents repurpose the _type field to store their nested paths. This commit adds a dedicated _nested_path field instead, which decouples this information from types and will allow the _type field to be removed entirely further down the line. To preserve backwards compatibility, references to this field are mediated via methods that take an index settings object, and indexes created before 8.x still use the _type field. Relates to #41059 Closes #24362
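The backwards-compatibility rule described in this commit message can be sketched as follows. This is an illustration only: the class name and the `int` major-version parameter are hypothetical stand-ins for the index settings object mentioned above, and the code is not taken from the actual change.

```java
// Hypothetical sketch; the actual change passes an index settings object.
final class NestedPathFieldName {
    static final String LEGACY_NAME = "_type";        // used by older indexes
    static final String NAME = "_nested_path";        // dedicated field

    // indexCreatedMajorVersion stands in for the index settings object:
    // indexes created before 8.x keep storing nested paths in _type.
    static String name(int indexCreatedMajorVersion) {
        return indexCreatedMajorVersion >= 8 ? NAME : LEGACY_NAME;
    }

    public static void main(String[] args) {
        System.out.println(name(7)); // prints _type
        System.out.println(name(8)); // prints _nested_path
    }
}
```

Routing every lookup through one method like this keeps the choice of field name in a single place, so callers do not need to know which convention a given index uses.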

jpountz added :Search Foundations/Mapping Index mappings, including merging and defining field types discuss labels Apr 27, 2017

s1monw mentioned this issue Nov 21, 2017

Use the primary_term field to identify parent documents #27469

Merged

colings86 added the >non-issue label Apr 24, 2018

jpountz added team-discuss and removed discuss labels Aug 24, 2018

jpountz added help wanted adoptme and removed team-discuss labels Sep 4, 2018

jeffreynscrbdee mentioned this issue Sep 25, 2018

Extra Lucene DocValueExistsQuery fired - due to nested mapping and primary_terms #34067

Closed

jtibshirani mentioned this issue Apr 15, 2019

Types removal in 8.0 #41059

Closed

66 tasks

romseygeek mentioned this issue Dec 18, 2019

Don't use _type field for nested paths #50312

Closed

romseygeek mentioned this issue Jan 16, 2020

Add NestedPathFieldMapper to store nested path information #51100

Merged

romseygeek closed this as completed in #51100 Jan 22, 2020

javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024
