Add doc values support to _parent field data #6107
Comments
+1! The tricky bit here is that the
Doc values for parent/child aren't so tricky. The parent/child doc values field should just contain the values of both the The
This sounds good to me!
Will this actually address the problem raised in #3516 (as it was closed in favor of this one)? This seems to imply a reduction in the space needed for _parent, but our problem is that the id_cache is growing linearly with the number of parents, even if there are no children.
@ostersc the id_cache has been removed and instead the p/c data is stored in the fielddata cache. It still uses up memory. This issue is about moving the p/c data to disk, which will save a lot of memory.
@clintongormley thanks for clarifying. Yes, we are seeing this issue manifest in the fielddata_breaker, but in verifying it was related to parent/child issues, we ran
@ostersc yes, that's just for bwc (backwards compatibility). The actual store is in the fielddata.
@clintongormley gotcha. Is there any known workaround for this? I was quite surprised to find the field cache grow for each parent doc (even if there are no children), so we are looking at needing to move away from parent-child, as we need to support hundreds of millions of parent docs with sparsely populated children.
@ostersc Have a read of this chapter: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relations.html
Indices created on or after 1.4.0 will also store the `_parent` field as doc values. Also added the `index._parent.doc_values` option, which controls whether doc values are used for parent/child field data; if set to false, parent/child field data will be created on the fly from the `_parent` field's inverted index. `index._parent.doc_values` defaults to true. Closes elastic#6107 Closes elastic#6511
+1
This is a breaking change: 1) A parent type needs to be marked as a parent in the _parent field mapping of the parent type. 2) The top_children query will be removed. The top_children query was somewhat of an alternative to has_child when it came to speed, but it isn't accurate and wasn't always faster. Indices created before 2.0 will use field data and the old way of executing queries, but indices created on or after 2.0 will use the Lucene join. Closes elastic#6107 Closes elastic#6511 Closes elastic#8134
This is a breaking change: 1) A parent type needs to be marked as a parent in the _parent field mapping of the parent type. 2) The has_child and has_parent queries can't be used in index aliases any more, because at query parse time they require the search context to be set. During normal _search API usage this is the case, but not when adding an index alias. Indices created before 2.0 will use field data and the old way of executing queries, but indices created on or after 2.0 will use the Lucene join and encode the parent/child relation at index time in a special join doc values field. Closes elastic#6107 Closes elastic#6511 Closes elastic#8134
* Cut the `has_child` and `has_parent` queries over to use Lucene's query-time global ordinal join. The main benefit of this change is that parent/child queries can now execute efficiently when they are wrapped in a bigger boolean query. If the rest of the query only hits a few documents, the `has_child` and `has_parent` queries no longer need to evaluate all parent or child documents.
* Cut the `_parent` field over to use doc values. This significantly reduces the on-heap memory footprint of parent/child, because the parent id values are never loaded into memory.

Breaking changes:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet, which means an existing type/mapping can no longer become a parent type.
* The `has_child` and `has_parent` queries can no longer be used in alias filters.

All these changes, improvements and breaks in compatibility only apply to indices created with ES version 2.0 or higher. For indices created with ES versions before 2.0, the older implementation is used. It is highly recommended to re-index all your indices with parent and child documents to benefit from all the improvements that come with this refactoring. The easiest way to achieve this is to use the scan and bulk APIs with a simple script.

Closes elastic#6107 Closes elastic#8134
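The two-phase global ordinal join described in the commit message can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the Elasticsearch or Lucene implementation: documents are plain dicts, the hypothetical `join_value` helper models the single join doc values field (a parent contributes its own id, a child contributes its parent's id), and a Python set stands in for whatever structure the real join uses to collect ordinals. All function names here are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Set

# Toy document model (an assumption for illustration): every doc has an
# "id"; child docs also carry the id of their parent in "parent".
Doc = Dict[str, Optional[str]]
Predicate = Callable[[Doc], bool]


def is_child(doc: Doc) -> bool:
    return doc.get("parent") is not None


def join_value(doc: Doc) -> str:
    """Value written to the (hypothetical) shared join column at index time.

    A parent contributes its own id and a child contributes its parent's id,
    so a parent and all of its children end up with the same value.
    """
    return doc["parent"] if is_child(doc) else doc["id"]


def build_global_ordinals(docs: List[Doc]) -> Dict[str, int]:
    # Map each distinct join value to a dense integer ordinal, in sorted order.
    values = sorted({join_value(d) for d in docs})
    return {value: ordinal for ordinal, value in enumerate(values)}


def has_child(docs: List[Doc], ords: Dict[str, int], child_query: Predicate) -> List[str]:
    # Phase 1: collect the ordinals of all children matching the inner query.
    collected: Set[int] = {ords[join_value(d)] for d in docs if is_child(d) and child_query(d)}
    # Phase 2: a parent matches if its own ordinal was collected.
    return [d["id"] for d in docs if not is_child(d) and ords[join_value(d)] in collected]


def has_parent(docs: List[Doc], ords: Dict[str, int], parent_query: Predicate) -> List[str]:
    # Same two phases with the roles of parents and children reversed.
    collected: Set[int] = {ords[join_value(d)] for d in docs if not is_child(d) and parent_query(d)}
    return [d["id"] for d in docs if is_child(d) and ords[join_value(d)] in collected]


docs: List[Doc] = [
    {"id": "p1", "parent": None, "color": None},
    {"id": "p2", "parent": None, "color": None},
    {"id": "c1", "parent": "p1", "color": "red"},
    {"id": "c2", "parent": "p2", "color": "blue"},
    {"id": "c3", "parent": "p1", "color": "blue"},
]
ords = build_global_ordinals(docs)
print(has_child(docs, ords, lambda d: d["color"] == "red"))  # ['p1']
print(has_parent(docs, ords, lambda d: d["id"] == "p2"))     # ['c2']
```

In this toy model every parent contributes a value to the join column whether or not it has any children, so the number of ordinals grows with the number of parents.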
Is there any estimate of when the 2.0 version will be released? We will likely have to remove our usage of parent/child relationships until they are moved to doc values. The in-memory relationships are not scaling well with our application.
As you can see from #9970 there are 3 remaining boxes to tick and good progress is being made these days, so the first release candidate should happen pretty soon. However, it might still take time between the release candidate and the GA, depending on feedback.
Thanks for the quick response! I will run some testing on the master branch.
Is this still happening for 2.0?
@alexkavon yes, doc values support for parent/child will be included in the first 2.0 release. If you want to try it out, just make sure that you create a new index once upgraded to 2.0. The doc values support is only enabled on indices created on or after version 2.0. Indices that existed before the upgrade to 2.0 will continue to work and perform the same way they did on previous 1.x releases.
So reindexing should take care of this then?
@alexkavon Yes, once upgraded to 2.0.x, a reindex would use the new p/c implementation.
The `_parent` field can easily have a high cardinality; as a consequence, field data for this field can take a lot of memory. It would be useful to have the ability to store this mapping on disk using Lucene doc values. Doc values proved to perform very well for aggregations in combination with global ordinals (#5672). So now that parent/child queries use global ordinals as well (#5846), I think doc values could even be the default for the `_parent` field?
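To make the doc values plus global ordinals combination more concrete, here is a minimal Python sketch under a simplified model (an assumption, not Lucene's actual on-disk format). Each segment's doc values hold a sorted dictionary of distinct `_parent` values plus one segment-local ordinal per document. Global ordinals are then a per-segment mapping from local ordinals to positions in the merged, sorted dictionary of all segments. The function names are hypothetical.

```python
from typing import List, Tuple


def segment_doc_values(values: List[str]) -> Tuple[List[str], List[int]]:
    """Sorted term dictionary plus one segment-local ordinal per document."""
    terms = sorted(set(values))
    ordinal_of = {term: i for i, term in enumerate(terms)}
    return terms, [ordinal_of[v] for v in values]


def build_global_ordinal_maps(segment_terms: List[List[str]]) -> List[List[int]]:
    """For each segment, map segment-local ordinal -> global ordinal."""
    global_terms = sorted(set().union(*segment_terms))
    global_ordinal_of = {term: i for i, term in enumerate(global_terms)}
    return [[global_ordinal_of[t] for t in terms] for terms in segment_terms]


# Two segments whose documents reference overlapping parent ids.
terms1, ords1 = segment_doc_values(["p3", "p1", "p3"])
terms2, ords2 = segment_doc_values(["p2", "p1"])
maps = build_global_ordinal_maps([terms1, terms2])

# Global ordinal of each document: its local ordinal looked up in the map.
print([maps[0][o] for o in ords1])  # [2, 0, 2]
print([maps[1][o] for o in ords2])  # [1, 0]
```

In this model, joins and aggregations only ever compare integer ordinals across segments; the string values themselves are needed only when building the dictionaries and the mapping.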