Skip to content

Commit

Permalink
Parent/child: refactored _parent field mapper and parent/child queries
Browse files Browse the repository at this point in the history
* Cut the `has_child` and `has_parent` queries over to use Lucene's query time global ordinal join. The main benefit of this change is that parent/child queries can now efficiently execute if parent/child queries are wrapped in a bigger boolean query. If the rest of the query only hit a few documents both has_child and has_parent queries don't need to evaluate all parent or child documents any more.
* Cut the `_parent` field over to use doc values. This significantly reduces the on heap memory footprint of parent/child, because the parent id values are never loaded into memory.

Breaking changes:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet, so this means that an existing type/mapping can't become a parent type any longer.
* The `has_child` and `has_parent` queries can no longer be use in alias filters.

All these changes, improvements and breaks in compatibility only apply for indices created with ES version 2.0 or higher. For indices creates with ES <= 2.0 the older implementation is used.

It is highly recommended to re-index all your indices with parent and child documents to benefit from all the improvements that come with this refactoring. The easiest way to achieve this is by using the scan and bulk apis using a simple script.

Closes elastic#6107
Closes elastic#8134
  • Loading branch information
martijnvg committed May 29, 2015
1 parent bd381b8 commit 59f58e7
Show file tree
Hide file tree
Showing 25 changed files with 927 additions and 424 deletions.
37 changes: 33 additions & 4 deletions docs/reference/mapping/fields/parent-field.asciidoc
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
[[mapping-parent-field]]
=== `_parent`

TIP: It is highly recommend to reindex all indices with `_parent` field created before version 2.x.
The reason for this is to gain from all the optimizations added with the 2.0 release.

The parent field mapping is defined on a child mapping, and points to
the parent type this child relates to. For example, in case of a `blog`
type and a `blog_tag` type child document, the mapping for `blog_tag`
Expand All @@ -20,8 +23,34 @@ should be:
The mapping is automatically stored and indexed (meaning it can be
searched on using the `_parent` field notation).

==== Field data loading
==== Limitations

The `_parent.type` setting can only point to a type that doesn't exist yet.
This means that a type can't become a parent type after is has been created.

The `parent.type` setting can't point to itself. This means self referential
parent/child isn't supported.

Parent/child queries (`has_child` & `has_parent`) can't be used in index aliases.

==== Global ordinals

Parent-child uses <<index-modules-fielddata-global-ordinals,global ordinals>> to speed up joins and global ordinals need to be rebuilt after any change to a shard.
The more parent id values are stored in a shard, the longer it takes to rebuild global ordinals for the `_parent` field.

Global ordinals, by default, are built lazily: the first parent-child query or aggregation after a refresh will trigger building of global ordinals.
This can introduce a significant latency spike for your users. You can use <<index-modules-fielddata-fielddata-loading,eager_global_ordinals>> to shift the cost of building global ordinals
from query time to refresh time, by mapping the _parent field as follows:

Contrary to other fields the fielddata loading is not `lazy`, but `eager`. The reason for this is that when this
field has been enabled it is going to be used in parent/child queries, which heavily relies on field data to perform
efficiently. This can already be observed during indexing after refresh either automatically or manually has been executed.
==== Memory usage

The only on heap memory used by parent/child is the global ordinals for the `_parent` field.

How much memory is used for the global ordianls for the `_parent` field in the fielddata cache
can be checked via the <<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>
APIS, eg:

[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/_stats/fielddata?pretty&human&fielddata_fields=_parent"
--------------------------------------------------
16 changes: 16 additions & 0 deletions docs/reference/migration/migrate_2_0.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -626,3 +626,19 @@ anymore, it will only highlight fields that were queried.
The `match` query with type set to `match_phrase_prefix` is not supported by the
postings highlighter. No highlighted snippets will be returned.

[float]
=== Parent/child

Parent/child has been rewritten completely to reduce memory usage and to execute
`has_child` and `has_parent` queries faster and more efficient. The `_parent` field
uses doc values by default. The refactored and improved implementation is only active
for indices created on or after version 2.0.

In order to benefit for all performance and memory improvements we recommend to reindex all
indices that have the `_parent` field created before was upgraded to 2.0.

The following breaks in backwards compatability have been made on indices with the `_parent` field
created on or after clusters with version 2.0:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet,
so this means that an existing type/mapping can no longer become a parent type.
* The `has_child` and `has_parent` queries can no longer be use in alias filters.
18 changes: 0 additions & 18 deletions docs/reference/query-dsl/has-child-query.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -72,21 +72,3 @@ a match:

The `min_children` and `max_children` parameters can be combined with
the `score_mode` parameter.

[float]
=== Memory Considerations

In order to support parent-child joins, all of the (string) parent IDs
must be resident in memory (in the <<index-modules-fielddata,field data cache>>.
Additionally, every child document is mapped to its parent using a long
value (approximately). It is advisable to keep the string parent ID short
in order to reduce memory usage.

You can check how much memory is being used by the `_parent` field in the fielddata cache
using the <<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>
APIS, eg:

[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/_stats/fielddata?pretty&human&fielddata_fields=_parent"
--------------------------------------------------
20 changes: 0 additions & 20 deletions docs/reference/query-dsl/has-parent-query.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -47,23 +47,3 @@ matching parent document. The score type can be specified with the
}
}
--------------------------------------------------

[float]
=== Memory Considerations

In order to support parent-child joins, all of the (string) parent IDs
must be resident in memory (in the <<index-modules-fielddata,field data cache>>.
Additionally, every child document is mapped to its parent using a long
value (approximately). It is advisable to keep the string parent ID short
in order to reduce memory usage.

You can check how much memory is being used by the `_parent` field in the fielddata cache
using the <<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>
APIS, eg:

[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/_stats/fielddata?pretty&human&fielddata_fields=_parent"
--------------------------------------------------


Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import com.google.common.collect.Sets;
import org.elasticsearch.Version;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingClusterStateUpdateRequest;
import org.elasticsearch.cluster.AckedClusterStateUpdateTask;
Expand Down Expand Up @@ -386,9 +387,23 @@ public ClusterState execute(final ClusterState currentState) throws Exception {
if (mergeResult.hasConflicts()) {
throw new MergeMappingException(mergeResult.buildConflicts());
}
} else {
// TODO: can we find a better place for this validation?
// The reason this validation is here is that the mapper service doesn't learn about
// new types all at once , which can create a false error.

// For example in MapperService we can't distinguish between a create index api call
// and a put mapping api call, so we don't which type did exist before.
// Also the order of the mappings may be backwards.
if (Version.indexCreated(indexService.getIndexSettings()).onOrAfter(Version.V_2_0_0) && newMapper.parentFieldMapper().active()) {
if (indexService.mapperService().types().contains(newMapper.parentFieldMapper().type())) {
throw new IllegalArgumentException("can't add a _parent field that points to an already existing type");
}
}
}
}


newMappers.put(index, newMapper);
if (existingMapper != null) {
existingMappers.put(index, existingMapper);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ public Index index() {
return this.index;
}

public Settings indexSettings() {
return indexSettings;
}

public String nodeName() {
return indexSettings.get("name", "");
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,16 +21,10 @@

import com.carrotsearch.hppc.ObjectObjectHashMap;
import com.carrotsearch.hppc.cursors.ObjectObjectCursor;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.ImmutableSortedSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.*;
import org.apache.lucene.index.MultiDocValues.OrdinalMap;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.BytesRef;
Expand All @@ -39,6 +33,7 @@
import org.apache.lucene.util.packed.PackedInts;
import org.apache.lucene.util.packed.PackedLongValues;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.Version;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.breaker.CircuitBreaker;
import org.elasticsearch.common.collect.ImmutableOpenMap;
Expand All @@ -47,14 +42,8 @@
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.fielddata.AtomicOrdinalsFieldData;
import org.elasticsearch.index.fielddata.AtomicParentChildFieldData;
import org.elasticsearch.index.fielddata.FieldDataType;
import org.elasticsearch.index.fielddata.IndexFieldData;
import org.elasticsearch.index.fielddata.*;
import org.elasticsearch.index.fielddata.IndexFieldData.XFieldComparatorSource.Nested;
import org.elasticsearch.index.fielddata.IndexFieldDataCache;
import org.elasticsearch.index.fielddata.IndexParentChildFieldData;
import org.elasticsearch.index.fielddata.RamAccountingTermsEnum;
import org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource;
import org.elasticsearch.index.fielddata.ordinals.Ordinals;
import org.elasticsearch.index.fielddata.ordinals.OrdinalsBuilder;
Expand All @@ -70,17 +59,7 @@
import org.elasticsearch.search.MultiValueMode;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.*;
import java.util.concurrent.TimeUnit;

/**
Expand All @@ -89,7 +68,7 @@
*/
public class ParentChildIndexFieldData extends AbstractIndexFieldData<AtomicParentChildFieldData> implements IndexParentChildFieldData, DocumentTypeListener {

private final NavigableSet<BytesRef> parentTypes;
private final NavigableSet<String> parentTypes;
private final CircuitBreakerService breakerService;

// If child type (a type with _parent field) is added or removed, we want to make sure modifications don't happen
Expand All @@ -100,7 +79,7 @@ public ParentChildIndexFieldData(Index index, @IndexSettings Settings indexSetti
FieldDataType fieldDataType, IndexFieldDataCache cache, MapperService mapperService,
CircuitBreakerService breakerService) {
super(index, indexSettings, fieldNames, fieldDataType, cache);
parentTypes = new TreeSet<>(BytesRef.getUTF8SortedAsUnicodeComparator());
parentTypes = new TreeSet<>();
this.breakerService = breakerService;
for (DocumentMapper documentMapper : mapperService.docMappers(false)) {
beforeCreate(documentMapper);
Expand All @@ -114,15 +93,60 @@ public XFieldComparatorSource comparatorSource(@Nullable Object missingValue, Mu
}

@Override
public ParentChildAtomicFieldData loadDirect(LeafReaderContext context) throws Exception {
public AtomicParentChildFieldData load(LeafReaderContext context) {
if (Version.indexCreated(indexSettings).onOrAfter(Version.V_2_0_0)) {
final LeafReader reader = context.reader();
final NavigableSet<String> parentTypes;
synchronized (lock) {
parentTypes = ImmutableSortedSet.copyOf(this.parentTypes);
}
return new AbstractAtomicParentChildFieldData() {

public Set<String> types() {
return parentTypes;
}

@Override
public SortedDocValues getOrdinalsValues(String type) {
try {
return DocValues.getSorted(reader, ParentFieldMapper.joinField(type));
} catch (IOException e) {
throw new IllegalStateException("cannot load join doc values field for type [" + type + "]", e);
}
}

@Override
public long ramBytesUsed() {
// unknown
return 0;
}

@Override
public Collection<Accountable> getChildResources() {
return Collections.emptyList();
}

@Override
public void close() throws ElasticsearchException {
}
};
} else {
return super.load(context);
}
}

@Override
public AbstractAtomicParentChildFieldData loadDirect(LeafReaderContext context) throws Exception {
LeafReader reader = context.reader();
final float acceptableTransientOverheadRatio = fieldDataType.getSettings().getAsFloat(
"acceptable_transient_overhead_ratio", OrdinalsBuilder.DEFAULT_ACCEPTABLE_OVERHEAD_RATIO
);

final NavigableSet<BytesRef> parentTypes;
final NavigableSet<BytesRef> parentTypes = new TreeSet<>();
synchronized (lock) {
parentTypes = ImmutableSortedSet.copyOf(BytesRef.getUTF8SortedAsUnicodeComparator(), this.parentTypes);
for (String parentType : this.parentTypes) {
parentTypes.add(new BytesRef(parentType));
}
}
boolean success = false;
ParentChildAtomicFieldData data = null;
Expand Down Expand Up @@ -192,7 +216,7 @@ public void beforeCreate(DocumentMapper mapper) {
if (parentFieldMapper.active()) {
// A _parent field can never be added to an existing mapping, so a _parent field either exists on
// a new created or doesn't exists. This is why we can update the known parent types via DocumentTypeListener
if (parentTypes.add(new BytesRef(parentFieldMapper.type()))) {
if (parentTypes.add(parentFieldMapper.type())) {
clear();
}
}
Expand Down Expand Up @@ -320,11 +344,9 @@ public OrdinalMapAndAtomicFieldData(OrdinalMap ordMap, AtomicParentChildFieldDat
@Override
public IndexParentChildFieldData localGlobalDirect(IndexReader indexReader) throws Exception {
final long startTime = System.nanoTime();
final Set<String> parentTypes = new HashSet<>();
final Set<String> parentTypes;
synchronized (lock) {
for (BytesRef type : this.parentTypes) {
parentTypes.add(type.utf8ToString());
}
parentTypes = ImmutableSet.copyOf(this.parentTypes);
}

long ramBytesUsed = 0;
Expand Down Expand Up @@ -352,7 +374,7 @@ public IndexParentChildFieldData localGlobalDirect(IndexReader indexReader) thro
);
}

return new GlobalFieldData(indexReader, fielddata, ramBytesUsed);
return new GlobalFieldData(indexReader, fielddata, ramBytesUsed, perType);
}

private static class GlobalAtomicFieldData extends AbstractAtomicParentChildFieldData {
Expand Down Expand Up @@ -436,16 +458,18 @@ public void close() {

}

private class GlobalFieldData implements IndexParentChildFieldData, Accountable {
public class GlobalFieldData implements IndexParentChildFieldData, Accountable {

private final AtomicParentChildFieldData[] fielddata;
private final IndexReader reader;
private final long ramBytesUsed;
private final Map<String, OrdinalMapAndAtomicFieldData> ordinalMapPerType;

GlobalFieldData(IndexReader reader, AtomicParentChildFieldData[] fielddata, long ramBytesUsed) {
GlobalFieldData(IndexReader reader, AtomicParentChildFieldData[] fielddata, long ramBytesUsed, Map<String, OrdinalMapAndAtomicFieldData> ordinalMapPerType) {
this.reader = reader;
this.ramBytesUsed = ramBytesUsed;
this.fielddata = fielddata;
this.ordinalMapPerType = ordinalMapPerType;
}

@Override
Expand Down Expand Up @@ -514,4 +538,20 @@ public IndexParentChildFieldData localGlobalDirect(IndexReader indexReader) thro

}

/**
* Returns the global ordinal map for the specified type
*/
// TODO: OrdinalMap isn't expose in the field data framework, because it is an implementation detail.
// However the JoinUtil works directly with OrdinalMap, so this is a hack to get access to OrdinalMap
// I don't think we should expose OrdinalMap in IndexFieldData, because only parent/child relies on it and for the
// rest of the code OrdinalMap is an implementation detail, but maybe we can expose it in IndexParentChildFieldData interface?
public static MultiDocValues.OrdinalMap getOrdinalMap(IndexParentChildFieldData indexParentChildFieldData, String type) {
if (indexParentChildFieldData instanceof ParentChildIndexFieldData.GlobalFieldData) {
return ((GlobalFieldData) indexParentChildFieldData).ordinalMapPerType.get(type).ordMap;
} else {
// one segment, local ordinals are global
return null;
}
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -402,6 +402,10 @@ private void addFieldMappers(Collection<FieldMapper> fieldMappers) {
mapperService.addFieldMappers(fieldMappers);
}

public boolean isParent(String type) {
return mapperService.getParentTypes().contains(type);
}

private void addObjectMappers(Collection<ObjectMapper> objectMappers) {
assert mappingLock.isWriteLockedByCurrentThread();
MapBuilder<String, ObjectMapper> builder = MapBuilder.newMapBuilder(this.objectMappers);
Expand Down
Loading

0 comments on commit 59f58e7

Please sign in to comment.