Enable index-time sorting #24055
@@ -164,6 +171,23 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
        return builder;
    }

    static void toXContent(XContentBuilder builder, Sort sort) throws IOException {
        builder.startArray(Fields.SORT);
I believe we've been moving away from these `Fields` objects in general and just naming the constants or even using `"sort"`, depending on the context.
        return missing;
    }

    final String[] fields;
Why package private instead of private?
I think it is also worth leaving a comment explaining that this is stored this way for easy reading from the settings. It looks funny to my Java-accustomed eye.
            fields = new String[0];
        }
        if (fields.length > 0 && indexSettings.getIndexVersionCreated().before(Version.V_6_0_0_alpha1_UNRELEASED)) {
            throw new IllegalArgumentException("unsupported index.version.created:" + indexSettings.getIndexVersionCreated() +
How would we have gotten here? Would they need to use the test plugin to set the version? I'm not sure this is worth checking.
Not sure either, but this is how we would handle a mixed cluster if we allowed rolling upgrades for major releases. I know it's not possible to have a mixed cluster with 5.x and 6.x nodes, so maybe it's just a paranoid check.
            fields = INDEX_SORT_FIELD_SETTING.get(settings)
                .toArray(new String[0]);
        } else {
            fields = new String[0];
`Strings.EMPTY_ARRAY` might be worth using here.
                throw new IllegalArgumentException("unknown index sort field:[" + fields[i] + "]");
            }
            boolean reverse = orders[i] == null ? false : (orders[i] == SortOrder.DESC);
            MultiValueMode mode =
This might be easier to read as:

    MultiValueMode mode = modes[i];
    if (mode == null) {
        mode = reverse ? MultiValueMode.MAX : MultiValueMode.MIN;
    }
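The defaulting rule suggested above can be sketched in isolation. This is only an illustration: the nested enums are hypothetical stand-ins for the Elasticsearch `SortOrder` and `MultiValueMode` types, and `resolveMode` is a made-up helper name, not a method from the PR.

```java
final class SortModeDefaults {
    // Hypothetical stand-ins for the Elasticsearch enums of the same names.
    enum SortOrder { ASC, DESC }
    enum MultiValueMode { MIN, MAX }

    private SortModeDefaults() {}

    /** Returns the configured mode, or a default derived from the sort order when none is set. */
    static MultiValueMode resolveMode(SortOrder order, MultiValueMode configured) {
        // A missing order is treated as ascending, mirroring the 'reverse' computation in the diff.
        boolean reverse = order == SortOrder.DESC;
        MultiValueMode mode = configured;
        if (mode == null) {
            // Descending sorts pick the largest value of a multi-valued field, ascending the smallest.
            mode = reverse ? MultiValueMode.MAX : MultiValueMode.MIN;
        }
        return mode;
    }
}
```

Calling `SortModeDefaults.resolveMode(SortOrder.DESC, null)` in this sketch yields `MAX`, while an explicitly configured mode is returned unchanged.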
            MergePolicy mergePolicy,
            @Nullable IndexWriterFactory indexWriterFactory,
            @Nullable Supplier<SequenceNumbersService> sequenceNumbersServiceSupplier,
            @Nullable Sort indexSort) throws IOException {
Can we use the old method and pass `null` in all the places that don't use sorting?
I don't get it. Are you suggesting changing all the calls to `createEngine` to pass an explicit `null` value? What would that change?
Yeah, I mean add `@Nullable Sort indexSort` to one of the old ctors and change all the call sites that don't need a sort to provide `null`. Or maybe a random one? I'm not sure about that.
    The `index.sort.*` settings define which fields should be used to sort the documents inside each Segment.

    [WARNING]
    `nested` fields uses the original sort of the Segment to work which is why they
`nested` fields are not compatible with index sorting because they rely on the default doc_id sorting. An error will be thrown if index sorting is activated on an index that contains `nested` fields.
    {
        "settings" : {
            "index" : {
                "sort.field" : ["_type", "date"], <1>
If type is going away maybe we don't want to advertise it here?
      - do:
          indices.create:
            index: test
            wait_for_active_shards: 1
We usually don't have this setting in these tests. If it isn't needed I'd drop it.
            settings:
              number_of_shards: 1
              number_of_replicas: 1
              index.sort.field: _type
Maybe it'd be nicer to do it on a field just so we don't rely on type.
Can you sort on `_id`? That'd make the example pretty simple.
I did a first quick pass to understand how things work. I'm wondering whether you considered configuring the index sort in the mappings rather than the settings?
                builder.field("mode", ((SortedSetSortField) field).getSelector().toString());
            }
            builder.field("missing", field.getMissingValue());
            builder.field("missing", field.getReverse());
s/missing/reverse/
        // The sort order is validated right after the merge of the mapping later in the process.
        this.indexSortSupplier = () -> indexSettings.getIndexSortConfig().buildIndexSort(
            (name) -> mapperService.fullName(name),
            (ft) -> indexFieldData.getForField(ft)
let's use method references instead?
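As a rough illustration of this suggestion, the sketch below shows a lambda that only forwards its argument next to the equivalent method reference. `NameResolver` is a hypothetical stand-in for a service such as `mapperService`; it is not an Elasticsearch class.

```java
import java.util.function.Function;

final class MethodRefSketch {
    // Hypothetical stand-in for a service that resolves a field name.
    static final class NameResolver {
        String fullName(String name) {
            return "resolved." + name;
        }
    }

    private MethodRefSketch() {}

    // Forwarding lambda, in the style used in the diff.
    static Function<String, String> asLambda(NameResolver resolver) {
        return (name) -> resolver.fullName(name);
    }

    // Equivalent bound method reference.
    static Function<String, String> asMethodRef(NameResolver resolver) {
        return resolver::fullName;
    }
}
```

Both functions return the same result for the same input. One difference worth keeping in mind: a bound method reference evaluates its receiver when the reference is created, whereas the lambda reads it only when invoked.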
                .toArray(FieldSortSpec[]::new);
        } else {
            sortSpecs = new FieldSortSpec[0];
        }
I think the if/else is not needed as the code in the if block would work in all cases?
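A minimal sketch of why the branch may be redundant, assuming the `if` block streams over a list of field names: an empty list produces an empty stream, and `toArray` then returns a zero-length array. The `FieldSortSpec` class here is a simplified, hypothetical stand-in for the one in the PR.

```java
import java.util.List;

final class EmptyStreamSketch {
    // Simplified, hypothetical stand-in for the PR's FieldSortSpec.
    static final class FieldSortSpec {
        final String field;

        FieldSortSpec(String field) {
            this.field = field;
        }
    }

    private EmptyStreamSketch() {}

    static FieldSortSpec[] toSpecs(List<String> fields) {
        // No emptiness check needed: an empty list yields a zero-length array.
        return fields.stream()
            .map(FieldSortSpec::new)
            .toArray(FieldSortSpec[]::new);
    }
}
```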
                builder.field("mode", ((SortedNumericSortField) field).getSelector().toString());
            } else if (field instanceof SortedSetSortField) {
                builder.field("mode", ((SortedSetSortField) field).getSelector().toString());
            }
should we lowercase the modes?
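One way to lowercase the mode, sketched with plain Java types. The `Selector` enum is a hypothetical stand-in for a Lucene selector type, a `Map` stands in for the `XContentBuilder`, and the `field` key is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class SortFieldRenderSketch {
    // Hypothetical stand-in for a Lucene selector type.
    enum Selector { MIN, MAX }

    private SortFieldRenderSketch() {}

    static Map<String, Object> render(String field, Selector selector, Object missing, boolean reverse) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("field", field);
        // Locale.ROOT keeps the result independent of the JVM's default locale.
        out.put("mode", selector.toString().toLowerCase(Locale.ROOT));
        out.put("missing", missing);
        out.put("reverse", reverse);
        return out;
    }
}
```

Passing `Locale.ROOT` avoids locale-specific case mappings, which matters for output that other programs parse.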
            IndexSortConfig::validateMissingValue, Setting.Property.IndexScope, Setting.Property.Final);

    private static String validateMissingValue(String missing) {
        if ("_last".equals(missing) == false && "_first".equals(missing) == false) {
Not specific to that PR, but we should create constants for `_first` and `_last`.
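A sketch of what such constants could look like. The names `MISSING_FIRST` and `MISSING_LAST` and the error message are assumptions made for illustration; only the two literal values and the shape of the check come from the diff.

```java
final class MissingValueSketch {
    // Hypothetical constant names; the values are the literals from the diff.
    static final String MISSING_FIRST = "_first";
    static final String MISSING_LAST = "_last";

    private MissingValueSketch() {}

    static String validateMissingValue(String missing) {
        if (MISSING_LAST.equals(missing) == false && MISSING_FIRST.equals(missing) == false) {
            // Illustrative message, not the one used in the PR.
            throw new IllegalArgumentException("Illegal missing value:[" + missing + "], " +
                "must be one of [" + MISSING_LAST + ", " + MISSING_FIRST + "]");
        }
        return missing;
    }
}
```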
Thanks @jpountz and @nik9000 for reviewing.
I did, but currently the mapping is per type and I did not find an easy way to define something at the mapping level rather than the type level. I am not saying we should not do it, but it would require some non-trivial changes in how we treat mappings. Maybe we could revisit this when we remove `_type` entirely? Defining the index sort in the settings felt natural to me so I followed that path; it requires some validation between the mapping and the settings, but I think the change is not that big. WDYT?
LGTM.
My previous comment about configuring the index sort in the mappings rather than in the settings is not practical. We might want to reconsider when types are gone, but for now I think settings are the way to go.
Can you please add `experimental` tags to this feature in the docs saying that we might change the way that the index sort is configured?
    When creating a new index in elasticsearch it is possible to configure how the Segments
    inside each Shard will be sorted. By default Lucene does not apply any sort and uses the
    internal _doc_id_ to do the ordering.
I think saying that segments are ordered by doc id is a bit confusing; it rather works the other way: the ordering of documents inside a segment defines doc ids. Maybe just keep it to a minimum, e.g. `By default Lucene does not apply any sort.`
    The `index.sort.*` settings define which fields should be used to sort the documents inside each Segment.

    [WARNING]
    nested fields are not compatible with index sorting because they rely on the default doc_id sorting.
s/nested/Nested/ and maybe s/on the default doc_id sorting/on the assumption that nested documents are stored in contiguous doc ids, which can be broken by index sorting/?
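The contiguity assumption can be illustrated with a toy model. This is only a sketch: `Doc`, `block` and `sortKey` are hypothetical, list positions play the role of doc ids, and nothing here uses the Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NestedBlockSketch {
    // Toy document: 'block' identifies a parent/children group, 'sortKey' is the index sort value.
    static final class Doc {
        final int block;
        final int sortKey;

        Doc(int block, int sortKey) {
            this.block = block;
            this.sortKey = sortKey;
        }
    }

    private NestedBlockSketch() {}

    /** True if every block occupies a single contiguous run of positions. */
    static boolean blocksAreContiguous(List<Doc> docs) {
        Set<Integer> finished = new HashSet<>();
        Integer current = null;
        for (Doc d : docs) {
            if (current == null || d.block != current) {
                if (current != null) {
                    finished.add(current);
                }
                if (finished.contains(d.block)) {
                    return false; // this block resumed after a gap
                }
                current = d.block;
            }
        }
        return true;
    }

    /** Returns a copy ordered by sort key, the way an index sort would reorder the segment. */
    static List<Doc> sortedBySortKey(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparingInt((Doc d) -> d.sortKey));
        return copy;
    }
}
```

In this model, two blocks whose sort keys interleave are contiguous in insertion order but no longer contiguous after sorting, which is the situation the suggested wording describes.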
    <2> ... in ascending order for the `username` field and in descending order for the `date` field.

    Index sorting supports the following setting:
s/setting/settings/
This change adds an index setting to define how the documents should be sorted inside each Segment. It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on disk. `nested` fields are not allowed inside an index that defines an index sort, since `nested` fields rely on the original sort of the index. This change does not add early termination capabilities in the search layer; these will be added in a follow-up. Relates #6720
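To make the resulting order concrete, here is a conceptual sketch of a two-field sort, with `username` ascending and `date` descending as in the documentation example from this PR. The `Doc` class is hypothetical and the code only models the ordering; it does not show how Lucene sorts segments.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class IndexSortOrderSketch {
    // Hypothetical toy document with the two fields from the docs example.
    static final class Doc {
        final String username;
        final long date;

        Doc(String username, long date) {
            this.username = username;
            this.date = date;
        }
    }

    private IndexSortOrderSketch() {}

    // username ascending, then date descending for equal usernames.
    static final Comparator<Doc> ORDER =
        Comparator.comparing((Doc d) -> d.username)
            .thenComparing(Comparator.comparingLong((Doc d) -> d.date).reversed());

    static List<Doc> sorted(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(ORDER);
        return copy;
    }
}
```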
Thanks @jpountz!
* master:
  Add BucketMetricValue interface (elastic#24188)
  Enable index-time sorting (elastic#24055)
  Clarify elasticsearch user uid:gid mapping in Docker docs
  Update field-names-field.asciidoc (elastic#24178)
  ElectMasterService.hasEnoughMasterNodes should return false if no masters were found
  Remove Ubuntu 12.04 (elastic#24161)
  [Test] Add unit tests for InternalHDRPercentilesTests (elastic#24157)
  Replicate write failures (elastic#23314)
  Rename variable in translog simple commit test
  Strengthen translog commit with open view test
  Stronger check in translog prepare and commit test
  Fix translog prepare commit and commit test
  ingest-node.asciidoc - Clarify json processor (elastic#21876)
  Painless: more testing for script_stack (elastic#24168)