-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sort on nanosecond date fields with missing values #74760
Conversation
For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes elastic#73763
Pinging @elastic/es-search (Team:Search) |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments.
final boolean min = sortMissingFirst(missingValue) ^ reverse; | ||
missingValue = min ? 0L : Long.MAX_VALUE; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic could move to IndexNumericFieldData#missingObject
? IndexFieldData#missingObject
doesn't need to be final, we need to pick the value depending on the targetNumericType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this a bit and one problem I couldn't solve with this is that "missingObject" is part of XFieldComparatorSource
, not IndexFieldData
. That probably woudl mean overwriting it in special versions of LongValuesComparatorSource, and for the things I tried it wasn't always clear how to ther the "targetNumericType" from the sort (e.g. reduceType() returns SortField.Type
that doesn't even differentiate between Long, Data and Datenanos like IndexNumericFieldData.NumericType
does). Maybe I'm missing something but from what I tried this current location seems less complex to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The targetNumericType
could be added to the LongValuesSource
and checked in missingObject
? It's not really about complexity here, missingObject
is called in several contexts so it needs a fix.
SearchResponse searchResponse = client().prepareSearch(index) | ||
.addSort(SortBuilders.fieldSort("mydate").order(SortOrder.ASC).setFormat(format).setNumericType("date_nanos")) | ||
.get(); | ||
assertHitsInOrder(searchResponse, new String[] { "1", "2", "4", "5", "3", "6" }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A tiebreaker should be used to ensure that 3
is always sorted before 6
(same value otherwise).
.setNumericType("date_nanos") | ||
) | ||
.get(); | ||
assertHitsInOrder(searchResponse, new String[] { "3", "6", "1", "2", "4", "5" }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
@jimczi thanks for the review, I improved the tests and left a comment regarding moving the implementation logic elsewhere and the problems I encountered. If you have other suggestions please let me know, otherwise I hope its also okay if we leave it where it is. |
@jimczi I pushed another commit that adresses your latest comment, should be ready for another look I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments, LGTM otherwise
@Nullable Object missingValue, | ||
MultiValueMode sortMode, | ||
Nested nested, | ||
NumericType targetNumericType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The targetNumericType is implicit here ?
@Nullable Object missingValue, | ||
MultiValueMode sortMode, | ||
Nested nested, | ||
NumericType targetNumericType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The targetNumericType
is implicit here, no need to add it.
// special case to prevent negative values that would cause invalid nanosecond ranges | ||
if (sortMissingFirst(missingValue) || sortMissingLast(missingValue)) { | ||
final boolean min = sortMissingFirst(missingValue) ^ reversed; | ||
return min ? 0L : Long.MAX_VALUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use DateUtils#MAX_NANOSECOND_INSTANT
as the max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the converted nanoseconds value of that Instant though I think, which should be the same as Long.MAX_VALUE? I introduced another constant (8c6df48#diff-09ba5f62fc8de8923f01213874ef48657920b2cc1428ff4f27f5f06169ee8f1aR207) in my last commit to not have to do that coversion every time, just if you want to check.
@elasticmachine update branch |
For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes elastic#73763
For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes elastic#73763
#75064) For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes #73763
…#75065) For missing values on date fields we use Long.MIN_VALUE by default. This is okay when the resolution of the field is milliseconds. For nanoseconds though, negative values can lead to IllegalArgumentExpections when we try to format them internally. This change fixes this by explicitely setting the minimum value to 0L (which corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other explicit missing value is defined and the target numeric type is a nanosecond type (this is true for nanosecond fields and when "numeric_type" is explicitely set). This way we correct the behaviour for single typed indices and cases where we are sorting across multiple indices with mixed "date" and "date_nanos" type where "numeric_type" is set in the sort definition. Closes #73763
For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.
Closes #73763