
Fix a DST error in date_histogram #52016

Merged
merged 6 commits into from
Feb 11, 2020

Conversation

nik9000
Member

@nik9000 nik9000 commented Feb 6, 2020

When `date_histogram` attempts to optimize itself for a particular
time zone it checks whether the entire shard falls within the same
"transition". Most time zones transition once every six months or
thereabouts, so the optimization can usually kick in.

But it crashes when you feed it a time zone whose last DST
transition was before the epoch. The reason for this is a little twisted:
before this patch it'd find the next and previous transitions in
milliseconds since the epoch. Then it'd cast them to `Long`s and pass them
into the `DateFieldType` to check whether the shard's contents were within
the range. The trouble is they are then converted to `String`s which are
then parsed back to `Instant`s which are then converted to `long`s. And
the parser doesn't like most negative numbers. And everything before
the epoch is negative.

This change removes the
`long` -> `Long` -> `String` -> `Instant` -> `long` chain in favor of
passing the `long`s directly into the `DateFieldType`. Sadly, this means
that `MappedFieldType` gets a `long`-specialized implementation of
`isFieldWithinQuery`, which is a little lame, but given that we always
call it with a `long` in this context it feels like the lesser of two
evils. Not because it is more efficient (it is, a little) but because
it is less confusing to trace the execution. The parsing code is fairly
complex and we can simply skip it because we already have `long`s.

Closes #50265
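The failure mode can be reproduced in plain Java (a standalone sketch, not Elasticsearch code): America/Phoenix, the zone from the linked issue, last observed DST in 1967, so the transition before any recent instant lands before the epoch and its millisecond timestamp is negative.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class PreEpochTransition {
    public static void main(String[] args) {
        ZoneRules rules = ZoneId.of("America/Phoenix").getRules();
        // Arizona stopped observing DST in 1967, so the transition before
        // "now" is pre-epoch and its epoch-millis value is negative.
        ZoneOffsetTransition previous = rules.previousTransition(Instant.now());
        long millis = previous.getInstant().toEpochMilli();
        System.out.println(millis < 0); // true: this is the kind of value
                                        // the old String round trip choked on
    }
}
```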

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@nik9000 nik9000 requested review from jpountz and abdonpijpelink and removed request for abdonpijpelink February 6, 2020 21:44
Member

@not-napoleon not-napoleon left a comment


LGTM. Nice work finding this one. I like the solution, not going through some weird coercion chain seems like a win.

* Use this instead of
* {@link #isFieldWithinQuery(IndexReader, Object, Object, boolean, boolean, ZoneId, DateMathParser, QueryRewriteContext)}
* when you *know* the {@code fromInclusive} and {@code toInclusive} are always
* floats.
Member


I think you mean they're always longs?

Member Author


+1

@nik9000
Member Author

nik9000 commented Feb 6, 2020

LGTM. Nice work finding this one. I like the solution, not going through some weird coercion chain seems like a win.

@jimczi actually found it! He just didn't get a chance to fix it before going on a long trip. He did mention that @jpountz may want to look at it so I added him as a reviewer too.

Contributor

@jpountz jpountz left a comment


I'd really like to avoid adding the method to MappedFieldType, can we have it only live in DateFieldType and do an instanceof check in the aggregator, and disable the optimization if the instanceof check fails? I think we also need javadocs to clarify that the longs are a number of millis, which is not obvious at all if the field is a date_millis. By the way we might want to have an explicit test for date_millis to make sure it does the right thing with this new method?

@nik9000
Member Author

nik9000 commented Feb 10, 2020

I'd really like to avoid adding the method to MappedFieldType, can we have it only live in DateFieldType and do an instanceof check in the aggregator, and disable the optimization if the instanceof check fails? I think we also need javadocs to clarify that the longs are a number of millis, which is not obvious at all if the field is a date_millis. By the way we might want to have an explicit test for date_millis to make sure it does the right thing with this new method?

I can do all those things. I'm not a fan of instanceof for this sort of thing but if you feel like it is better I'm happy to do it. I agree having a method that only makes sense for dates on MappedFieldType is a bit strange.
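The suggested shape can be sketched like this (a minimal stand-in for illustration only: the class names mirror Elasticsearch's, but these bodies and the `isFieldWithinRange`/`canOptimize` methods are invented for the sketch, not the real API):

```java
// Stand-ins for the real Elasticsearch types; only the shape matters here.
abstract class MappedFieldType {
    // No long-specialized method on the base type.
}

class DateFieldType extends MappedFieldType {
    // Bounds are epoch milliseconds, as the requested javadocs would state.
    boolean isFieldWithinRange(long fromMillis, long toMillis) {
        return fromMillis <= toMillis; // placeholder for the real shard check
    }
}

class DateHistogramAggregator {
    static boolean canOptimize(MappedFieldType fieldType, long from, long to) {
        // instanceof gate: only date fields take the long-based fast path;
        // any other field type simply skips the optimization.
        if (fieldType instanceof DateFieldType) {
            return ((DateFieldType) fieldType).isFieldWithinRange(from, to);
        }
        return false;
    }
}
```

The trade-off discussed above is visible here: the base type stays clean, at the cost of a type check at the call site.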

@nik9000
Member Author

nik9000 commented Feb 10, 2020

Good call on the date_nanos test, and the resolution of the interface.

@nik9000
Member Author

nik9000 commented Feb 11, 2020

@jpountz, I've added something to properly handle nanos. It feels properly paranoid to me now.

@nik9000
Member Author

nik9000 commented Feb 11, 2020

@elasticmachine run elasticsearch-ci/packaging-sample-matrix-unix

Contributor

@jpountz jpountz left a comment


Thanks Nik, this looks good to me now. As you guessed, I meant date_nanos, not date_millis. :) I also like the way you refactored the rewrite of the time zone, I find it easier to read now.

@nik9000
Member Author

nik9000 commented Feb 11, 2020

I also like the way you refactored the rewrite of the time zone, I find it easier to read now.

@nik9000
Member Author

nik9000 commented Feb 11, 2020

Thanks @not-napoleon and @jpountz !

@nik9000 nik9000 merged commit da2b67d into elastic:master Feb 11, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Feb 11, 2020
When `date_histogram` attempts to optimize itself for a particular
time zone it checks whether the entire shard falls within the same
"transition". Most time zones transition once every six months or
thereabouts, so the optimization can usually kick in.

*But* it crashes when you feed it a time zone whose last DST
transition was before the epoch. The reason for this is a little twisted:
before this patch it'd find the next and previous transitions in
milliseconds since the epoch. Then it'd cast them to `Long`s and pass them
into the `DateFieldType` to check whether the shard's contents were within
the range. The trouble is they are then converted to `String`s which are
*then* parsed back to `Instant`s which are then converted to `long`s. And
the parser doesn't like most negative numbers. And everything before
the epoch is negative.

This change replaces the
`long` -> `Long` -> `String` -> `Instant` -> `long` chain with a
`long` -> `Instant` -> `long` path, which avoids the fairly complex
parsing code and handles a bunch of interesting edge cases around
the epoch, and other edge cases around `date_nanos`.

Closes elastic#50265
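The replacement chain can be shown in plain Java (a hypothetical standalone illustration under the assumption that the bound is epoch milliseconds, not the actual Elasticsearch code):

```java
import java.time.Instant;

public class NumericRoundTrip {
    // long -> Instant -> long: no String formatting or parsing in between,
    // so pre-epoch (negative) millisecond values survive unchanged.
    static long roundTrip(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(-86400000L)); // -86400000
    }
}
```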
nik9000 added a commit that referenced this pull request Feb 12, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Feb 12, 2020
nik9000 added a commit that referenced this pull request Feb 13, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Feb 13, 2020
Now that we've backported elastic#52016 we can run its tests when we're
performing backwards compatibility testing.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Feb 13, 2020
nik9000 added a commit that referenced this pull request Feb 13, 2020
nik9000 added a commit that referenced this pull request Feb 13, 2020
Successfully merging this pull request may close these issues.

date_histogram parse_exception in time_zone America/Phoenix
5 participants