Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make range query rounding consistent #50237

Merged
merged 14 commits into from
Jan 31, 2020

Conversation

cbuescher
Copy link
Member

Currently the rounding for date math used in range queries behaves differently
for date and date_range as explained in #50009. The behaviour on date
fields is the one we document in
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#range-query-date-math-rounding.
This change adapts the rounding behaviour for RangeType.DATE so it uses the same
logic as the date type.

Closes #50009

Currently the rounding for date math used in range queries behaves differently
for `date` and `date_range` as explained in elastic#50009. The behaviour on `date`
fields is the one we document in
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#range-query-date-math-rounding.
This change adapts the rounding behaviour for RangeType.DATE so it uses the same
logic as the `date` type.

Closes elastic#50009
@cbuescher cbuescher added >breaking WIP :Search/Search Search-related issues that do not fall into other categories v8.0.0 labels Dec 16, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@cbuescher
Copy link
Member Author

Opening as WIP to illustrate the issue better. Original issue needs to be discussed to decide whether we see the current behaviour as a bug, in which case this could be considered a bugfix, or as the standing behaviour, in which case this should be documented as a breaking issue if we want to change it to make the date/date_range behaviour consistent.

@cbuescher cbuescher removed the WIP label Dec 17, 2019
@cbuescher
Copy link
Member Author

@elasticmachine update branch

@cbuescher
Copy link
Member Author

@jpountz thanks for the review, after you took a look I still extended testing a bit by adding some randomized testing to make sure we treat bounds the same on date and date_range fields with this change now. I also added a note to the breaking search changes. I'm not sure if I should go into more details on that note or if there is a better place for that, just in case here's another attempt at summarizing this change and what to do in case users want to keep the "old" behaviour:


Before this change, a range query with an upper bound not specified up to milliseconds like e.g. ”lte”:”2019-12-09” or when using date math day rounding like ”lte”:”2019-12-09||/d” would translate to an included upper bound of 2019-12-09T00:00:00.000.
With this change, the above will be rounded up to include the full day up to 2019-12-09T23:59:59.999

The reverse will happen for lower bounds specified with “gt”. Both ”gt”:”2019-12-09” and ”gt”:”2019-12-09||/d” are currently translated to 2019-12-09T00:00:00. With this change we will translate to start of next day 2019-12-10T00:00:00
Both of these cases are already handled like this on a plain date field.

In order for user keep the current behaviour, they will need to change their range queries on date_range field using lte or gt from e.g.

"lte”:”2019-12-09" -> ”lt”:”2019-12-09” (or fully specify ”lte”:”2019-12-09T00:00:00.000)
"gt”:”2019-12-09" -> ”gte”:”2019-12-09” (or fully specify ”gt”:”2019-12-09T00:00:00.000)


Can you take another look at the test changes and if you think the phrasing in the migration docs is sufficient?
I remember we talked about the possibility of backporting this to 7.x but currently I'd refrain from that because the behavioural changes of the query are subtle and easy to miss. Let me know if you know of any other good other place to document this.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to src/main look good, but I'd like to avoid relying on string representations in tests. Could we instead test a rounded range against the expected non-rounded range, e.g. something like assertEquals(range on "gt 2020-01-20", range on "gt 2020-01-20 23:59:59.999Z")?

Since this is a bug fix, I think this should get backported to 7.x.


[float]
==== Consistent rounding of range queries on dates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase this title, and maybe the paragraph below as well to make it clear that only date ranges changed behavior? A majority of users are using date fields but only a minority uses date_range fields, so it would help if users could notice that this change doesn't apply to them only by looking at the title.

@cbuescher
Copy link
Member Author

@jpountz thanks for the review, I pushed updates to the migration notes and found a way of avoiding relying on string representations in the tests comparing the bounds with the behaviour on the plain date field. Do you want to take another look at those changes or do you think this is good to go?

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Jan 28, 2020
Looking into elastic#50237 I realized that two of the examples given in the
documentation around date math rounding for range queries on date fields using
`gt` and `lt` is slightly off by a nanosecond. This PR changes this to the
bounds that are currently parsed using these parameters.
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, LGTM otherwise. Thanks for fixing it @cbuescher.

docs/reference/migration/migrate_8_0/search.asciidoc Outdated Show resolved Hide resolved

- do:
search:
rest_total_hits_as_int: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter is supposed to only exist for the 6.x -> 7.x transition, let's use track_total_hits: true instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw that “rest_total_hits_as_int” is still used all over the place in the yml tests, also in other tests in same test suite. I agree it would be good to get rid of it when its supposed to go away, but I’d like to keep it for consistency with the rest atm. Maybe we should open an issue to clean up all of those in a separate PR?

@cbuescher cbuescher merged commit 7cec5f9 into elastic:master Jan 31, 2020
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Jan 31, 2020
Currently the rounding used in range queries can behave differently for `date`
and `date_range` as explained in elastic#50009. The behaviour on `date` fields is
the one we document in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#range-query-date-math-rounding.
This change adapts the rounding behaviour for RangeType.DATE so it uses the
same logic as the `date` for the `date_range` type.

Closes elastic#50009
cbuescher pushed a commit that referenced this pull request Jan 31, 2020
)

Currently the rounding used in range queries can behave differently for `date`
and `date_range` as explained in #50009. The behaviour on `date` fields is
the one we document in https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#range-query-date-math-rounding.
This change adapts the rounding behaviour for RangeType.DATE so it uses the
same logic as the `date` for the `date_range` type.

Backport of #50237
cbuescher pushed a commit that referenced this pull request Jan 31, 2020
After backport of #50237 in #51741 the skip version in this test can be lowered.
cbuescher pushed a commit that referenced this pull request Apr 6, 2020
Looking into #50237 I realized that two of the examples given in the
documentation around date math rounding for range queries on date fields using
`gt` and `lt` is slightly off by a nanosecond. This PR changes this to the
bounds that are currently parsed using these parameters.
cbuescher pushed a commit that referenced this pull request Apr 6, 2020
Looking into #50237 I realized that two of the examples given in the
documentation around date math rounding for range queries on date fields using
`gt` and `lt` is slightly off by a nanosecond. This PR changes this to the
bounds that are currently parsed using these parameters.
cbuescher pushed a commit that referenced this pull request Apr 6, 2020
Looking into #50237 I realized that two of the examples given in the
documentation around date math rounding for range queries on date fields using
`gt` and `lt` is slightly off by a nanosecond. This PR changes this to the
bounds that are currently parsed using these parameters.
cbuescher pushed a commit that referenced this pull request Apr 6, 2020
Looking into #50237 I realized that two of the examples given in the
documentation around date math rounding for range queries on date fields using
`gt` and `lt` is slightly off by a nanosecond. This PR changes this to the
bounds that are currently parsed using these parameters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >bug :Search/Search Search-related issues that do not fall into other categories v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Range query rounding error when used with date_range fields
4 participants