
time_zone option makes date histograms much slower #28727

Closed

jpountz opened this issue Feb 19, 2018 · 10 comments

@jpountz
Contributor

jpountz commented Feb 19, 2018

I was looking at a slow query where removing the time_zone option made it 4x faster: 8s on average without the time_zone parameter and 32s on average with it.

This query filters one week of data in February with Europe/Berlin as the time zone (so all documents are on the same side of the daylight saving time boundary), and there are more than 1B matches.

Can we speed this up?

For the record, this is less of an issue for time zones that do not observe daylight saving time, so users might want to consider switching to Etc/GMT-1 instead of Europe/Berlin if that works for them.

cc @elastic/es-search-aggs
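
A minimal java.time check (an illustrative sketch, not part of the original report) shows the difference between the two zones: Europe/Berlin has DST transitions that rounding has to account for, while Etc/GMT-1 is a fixed UTC+1 offset and never needs a transition lookup.

```java
import java.time.ZoneId;

public class FixedOffsetCheck {
    public static void main(String[] args) {
        // Europe/Berlin observes DST, so rounding may need transition lookups.
        System.out.println(ZoneId.of("Europe/Berlin").getRules().isFixedOffset()); // false
        // Etc/GMT-1 is a fixed UTC+1 offset (the Etc/ zone names invert the sign),
        // so the offset is constant and no transitions ever apply.
        System.out.println(ZoneId.of("Etc/GMT-1").getRules().isFixedOffset());     // true
    }
}
```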

@DaveCTurner
Contributor

Joda does really quite a lot of work trying to find the previous time zone transition, which I'd guess is the expensive bit (and it appears in the stack trace from the investigation you were working on):

https://github.com/JodaOrg/joda-time/blob/master/src/main/java/org/joda/time/tz/DateTimeZoneBuilder.java#L595

java.time looks to be a lot smarter - it sorts out the transitions by year and involves a cache:

http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/time/zone/ZoneRules.java#l720

Although I'm sure it'd be possible to put a cache around Joda, given that we're working on #27330 I think it would be a good idea to postpone any in-depth work on this until we can see what the effects of that would be.
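
For reference, the two transition lookups being compared are roughly these (a sketch against the public Joda and java.time APIs, not Elasticsearch code):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import org.joda.time.DateTimeZone;

public class TransitionLookup {
    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // Joda walks the zone's transition rules to find the previous transition
        // (the DateTimeZoneBuilder code linked above).
        long prevJoda = DateTimeZone.forID("Europe/Berlin").previousTransition(now);

        // java.time answers the same question through ZoneRules, which keeps the
        // transitions organised by year and caches them (the ZoneRules code linked above).
        ZoneOffsetTransition prevJavaTime = ZoneId.of("Europe/Berlin")
                .getRules()
                .previousTransition(Instant.ofEpochMilli(now));

        System.out.println(prevJoda + " vs " + prevJavaTime);
    }
}
```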

@jpountz
Contributor Author

jpountz commented Feb 19, 2018

This is good to know, thanks for sharing. I suspect that the Rounding class would be one of the easier ones to migrate so we might not have to wait long to know how much java.time helps.

@DaveCTurner
Contributor

I suspect that the Rounding class would be one of the easier ones to migrate...

Sounds like you're volunteering!

@Bargs

Bargs commented Mar 7, 2018

FYI we have a few issues related to this in Kibana. It's been on my todo list to look into possible solutions for a while. I'm stoked to see that the root problem may be solvable in ES.

@Retenodus

Did something break or change between 5.X and 6.X? I run three clusters on 5.1, 5.6.8 and 6.1, and DateHistogram aggregations are basically unusable for me on the 6.X cluster. When I put the same data in the 5.X clusters, I don't have any performance issues.

@nik9000
Member

nik9000 commented Mar 23, 2018

Did something break or change between 5.X and 6.X? I run three clusters on 5.1, 5.6.8 and 6.1, and DateHistogram aggregations are basically unusable for me on the 6.X cluster. When I put the same data in the 5.X clusters, I don't have any performance issues.

I don't think this is the right place to comment about this. If you can make a bash script that reproduces the issue against a clean cluster, I'd file it as a separate issue. If you can't, I'd take it to http://discuss.elastic.co/ .

@timroes
Contributor

timroes commented May 6, 2018

Since this pops up rather often, I wanted to add another possible performance improvement suggestion.

Before starting to aggregate the date histogram, we could check whether the query has an overall date range filter applied. We could then look at the start and end dates of that range: if both fall within the same daylight saving time period, we could use the offset the time zone had during that period as an absolute fixed time zone (e.g. if I am doing a date histogram with the time zone Europe/Berlin over an overall date range of April 1st, 2018 to July 1st, 2018, I could safely rewrite the aggregation to use the UTC+2/Etc/GMT-2 time zone instead).

This would of course not solve the performance issue for date histograms over a period of time that contains a DST switch, but it would already be an improvement for a lot of users, who usually look at smaller date ranges.
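
A rough java.time sketch of that rewrite check (a hypothetical helper, assuming the overall range is already known as two Instants):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class FixedOffsetRewrite {
    /**
     * Returns the single fixed offset that covers [from, to] if the zone has no
     * transition in that range, or null if the named zone must be kept.
     */
    static ZoneOffset fixedOffsetOrNull(ZoneId zone, Instant from, Instant to) {
        ZoneRules rules = zone.getRules();
        ZoneOffsetTransition next = rules.nextTransition(from);
        if (next == null || next.getInstant().isAfter(to)) {
            return rules.getOffset(from);
        }
        return null; // the range crosses a DST switch
    }

    public static void main(String[] args) {
        // April 1st to July 1st 2018 lies entirely inside Berlin's summer time,
        // so the whole range can be rounded with a fixed +02:00 offset.
        System.out.println(fixedOffsetOrNull(ZoneId.of("Europe/Berlin"),
                Instant.parse("2018-04-01T00:00:00Z"),
                Instant.parse("2018-07-01T00:00:00Z"))); // +02:00
    }
}
```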

See also elastic/kibana#18853 for a detailed meta issue on the Kibana side.

@jpountz
Contributor Author

jpountz commented May 9, 2018

I agree this is a good idea. Unfortunately, queries and aggregations are currently kept completely unaware of each other, so this would be hard to implement without adding unwanted dependencies.

Something less efficient than your proposal, but that should already cover a number of cases, would be to look at the min/max values that exist within the current shard and apply the optimization you describe if all times within that interval have the same offset. With e.g. daily indices, this optimization would still apply in most cases.
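
A sketch of that shard-level check (illustrative only: the actual change below works on the existing Rounding code, while this assumes the shard's min/max timestamps are passed in as epoch millis):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class ShardZoneRewrite {
    /**
     * Picks the zone to round with on one shard: if all timestamps between the
     * shard's min and max share the same offset, a fixed offset is enough,
     * otherwise the requested named zone is kept.
     */
    static ZoneId roundingZone(ZoneId requested, long minMillis, long maxMillis) {
        ZoneRules rules = requested.getRules();
        Instant min = Instant.ofEpochMilli(minMillis);
        ZoneOffsetTransition next = rules.nextTransition(min);
        if (next == null || next.getInstant().toEpochMilli() > maxMillis) {
            return rules.getOffset(min); // a ZoneOffset is itself a ZoneId
        }
        return requested;
    }
}
```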

jpountz added a commit to jpountz/elasticsearch that referenced this issue May 11, 2018
Date histograms on non-fixed timezones such as `Europe/Paris` proved much slower
than histograms on fixed timezones in elastic#28727. This change mitigates the issue by
using a fixed time zone instead when shard data doesn't cross a transition so
that all timestamps share the same fixed offset. This should be a common case
with daily indices.

NOTE: Rewriting the aggregation doesn't work since the timezone is then also
used on the coordinating node to create empty buckets, which might be out of the
range of data that exists on the shard.

NOTE: In order to be able to get a shard context in the tests, I reused code
from the base query test case by creating a new parent test case for both
queries and aggregations: `AbstractBuilderTestCase`.

Mitigates elastic#28727
jpountz added a commit that referenced this issue May 16, 2018

jpountz added a commit that referenced this issue May 16, 2018

ywelsch pushed a commit to ywelsch/elasticsearch that referenced this issue May 23, 2018
@polyfractal
Contributor

Working my way through agg issues. @jpountz, is this closeable now that #30534 has merged, or was that only a partial solution to the slowdown?

@jpountz
Contributor Author

jpountz commented Jun 4, 2018

It is partial, but I think it's good enough to close this issue. Thanks for the ping.
