
time_zone option makes date histograms much slower #28727

Closed

jpountz opened this issue Feb 19, 2018 · 10 comments

@jpountz
Contributor

jpountz commented Feb 19, 2018

I was looking at a slow query where removing the time_zone option made it 4x faster: 8s on average without the time_zone parameter and 32s on average with it.

This query filters one week of data in February with Europe/Berlin as the time zone (so all documents are on the same side of the daylight saving time boundary), and there are more than 1B matches.

Can we speed this up?

For the record, this is less of an issue for time zones that do not observe daylight saving time, so users might want to consider switching to Etc/GMT-1 instead of Europe/Berlin if that works for them.

cc @elastic/es-search-aggs
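
A minimal java.time check (an illustrative sketch, not part of the original report) shows the difference between the two zones: Europe/Berlin has DST transitions that rounding has to account for, while Etc/GMT-1 is a fixed UTC+1 offset and never needs a transition lookup.

```java
import java.time.ZoneId;

public class FixedOffsetCheck {
    public static void main(String[] args) {
        // Europe/Berlin observes DST, so rounding may need transition lookups.
        System.out.println(ZoneId.of("Europe/Berlin").getRules().isFixedOffset()); // false
        // Etc/GMT-1 is a fixed UTC+1 offset (the Etc/ zone names invert the sign),
        // so the offset is constant and no transitions ever apply.
        System.out.println(ZoneId.of("Etc/GMT-1").getRules().isFixedOffset());     // true
    }
}
```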

@DaveCTurner
Contributor

Joda does really quite a lot of work trying to find the previous time zone transition, which I'd guess is the expensive bit (and it appears in the stack trace from the investigation you were working on):

https://github.com/JodaOrg/joda-time/blob/master/src/main/java/org/joda/time/tz/DateTimeZoneBuilder.java#L595

java.time looks to be a lot smarter - it sorts out the transitions by year and involves a cache:

http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/time/zone/ZoneRules.java#l720

Although I'm sure it'd be possible to put a cache around Joda, given that we're working on #27330 I think it would be a good idea to postpone any in-depth work on this until we can see what the effects of that would be.
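
For reference, the two transition lookups being compared are roughly these (a sketch against the public Joda and java.time APIs, not Elasticsearch code):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import org.joda.time.DateTimeZone;

public class TransitionLookup {
    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // Joda walks the zone's transition rules to find the previous transition
        // (the DateTimeZoneBuilder code linked above).
        long prevJoda = DateTimeZone.forID("Europe/Berlin").previousTransition(now);

        // java.time answers the same question through ZoneRules, which keeps the
        // transitions organised by year and caches them (the ZoneRules code linked above).
        ZoneOffsetTransition prevJavaTime = ZoneId.of("Europe/Berlin")
                .getRules()
                .previousTransition(Instant.ofEpochMilli(now));

        System.out.println(prevJoda + " vs " + prevJavaTime);
    }
}
```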

@jpountz
Contributor Author

jpountz commented Feb 19, 2018

This is good to know, thanks for sharing. I suspect that the Rounding class would be one of the easier ones to migrate so we might not have to wait long to know how much java.time helps.

@DaveCTurner
Contributor

I suspect that the Rounding class would be one of the easier ones to migrate...

Sounds like you're volunteering!

@Bargs

Bargs commented Mar 7, 2018

FYI we have a few issues related to this in Kibana. It's been on my todo list to look into possible solutions for a while. I'm stoked to see that the root problem may be solvable in ES.

@Retenodus

Did something break or change between 5.X and 6.X? I run three clusters on 5.1, 5.6.8 and 6.1, and DateHistogram aggregations are basically unusable for me on the 6.X cluster. When I put the same data in the 5.X clusters, I don't have any performance issues.

@nik9000
Member

nik9000 commented Mar 23, 2018

Did something break or change between 5.X and 6.X? I run three clusters on 5.1, 5.6.8 and 6.1, and DateHistogram aggregations are basically unusable for me on the 6.X cluster. When I put the same data in the 5.X clusters, I don't have any performance issues.

I don't think this is the right place to comment about this. If you can make a bash script that reproduces the issue against a clean cluster, I'd file it as a separate issue. If you can't, I'd take it to http://discuss.elastic.co/ .

@timroes
Contributor

timroes commented May 6, 2018

Since this pops up rather often, I wanted to add another possible performance improvement suggestion.

Before starting to aggregate the date histogram, we could check whether the query has an overall date range filter applied. We could then look at the start and end dates of that range: if both fall within the same daylight saving time period, we could use the offset the time zone had during that period as an absolute fixed time zone (e.g. if I am doing a date histogram with the time zone Europe/Berlin over an overall date range of April 1st, 2018 to July 1st, 2018, I could safely rewrite the aggregation to use the UTC+2/Etc/GMT-2 time zone instead).

This would of course not solve the performance issue for date histograms over a period of time that contains a DST switch, but it would already be an improvement for a lot of users, who usually look at smaller date ranges.
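
A rough java.time sketch of that rewrite check (a hypothetical helper, assuming the overall range is already known as two Instants):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class FixedOffsetRewrite {
    /**
     * Returns the single fixed offset that covers [from, to] if the zone has no
     * transition in that range, or null if the named zone must be kept.
     */
    static ZoneOffset fixedOffsetOrNull(ZoneId zone, Instant from, Instant to) {
        ZoneRules rules = zone.getRules();
        ZoneOffsetTransition next = rules.nextTransition(from);
        if (next == null || next.getInstant().isAfter(to)) {
            return rules.getOffset(from);
        }
        return null; // the range crosses a DST switch
    }

    public static void main(String[] args) {
        // April 1st to July 1st 2018 lies entirely inside Berlin's summer time,
        // so the whole range can be rounded with a fixed +02:00 offset.
        System.out.println(fixedOffsetOrNull(ZoneId.of("Europe/Berlin"),
                Instant.parse("2018-04-01T00:00:00Z"),
                Instant.parse("2018-07-01T00:00:00Z"))); // +02:00
    }
}
```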

See also elastic/kibana#18853 for a detailed meta issue on the Kibana side.

@jpountz
Contributor Author

jpountz commented May 9, 2018

I agree this is a good idea. Unfortunately, queries and aggregations are currently kept completely unaware of each other, so this would be hard to implement without adding unwanted dependencies.

Something less efficient than your proposal, but that should already cover a number of cases, would be to look at the min/max values that exist within the current shard and apply the optimization you describe if all times within that interval have the same offset. With e.g. daily indices, this optimization would still apply in most cases.
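
A sketch of that shard-level check (illustrative only: the actual change below works on the existing Rounding code, while this assumes the shard's min/max timestamps are passed in as epoch millis):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class ShardZoneRewrite {
    /**
     * Picks the zone to round with on one shard: if all timestamps between the
     * shard's min and max share the same offset, a fixed offset is enough,
     * otherwise the requested named zone is kept.
     */
    static ZoneId roundingZone(ZoneId requested, long minMillis, long maxMillis) {
        ZoneRules rules = requested.getRules();
        Instant min = Instant.ofEpochMilli(minMillis);
        ZoneOffsetTransition next = rules.nextTransition(min);
        if (next == null || next.getInstant().toEpochMilli() > maxMillis) {
            return rules.getOffset(min); // a ZoneOffset is itself a ZoneId
        }
        return requested;
    }
}
```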

jpountz added a commit to jpountz/elasticsearch that referenced this issue May 11, 2018
Date histograms on non-fixed timezones such as `Europe/Paris` proved much slower
than histograms on fixed timezones in elastic#28727. This change mitigates the issue by
using a fixed time zone instead when shard data doesn't cross a transition so
that all timestamps share the same fixed offset. This should be a common case
with daily indices.

NOTE: Rewriting the aggregation doesn't work since the timezone is then also
used on the coordinating node to create empty buckets, which might be out of the
range of data that exists on the shard.

NOTE: In order to be able to get a shard context in the tests, I reused code
from the base query test case by creating a new parent test case for both
queries and aggregations: `AbstractBuilderTestCase`.

Mitigates elastic#28727
jpountz added a commit that referenced this issue May 16, 2018

jpountz added a commit that referenced this issue May 16, 2018

ywelsch pushed a commit to ywelsch/elasticsearch that referenced this issue May 23, 2018
@polyfractal
Contributor

Working my way through agg issues. @jpountz, is this closeable now that #30534 has merged, or was that only a partial solution to the slowdown?

@jpountz
Contributor Author

jpountz commented Jun 4, 2018

It is partial, but I think it's good enough to close this issue. Thanks for the ping.
