Skip to content

Commit

Permalink
Optimize date_histograms across daylight savings time (elastic#55559)
Browse files Browse the repository at this point in the history
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.

Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)

I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.

This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.

This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.

Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
  • Loading branch information
nik9000 authored May 7, 2020
1 parent fb86184 commit 0097a86
Show file tree
Hide file tree
Showing 23 changed files with 1,999 additions and 357 deletions.
4 changes: 2 additions & 2 deletions benchmarks/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,9 +53,9 @@ To get realistic results, you should exercise care when running benchmarks. Here
`performance` CPU governor.
* Vary the problem input size with `@Param`.
* Use the integrated profilers in JMH to dig deeper if benchmark results to not match your hypotheses:
* Run the generated uberjar directly and use `-prof gc` to check whether the garbage collector runs during a microbenchmarks and skews
* Add `-prof gc` to the options to check whether the garbage collector runs during a microbenchmarks and skews
your results. If so, try to force a GC between runs (`-gc true`) but watch out for the caveats.
* Use `-prof perf` or `-prof perfasm` (both only available on Linux) to see hotspots.
* Add `-prof perf` or `-prof perfasm` (both only available on Linux) to see hotspots.
* Have your benchmarks peer-reviewed.

### Don't
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.common;

import org.elasticsearch.common.time.DateFormatter;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

import java.time.ZoneId;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

@Fork(2)
@Warmup(iterations = 10)
@Measurement(iterations = 5)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class RoundingBenchmark {
private static final DateFormatter FORMATTER = DateFormatter.forPattern("date_optional_time");

@Param({
"2000-01-01 to 2020-01-01", // A super long range
"2000-10-01 to 2000-11-01", // A whole month which is pretty believable
"2000-10-29 to 2000-10-30", // A date right around daylight savings time.
"2000-06-01 to 2000-06-02" // A date fully in one time zone. Should be much faster than above.
})
public String range;

@Param({ "java time", "es" })
public String rounder;

@Param({ "UTC", "America/New_York" })
public String zone;

@Param({ "MONTH_OF_YEAR", "HOUR_OF_DAY" })
public String timeUnit;

@Param({ "1", "10000", "1000000", "100000000" })
public int count;

private long min;
private long max;
private long[] dates;
private Supplier<Rounding.Prepared> rounderBuilder;

@Setup
public void buildDates() {
String[] r = range.split(" to ");
min = FORMATTER.parseMillis(r[0]);
max = FORMATTER.parseMillis(r[1]);
dates = new long[count];
long date = min;
long diff = (max - min) / dates.length;
for (int i = 0; i < dates.length; i++) {
if (date >= max) {
throw new IllegalStateException("made a bad date [" + date + "]");
}
dates[i] = date;
date += diff;
}
Rounding rounding = Rounding.builder(Rounding.DateTimeUnit.valueOf(timeUnit)).timeZone(ZoneId.of(zone)).build();
switch (rounder) {
case "java time":
rounderBuilder = rounding::prepareJavaTime;
break;
case "es":
rounderBuilder = () -> rounding.prepare(min, max);
break;
default:
throw new IllegalArgumentException("Expectd rounder to be [java time] or [es]");
}
}

@Benchmark
public void round(Blackhole bh) {
Rounding.Prepared rounder = rounderBuilder.get();
for (int i = 0; i < dates.length; i++) {
bh.consume(rounder.round(dates[i]));
}
}

@Benchmark
public void nextRoundingValue(Blackhole bh) {
Rounding.Prepared rounder = rounderBuilder.get();
for (int i = 0; i < dates.length; i++) {
bh.consume(rounder.nextRoundingValue(dates[i]));
}
}
}
Loading

0 comments on commit 0097a86

Please sign in to comment.