Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support offset in composite aggs #50609

Merged
merged 13 commits into from
Jan 7, 2020
Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jan 3, 2020

Adds support for the offset parameter to the date_histogram source
of composite aggs. The offset parameter is supported by the normal
date_histogram aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.

This is implemented by creating a new Rounding that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the Rounding contract, namely
nextRoundingValue. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing UnsupportedOperationException until I need it.

Closes #48757

Adds support for the `offset` parameter to the `date_histogram` source
of composite aggs. The `offset` parameter is supported by the normal
`date_histogram` aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.

This is implemented by creating a new `Rounding` that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the `Rounding` contract, namely
`nextRoundingValue`. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing `UnsupportedOperationException` until I need it.

Closes elastic#48757
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (>docs)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@nik9000
Copy link
Member Author

nik9000 commented Jan 3, 2020

Please have a close look at this one! It has wire changes and it has been more than a year since I made any of those....


@Override
public void innerWriteTo(StreamOutput out) throws IOException {
if (out.getVersion().before(Version.V_8_0_0)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually serialize this Rounding but I believe I've got this bit right so I kept it. I'd like to move the other offset code over to this Rounding which means that we will eventually serialize it. At that point we'll get exhaustive tests for this code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo the version check which I discuss below.

@@ -80,12 +91,18 @@ protected DateHistogramValuesSourceBuilder(StreamInput in) throws IOException {
super(in);
dateHistogramInterval = new DateIntervalWrapper(in);
timeZone = in.readOptionalZoneId();
if (in.getVersion().after(Version.V_8_0_0)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the way to do this, but it has been a long time. I want to land this change in master and 7.x, but these tests have no change of passing the bwc tests until I backport it.

Now that I think about it, maybe this version check should be V_7_6_0 actually. Help!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests caught an error - this at least be onOrAfter but I'll switch it to V_7_6_0 while I'm there.

@@ -287,6 +287,72 @@ Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
`-08:00`) or as a timezone id, an identifier used in the TZ database like
`America/Los_Angeles`.

*Offset*

include::datehistogram-aggregation.asciidoc[tag=offset-explanation]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured because the option is the same I should just include the docs from the normal date histogram.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nik9000
Copy link
Member Author

nik9000 commented Jan 3, 2020

Hurray! Integration test failure! Will debug. Good robots, catching things.

@nik9000
Copy link
Member Author

nik9000 commented Jan 3, 2020

@elasticmachine, run elasticsearch-ci/2

@nik9000
Copy link
Member Author

nik9000 commented Jan 3, 2020

The default distro failure is due to the thing not being backported. I'll find a way to get this working.

@nik9000
Copy link
Member Author

nik9000 commented Jan 3, 2020

The default distro failure is due to the thing not being backported. I'll find a way to get this working.

I believe I've got this sorted thanks to @DaveCTurner and @cbuescher.

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments/questions, but I think this looks good. Main question was around the new Rounding class, most of the rest is cosmetic :)

@@ -287,6 +287,72 @@ Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
`-08:00`) or as a timezone id, an identifier used in the TZ database like
`America/Los_Angeles`.

*Offset*

include::datehistogram-aggregation.asciidoc[tag=offset-explanation]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -556,19 +567,73 @@ public boolean equals(Object obj) {
}
}

static class OffsetRounding extends Rounding {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention to migrate the existing users of offsets (regular date histo agg, etc) over to this wrapper? E.g. it seems a bit heavyweight to create a whole new rounding, but if the idea is to followup with changes to the existing users of offset (rather than baking the logic into the aggs) it makes sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we should probably add a javadoc explaining why/when to use this class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did intend to migrate the rest over of the uses of offset over to it.

I added javadoc on the public interface to the class which is what the rest of the classes in this file do. Do you think I should duplicate it onto this one? Or add something maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed that it was documented on the public method. Since the rest of the class does it that way, fine with me :)

Thanks for the explanation!


OffsetRounding(StreamInput in) throws IOException {
delegate = Rounding.read(in);
offset = in.readLong();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe read/write ZLong instead? I suspect offsets will be small-to-medium'ish sized and either positive or negative, so zlong might be a win?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places offsets are written as long and I just copied it. But I'd be fine with zlong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the switch to zlong.

byte id = in.readByte();
switch (id) {
case TimeUnitRounding.ID:
rounding = new TimeUnitRounding(in);
break;
return new TimeUnitRounding(in);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 for the cleanup here :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure everyone would consider that cleaner, but I sure do!

@@ -195,6 +195,16 @@ public void testTimeUnitRoundingDST() {
assertThat(tzRounding_chg.round(time("2014-11-02T06:01:01", chg)), isDate(time("2014-11-02T06:00:00", chg), chg));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, do you know if we have any serialization tests for Rounding? I was looking to see if we test that the id() bytes are correct and don't accidentally change (similar to what we do with AbstractWriteableEnumTestCase enum tests)... but couldn't find any serialization tests at all.

If we have them somewhere, let's add a test for the id byte. If not, probably too much to add to this PR but we should file a ticket so we don't forget to add some tests... makes me uneasy that such a widespread class doesn't have serialization tests :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we end up testing serialization as part of testing things like extended bounds bucket response. Adding a unit test for just this class's serialization makes sense to me though. I figured I'd wait until I used it in a context where we serialized it but since I'm writing the serialization code now I probably ought to write the test now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a wire test case.

@@ -195,6 +195,16 @@ public void testTimeUnitRoundingDST() {
assertThat(tzRounding_chg.round(time("2014-11-02T06:01:01", chg)), isDate(time("2014-11-02T06:00:00", chg), chg));
}

public void testOffsetRounding() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add another test that does negative offsets too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a test with negative offsets.

@nik9000
Copy link
Member Author

nik9000 commented Jan 3, 2020

@polyfractal, I've pushed everything you asked for.

@nik9000
Copy link
Member Author

nik9000 commented Jan 3, 2020

I reworked the serialization a bit, moving the write methods so they are near the read methods and replacing writeString(zone.getId()) with writeZoneId(zone). More importantly, I replaced DateUtil.of(readString()) with readZoneId. The former changes the zone on load, making the hashcode and equals inconsistent.

@@ -418,24 +434,16 @@ public boolean equals(Object obj) {
return false;
}
TimeUnitRounding other = (TimeUnitRounding) obj;
return Objects.equals(unit, other.unit) && Objects.equals(timeZone, other.timeZone);
return unit == other.unit && timeZone.equals(other.timeZone);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revert this change. It snuck in because I thought the comparison was wrong.


@Override
public String toString() {
return "Rounding[" + interval + " in " + timeZone + "]";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The toString implementations weren't consistent which made reading the error messages hard. So I changed them.

@nik9000 nik9000 merged commit 326d696 into elastic:master Jan 7, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 9, 2020
Adds support for the `offset` parameter to the `date_histogram` source
of composite aggs. The `offset` parameter is supported by the normal
`date_histogram` aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.

This is implemented by creating a new `Rounding` that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the `Rounding` contract, namely
`nextRoundingValue`. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing `UnsupportedOperationException` until I need it.

Closes elastic#48757
@nik9000
Copy link
Member Author

nik9000 commented Jan 9, 2020

Thanks for the reviews @polyfractal and @jimczi !

nik9000 added a commit that referenced this pull request Jan 9, 2020
Adds support for the `offset` parameter to the `date_histogram` source
of composite aggs. The `offset` parameter is supported by the normal
`date_histogram` aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.

This is implemented by creating a new `Rounding` that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the `Rounding` contract, namely
`nextRoundingValue`. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing `UnsupportedOperationException` until I need it.

Closes #48757
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Jan 10, 2020
When deserializing time zones in the Rounding classes we used to include a tiny
normalization step via `DateUtils.of(in.readString())` that was lost in elastic#50609.
Its at least necessary for some tests, e.g. the cause of elastic#50827 is that when
sending the default time zone ZoneOffset.UTC on a stream pre 7.0 we convert it
to a "UTC" string id via `DateUtils.zoneIdToDateTimeZone`. This gets then read
back as a UTC ZoneRegion, which should behave the same but fails the equality
tests in our serialization tests. Reverting to the previous behaviour with an
additional normalization step on 7.x.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 10, 2020
When I implemented elastic#50609 I suppressed a few backwards compatibility
checks until I finished the backport. I've now finished the backport so
this enables the tests.
nik9000 added a commit that referenced this pull request Jan 10, 2020
When I implemented #50609 I suppressed a few backwards compatibility
checks until I finished the backport. I've now finished the backport so
this enables the tests.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 10, 2020
We added a new rounding in elastic#50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
cbuescher pushed a commit that referenced this pull request Jan 13, 2020
When deserializing time zones in the Rounding classes we used to include a tiny
normalization step via `DateUtils.of(in.readString())` that was lost in #50609.
Its at least necessary for some tests, e.g. the cause of #50827 is that when
sending the default time zone ZoneOffset.UTC on a stream pre 7.0 we convert it
to a "UTC" string id via `DateUtils.zoneIdToDateTimeZone`. This gets then read
back as a UTC ZoneRegion, which should behave the same but fails the equality
tests in our serialization tests. Reverting to the previous behaviour with an
additional normalization step on 7.x.

Co-authored-by: Nik Everett <[email protected]>

Closes #50827
nik9000 added a commit that referenced this pull request Jan 14, 2020
We added a new rounding in #50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 14, 2020
…0873)

We added a new rounding in elastic#50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
nik9000 added a commit that referenced this pull request Jan 14, 2020
…50978)

We added a new rounding in #50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Adds support for the `offset` parameter to the `date_histogram` source
of composite aggs. The `offset` parameter is supported by the normal
`date_histogram` aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.

This is implemented by creating a new `Rounding` that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the `Rounding` contract, namely
`nextRoundingValue`. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing `UnsupportedOperationException` until I need it.

Closes elastic#48757
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
When I implemented elastic#50609 I suppressed a few backwards compatibility
checks until I finished the backport. I've now finished the backport so
this enables the tests.
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
We added a new rounding in elastic#50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
nik9000 added a commit that referenced this pull request Jan 27, 2020
We added a new rounding in #50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.

This is a redo of #50873 with more integration tests.

This reverts commit d114c9d.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 27, 2020
)

We added a new rounding in elastic#50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.

This is a redo of elastic#50873 with more integration tests.

This reverts commit d114c9d.
nik9000 added a commit that referenced this pull request Jan 27, 2020
…51495)

We added a new rounding in #50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.

This is a redo of #50873 with more integration tests.

This reverts commit d114c9d.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support offset in composite date_histogram aggregation
5 participants