Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Esql compare nanos and millis #118027

Merged
merged 26 commits into from
Dec 6, 2024

Conversation

not-napoleon
Copy link
Member

Resolves #116281

Introduces support for comparing millisecond dates with nanosecond dates, without the need for casting. Millisecond dates outside of the nanosecond date range are handled correctly.

@not-napoleon not-napoleon added >enhancement auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Dec 4, 2024
@not-napoleon not-napoleon requested a review from a team as a code owner December 4, 2024 20:41
Copy link
Contributor

github-actions bot commented Dec 4, 2024

Documentation preview:

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 4, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @not-napoleon, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@@ -293,6 +293,17 @@ public static long toMilliSeconds(long nanoSecondsSinceEpoch) {
return nanoSecondsSinceEpoch / 1_000_000;
}

public static int compareNanosToMillis(long nanos, long millis) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main implementation. All the specific operators are implemented in terms of this method.

@@ -726,7 +726,7 @@ public static Literal randomLiteral(DataType type) {
case UNSIGNED_LONG, LONG, COUNTER_LONG -> randomLong();
case DATE_PERIOD -> Period.of(randomIntBetween(-1000, 1000), randomIntBetween(-13, 13), randomIntBetween(-32, 32));
case DATETIME -> randomMillisUpToYear9999();
case DATE_NANOS -> randomLong();
case DATE_NANOS -> randomLongBetween(0, Long.MAX_VALUE);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated, but has been wrong for a while. I changed how we were logging dates in tests, which then triggered an issue with negative values here, thus needing to fix it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, could be simplified to randomNonNegativeLong()

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments on the date utils method.

@@ -293,6 +293,17 @@ public static long toMilliSeconds(long nanoSecondsSinceEpoch) {
return nanoSecondsSinceEpoch / 1_000_000;
}

public static int compareNanosToMillis(long nanos, long millis) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comparing sub-second durations, or epoch times? I think it's the former, but please add javadocs to make clear the expectations for input and output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc added. The intent is in fact to compare epoch dates in the two different formats. The intent here is to support an ES|QL use case comparing fields of different types. This seemed like a good place to put it, since we have a bunch of other functions for dealing with ES date_nanos here, but I can move it if this class isn't appropriate.

@@ -293,6 +293,17 @@ public static long toMilliSeconds(long nanoSecondsSinceEpoch) {
return nanoSecondsSinceEpoch / 1_000_000;
}

public static int compareNanosToMillis(long nanos, long millis) {
if (millis < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is sub-second duration (see question above), how can millis be negative? If it's epoch times, why would negative millis the nanos are always greater?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elasticsearch doesn't allow nanosecond resolution dates before epoch, so a millisecond date before epoch will, by definition, be before any valid nanosecond date. See, for example, clampToNanosRange.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't support passing negative epoch millis until recently. Rather than make this assumption, why not make the method correctly compare the two?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do that because we then need to deal with the long overflow case in the comparison. Right now, nanos - (millis * 1_000_000) will never overflow, but if we allow negative values there, then overflow becomes a possibility. We can deal with that, but this seemed more straight forward. I've added some test cases and comments to make this more explicit.

The assumption that nanosecond dates will never be negative is fairly deeply baked into the system, and we would need to change a lot of things before we could support that, not just this method. I could change this to read if (millis < 0 && nanos >= 0) {, even though the second clause will always be true, if you want. I'd rather not add further logic for dealing with currently impossible negative nanos cases though, since none of the other tooling supports it and I'd have to hand-roll conversions (and difficulty in those conversions was why we decided not to support pre-epoch nanos in the first place, IIRC).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an assert that nanos >= 0? That would then break if we try adding negative support in the future, so modifying this method is not missed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. Done.

if (millis < 0) {
return 1;
}
if (millis > MAX_NANOSECOND_IN_MILLIS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be a precondition. Why would we have millis outside the expected input bounds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent here is to support comparing user data stored in the two different resolutions. So in this case, the millis aren't outside the expected input bounds, as any long-representable datetime should be a valid input. A user might have a query like WHERE nanos_date < millis_date, and millis_date could be past the maximum long-representable nanosecond date, in which case we would still want that expression to evaluate to true. We can optimize that away if the milliseconds are a constant, but if they're in a field, we actually need to compare and get a sensible result.

) {
this(source, left, right, operation, null, evaluatorMap);
this(source, left, right, operation, null, evaluatorMap, nanosToMillisEvaluator, millisToNanosEvaluator);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After consideration, passing in two special case evaluator factories seems like the least bad way to implement this. We need two functions because we need the ordering information for the operator, but also need to rely on parameter order to know which argument is milliseconds and which is nanoseconds.

I thought about changing the evaluator map to be keyed by the ordered pair of input types, but that would add a lot of needless cruft to the common cases. We could try something like using the single type keyed map if commonType returns a non-null value, and using a two keyed map if it doesn't, but that seemed like more complexity than necessary.

I'm open to other suggestions on how to manage dispatching these operations.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateUtils changes LGTM

@not-napoleon not-napoleon merged commit 7cd17d2 into elastic:main Dec 6, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

not-napoleon added a commit to not-napoleon/elasticsearch that referenced this pull request Dec 6, 2024
Resolves elastic#116281

Introduces support for comparing millisecond dates with nanosecond dates, without the need for casting. Millisecond dates outside of the nanosecond date range are handled correctly.
elasticsearchmachine pushed a commit that referenced this pull request Dec 6, 2024
Resolves #116281

Introduces support for comparing millisecond dates with nanosecond dates, without the need for casting. Millisecond dates outside of the nanosecond date range are handled correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for mixed comparisons between millis and nanos
5 participants