-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Make parsing of date more lenient #52137
Conversation
Pinging @elastic/es-search (:Search/SQL) |
Make the parsing of date more lenient - as an escaped literal: `{d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}` - cast a string to a date: `CAST(2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)` Closes: elastic#49379
I'm not sure about the escaped literal enhancement, since MS-SQL docs only mention the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// double check back and forth conversion | ||
assertEquals(date(1581292800000L), conversion.convert("2020-02-10 10:20")); | ||
assertEquals(date(-125908819200000L), conversion.convert("-2020-02-10 10:20:30.123")); | ||
assertEquals(date(1581292800000L), conversion.convert("2020-02-10 10:20:30.123456789")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sake of completeness, I'd suggest a test without the duration separator, but with a timezone (or a Z
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left some further small comments.
@@ -42,6 +42,30 @@ | |||
.appendLiteral('T') | |||
.append(ISO_LOCAL_TIME) | |||
.toFormatter().withZone(UTC); | |||
private static final DateTimeFormatter DATE_ESCAPED_LITERAL_FORMATTER_WHITESPACE = new DateTimeFormatterBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name of the formatters could be more descriptive? The fact that time can be added, but it's optional, imo should be reflected in the formatter's name. Also, not to abuse the length of the names, how about DATE_OPTIONAL_TIME_FORMATTER_WHITESPACE? (ie get rid of ESCAPED_LITERAL
keep the FORMATTER and mention the whitespace/'T') Same idea could be applied to the rest of the formatters: `DATE_OPTIONAL_OPTIONAL_TIME_FORMATTER_T, ISO_DATE_OPTIONAL_TIME_FORMATTER_WHITESPACE, ISO_DATE_OPTIONAL_TIME_FORMATTER_T.....
return LocalDate.parse(dateFormat, ISO_LOCAL_DATE).atStartOfDay(UTC); | ||
int separatorIdx = dateFormat.indexOf('-'); | ||
if (separatorIdx == 0) { // negative year | ||
separatorIdx = dateFormat.indexOf('-', separatorIdx + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since separatorIdx == 0
then the indexOf
call should be indexOf('-', 1)
right?
} | ||
separatorIdx = dateFormat.indexOf('-', separatorIdx + 1) + 3; | ||
// Avoid index out of bounds - it will lead to DateTimeParseException anyways | ||
if (separatorIdx >= dateFormat.length() || dateFormat.charAt(separatorIdx) == 'T') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which situation is the >
valid here? (I refer to separatorIdx >= dateFormat.length()
) Isn't ==
enough to detect a date-only string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 lines above we do +3
to move directly after the day part in the string.
If the string passed is something like: 2020-12-
then we have moved passed the string's length and dateFormat.charAt()
will throw OutOfBounds exception.
I forgot to add a test for that, will do.
Make the parsing of date more lenient
{d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}
CAST(2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)
Closes: #49379