SQL: Implement DATE_TRUNC function #46473
Conversation
DATE_TRUNC(<truncate field>, <date/datetime>) is a function that allows the user to truncate a timestamp to the specified field by zeroing out the rest of the fields. The function is implemented according to the spec from PostgreSQL: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC Closes: elastic#46319
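To make the zeroing-out idea concrete, here is a minimal sketch using plain `java.time` (illustrative only, not the PR's actual classes): truncating to `month`, for instance, keeps the year and month and resets every smaller field.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class DateTruncIdea {

    // Truncate to the start of the month: reset day-of-month to 1 and zero out
    // the time-of-day and sub-second fields.
    static ZonedDateTime truncateToMonth(ZonedDateTime dt) {
        return dt.withDayOfMonth(1).withHour(0).withMinute(0).withSecond(0).withNano(0);
    }

    public static void main(String[] args) {
        ZonedDateTime input = ZonedDateTime.of(2019, 9, 4, 11, 22, 33, 123_000_000, ZoneOffset.UTC);
        System.out.println(truncateToMonth(input)); // prints 2019-09-01T00:00Z
    }
}
```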
Pinging @elastic/es-search
;

selectDateTruncWithDate
schema::dt_mil:s|dt_cent:s|dt_dec:s|dt_year:s|dt_quarter:s|dt_month:s|dt_week:s|dt_day:s
Have to cast as string here for the time being, working on a solution to properly recognise the DATE type.
Should be fixed with: 4b119dc
truncateDateDecades
schema::decades:s
// tag::truncateDateDecades
SELECT DATE_TRUNC('decade', CAST('2019-09-04' AS DATE))::string AS decades;
Have to cast as string here for the time being, working on a solution to properly recognise the DATE type.
Should be fixed with: 4b119dc
truncateDateQuarter
schema::quarter:s
// tag::truncateDateQuarter
SELECT DATE_TRUNC('quarters', CAST('2019-09-04' AS DATE))::string AS quarter;
same here
Should be fixed with: 4b119dc
@@ -121,6 +121,100 @@ SELECT WEEK(birth_date) week, birth_date FROM test_emp WHERE WEEK(birth_date) >
2 |1953-01-07T00:00:00.000Z
;

selectDateTruncWithDateTime
schema::dt_hour:ts|dt_min:ts|dt_sec:ts|dt_millis:s|dt_micro:s|dt_nano:s
Have to cast some columns to string as the .XXX msecs part is not properly read by the CSV infrastructure.
It's worth opening another issue for this to fix it long term (either inside the CSV library or elsewhere).
Done: #46511
Had some minor comments mainly around style but otherwise LGTM.
@@ -240,4 +241,9 @@ static Time asTime(long millis, ZoneId zoneId) {
    return new Time(ZonedDateTime.ofInstant(Instant.ofEpochMilli(millis), zoneId)
        .toLocalTime().atDate(JdbcTestUtils.EPOCH).atZone(zoneId).toInstant().toEpochMilli());
}

// Used to convert the DATE read from CSV file to a java.sql.Date at the System's timezone (-Dtests.timezone=XXXX)
//change the internal timezone of a java.sql.Date class from UTC to that of the JVM
//used by the CsvTest
Considering this method is not used anywhere else, I would just move it to the Assert class and make it private static.
    return aliases;
}

public static DatePart resolveTruncate(String truncateTo) {
An EnumMap is better than iterating and converting the string to lowercase every time.
Hm, I don't get how the EnumMap can be beneficial here. We can receive any string like w or milliseconds or MILLENNIUM (or NaNOSecond) and we need to find out if it resolves to an Enum name or one of its aliases. I was thinking of creating a HashMap<String, Enum> for faster resolution to avoid the iteration, but we still need to do the lowerCase(). Can you please explain more your suggested solution? (probably I'm missing something)
Right, that is what I meant, a map of enums: Map<String, Enum> (not EnumMap, since the keys are not enums themselves). The point of lowercasing is that you don't have to do equalsIgnoreCase or lowercase the enum name on every lookup, since you can do that directly when initializing the map - this is the lowercasing I meant, not that of the argument.
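A rough sketch of what that lookup map could look like (the enum constants and aliases below are illustrative, not the PR's exact code): every name and alias is lowercased once while building the map, so resolving an input is a single lowercase of the argument plus one `get`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, trimmed-down version of the DatePart enum, just to show the map-based lookup.
enum DatePart {
    YEAR("years", "yy", "yyyy"),
    MONTH("months", "mm", "m"),
    DAY("days", "dd", "d");

    private final String[] aliases;

    DatePart(String... aliases) {
        this.aliases = aliases;
    }

    // Built once; every enum name and alias is stored lowercased,
    // so resolution needs no iteration and no equalsIgnoreCase.
    private static final Map<String, DatePart> NAME_TO_PART = new HashMap<>();
    static {
        for (DatePart part : values()) {
            NAME_TO_PART.put(part.name().toLowerCase(Locale.ROOT), part);
            for (String alias : part.aliases) {
                NAME_TO_PART.put(alias.toLowerCase(Locale.ROOT), part);
            }
        }
    }

    static DatePart resolve(String truncateTo) {
        return truncateTo == null ? null : NAME_TO_PART.get(truncateTo.toLowerCase(Locale.ROOT));
    }
}
```

With this, `DatePart.resolve("Months")` and `DatePart.resolve("mm")` both hit the map directly after the one-time construction.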
}

@Override
public int hashCode() {
Small nitpick: generally within the code hashCode appears before equals (though not consistently).
I think it's quite a mixture currently in our classes, but can change.
}

@Override
public String getWriteableName() {
Please move the serialization methods (getWritable, doWrite) underneath the constructor for StreamInput - it helps keep the IO bit in one place (see the rest of the classes) for both reading and writing data.
@@ -148,4 +150,95 @@ public static int getNanoPrecision(Expression precisionExpression, int nano) {
    nano = nano - nano % (int) Math.pow(10, (9 - precision));
    return nano;
}

public static ZonedDateTime truncate(ZonedDateTime dateTime, DateTrunc.DatePart datePart) {
If the enum is not reused anywhere else (and it doesn't seem to be), I would move the truncate logic inside the enum itself as it looks self-contained.
error("SELECT DATE_TRUNC(int, date) FROM test")); | ||
assertEquals("1:8: second argument of [DATE_TRUNC(keyword, keyword)] must be [date or datetime], found value [keyword] " + | ||
"type [keyword]", error("SELECT DATE_TRUNC(keyword, keyword) FROM test")); | ||
assertEquals("1:8: first argument of [DATE_TRUNC('invalid', keyword)] must be one of [MILLENNIUM, CENTURY, DECADE, " + "" + |
It's worth checking whether the message can be improved by using the Levenshtein distance to suggest the best appropriate match (did you mean?), same as we do with field names.
It might not work though for short fields, so it should be applied only to properties with at least 3-4 chars.
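For context, a self-contained sketch of the did-you-mean idea (the PR ends up reusing the existing `StringUtils.findSimilar` helper, so the class name and thresholds below are only illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SimilarNames {

    // Classic dynamic-programming Levenshtein (edit) distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Only suggest for inputs of at least 4 characters, and only candidates
    // within a small edit distance of the lowercased input.
    static List<String> suggest(String input, Iterable<String> candidates) {
        List<String> matches = new ArrayList<>();
        if (input == null || input.length() < 4) {
            return matches;
        }
        String lower = input.toLowerCase(Locale.ROOT);
        for (String candidate : candidates) {
            if (levenshtein(lower, candidate.toLowerCase(Locale.ROOT)) <= 2) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```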
Nice suggestion, thx! Please check edf42ce in case you see some further improvement there.
@@ -39,8 +45,9 @@ public Combinations(int n, int k) {

@Override
public Iterator<BitSet> iterator() {
-    return new Iterator<BitSet>() {
+    return new Iterator<>() {
This auto-formatting is kind of annoying and distracting - make sure to enable auto-formatting only on the lines you actually change, not the whole file.
It's fine to improve the code, however I would do that in a separate PR.
It was done manually, on purpose - it's not auto-formatting, just a small code style improvement.
If you agree, I think I'd keep it since it's quite small.
@elasticmachine run elasticsearch-ci/1
LGTM, great work!
Though, I would like to see more tests. For example:
DATE_TRUNC(null, date)
DATE_TRUNC(field_name, date + INTERVAL 12 YEAR)
SELECT DATE_TRUNC(CAST(CHAR(123) AS VARCHAR)
And, if you feel adventurous:
SELECT date, part, code, DATE_TRUNC(CASE WHEN code IS NOT NULL THEN CAST(CHAR(code) AS VARCHAR) ELSE part.keyword END, date + INTERVAL 100 YEAR) AS x FROM test WHERE x > '2018-09-04'::date
with this test data:
{"index":{"_id":1}}
{"date":"2004-07-31T11:57:52.000Z","part":"month","code":109}
{"index":{"_id":2}}
{"date":"1992-07-14T08:16:44.444Z","part":"minute","code":110}
{"index":{"_id":3}}
{"date":"2992-02-12T23:16:33.567Z","part":"second","code":115}
{"index":{"_id":4}}
{"date":"992-03-12T23:16:33.567Z","part":"year"}
*Input*:

<1> string expression denoting the unit to which the date/datetime should be truncated
Shouldn't this phrase be "string expression denoting the unit which the date/datetime should be truncated to"?
<1> string expression denoting the unit to which the date/datetime should be truncated
<2> date/datetime expression

*Output*: date/datetime, same as datetime_exp
`datetime_exp`
@astefan Thx for your tests proposals! The
@@ -264,7 +264,7 @@ DATE_TRUNC(
<1> string expression denoting the unit to which the date/datetime should be truncated to
<2> date/datetime expression

-*Output*: date/datetime, same as `datetime_exp`
+*Output*: datetime (even if `datetime_exp` is of type date)
I wouldn't mention the "even if ..." part.
Left a few more comments.
MICROSECOND("microseconds", "mcs"), | ||
NANOSECOND("nanoseconds", "ns"); | ||
|
||
private static final Set<String> ALL_DATE_PARTS; |
I don't think there's a need for this set since RESOLVE_MAP.keySet() returns the same thing (which is a wrapper around the internal map).
NANOSECOND("nanoseconds", "ns"); | ||
|
||
private static final Set<String> ALL_DATE_PARTS; | ||
private static final Map<String, DatePart> RESOLVE_MAP; |
The name is a bit confusing - how about NAME_TO_PART or NAME_TO_FIELD?
    return StringUtils.findSimilar(match, ALL_DATE_PARTS);
}

public static ZonedDateTime truncate(ZonedDateTime dateTime, DateTrunc.DatePart datePart) {
In my previous comment I was suggesting to move this extraction inside the enum itself so each one would have its own of() implementation: DateTruncate.NANOSECOND.of(ZonedDateTime).
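A possible shape for that suggestion, sketched with illustrative names (`Part`, `of`, the lambda-based fields) and only a handful of parts, not the PR's actual code: each constant carries its own truncation function, so callers write `Part.MINUTE.of(dateTime)` instead of going through a utility method with a switch.

```java
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.function.UnaryOperator;

// Each constant owns its truncation logic; of(...) simply delegates to it.
enum Part {
    YEAR(dt -> dt.withDayOfYear(1).toLocalDate().atStartOfDay(dt.getZone())),
    MONTH(dt -> dt.withDayOfMonth(1).toLocalDate().atStartOfDay(dt.getZone())),
    DAY(dt -> dt.truncatedTo(ChronoUnit.DAYS)),
    MINUTE(dt -> dt.truncatedTo(ChronoUnit.MINUTES)),
    SECOND(dt -> dt.truncatedTo(ChronoUnit.SECONDS));

    private final UnaryOperator<ZonedDateTime> truncator;

    Part(UnaryOperator<ZonedDateTime> truncator) {
        this.truncator = truncator;
    }

    ZonedDateTime of(ZonedDateTime dateTime) {
        return truncator.apply(dateTime);
    }
}
```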
public class DateTrunc extends BinaryScalarFunction {

    public enum DatePart {
Considering the enum is bound to DateTrunc, it might make sense to drop Date from its name so it simply becomes Part or maybe Field.
DATE_TRUNC(<truncate field>, <date/datetime>) is a function that allows the user to truncate a timestamp to the specified field by zeroing out the rest of the fields. The function is implemented according to the spec from PostgreSQL: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC Closes: #46319 (cherry picked from commit b37e967)
To be on the safe side in terms of use cases also add the alias DATETRUNC to the DATE_TRUNC function. Follows: elastic#46473
To be on the safe side in terms of use cases also add the alias DATETRUNC to the DATE_TRUNC function. Follows: #46473
DATE_TRUNC(<truncate field>, <date/datetime>) is a function that allows
the user to truncate a timestamp to the specified field by zeroing out
the rest of the fields. The function is implemented according to the
spec from PostgreSQL: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
Closes: #46319