SQL: Implement DATE_TRUNC function #46473
Conversation
DATE_TRUNC(<truncate field>, <date/datetime>) is a function that allows the user to truncate a timestamp to the specified field by zeroing out the rest of the fields. The function is implemented according to the spec from PostgreSQL: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC Closes: elastic#46319
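To make the zeroing-out idea concrete, here is a minimal sketch using plain `java.time` (illustrative only, not the PR's actual classes): truncating to `month`, for instance, keeps the year and month and resets every smaller field.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class DateTruncIdea {

    // Truncate to the start of the month: reset day-of-month to 1 and zero out
    // the time-of-day and sub-second fields.
    static ZonedDateTime truncateToMonth(ZonedDateTime dt) {
        return dt.withDayOfMonth(1).withHour(0).withMinute(0).withSecond(0).withNano(0);
    }

    public static void main(String[] args) {
        ZonedDateTime input = ZonedDateTime.of(2019, 9, 4, 11, 22, 33, 123_000_000, ZoneOffset.UTC);
        System.out.println(truncateToMonth(input)); // prints 2019-09-01T00:00Z
    }
}
```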
Pinging @elastic/es-search
;

selectDateTruncWithDate
schema::dt_mil:s|dt_cent:s|dt_dec:s|dt_year:s|dt_quarter:s|dt_month:s|dt_week:s|dt_day:s
Have to cast as string here for the time being, working on a solution to properly recognise the DATE type.
Should be fixed with: 4b119dc
truncateDateDecades
schema::decades:s
// tag::truncateDateDecades
SELECT DATE_TRUNC('decade', CAST('2019-09-04' AS DATE))::string AS decades;
Have to cast as string here for the time being, working on a solution to properly recognise the DATE type.
Should be fixed with: 4b119dc
truncateDateQuarter
schema::quarter:s
// tag::truncateDateQuarter
SELECT DATE_TRUNC('quarters', CAST('2019-09-04' AS DATE))::string AS quarter;
same here
Should be fixed with: 4b119dc
@@ -121,6 +121,100 @@ SELECT WEEK(birth_date) week, birth_date FROM test_emp WHERE WEEK(birth_date) >
2 |1953-01-07T00:00:00.000Z
;

selectDateTruncWithDateTime
schema::dt_hour:ts|dt_min:ts|dt_sec:ts|dt_millis:s|dt_micro:s|dt_nano:s
Have to cast some columns to string as the .XXX msecs part is not properly read by the CSV infrastructure.
It's worth opening another issue for this to fix it long term (either inside the CSV library or elsewhere).
Done: #46511
Had some minor comments mainly around style but otherwise LGTM.
@@ -240,4 +241,9 @@ static Time asTime(long millis, ZoneId zoneId) {
    return new Time(ZonedDateTime.ofInstant(Instant.ofEpochMilli(millis), zoneId)
        .toLocalTime().atDate(JdbcTestUtils.EPOCH).atZone(zoneId).toInstant().toEpochMilli());
}

// Used to convert the DATE read from CSV file to a java.sql.Date at the System's timezone (-Dtests.timezone=XXXX)
//change the internal timezone of a java.sql.Date class from UTC to that of the JVM
//used by the CsvTest
Considering this method is not used anywhere else, I would just move it to the Assert class and make it private static.
    return aliases;
}

public static DatePart resolveTruncate(String truncateTo) {
An EnumMap is better than iterating and converting the string to lowercase every time.
Hm, I don't get how the EnumMap can be beneficial here. We can receive any string like w or milliseconds or MILLENNIUM (or NaNOSecond) and we need to find out if it resolves to an Enum name or one of its aliases. I was thinking of creating a HashMap<String, Enum> for faster resolution to avoid the iteration, but we still need to do the lowerCase(). Can you please explain more your suggested solution? (probably I'm missing something)
Right, that is what I meant, a map of enums: Map<String, Enum> (not EnumMap, since the keys are not enums themselves). The point of lowercasing is that you don't have to do equalsIgnoreCase or lowercase the enum name on every lookup, since you can do that directly when initializing the map - this is the lowercasing I meant, not that of the argument.
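A rough sketch of what that lookup map could look like (the enum constants and aliases below are illustrative, not the PR's exact code): every name and alias is lowercased once while building the map, so resolving an input is a single lowercase of the argument plus one `get`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, trimmed-down version of the DatePart enum, just to show the map-based lookup.
enum DatePart {
    YEAR("years", "yy", "yyyy"),
    MONTH("months", "mm", "m"),
    DAY("days", "dd", "d");

    private final String[] aliases;

    DatePart(String... aliases) {
        this.aliases = aliases;
    }

    // Built once; every enum name and alias is stored lowercased,
    // so resolution needs no iteration and no equalsIgnoreCase.
    private static final Map<String, DatePart> NAME_TO_PART = new HashMap<>();
    static {
        for (DatePart part : values()) {
            NAME_TO_PART.put(part.name().toLowerCase(Locale.ROOT), part);
            for (String alias : part.aliases) {
                NAME_TO_PART.put(alias.toLowerCase(Locale.ROOT), part);
            }
        }
    }

    static DatePart resolve(String truncateTo) {
        return truncateTo == null ? null : NAME_TO_PART.get(truncateTo.toLowerCase(Locale.ROOT));
    }
}
```

With this, `DatePart.resolve("Months")` and `DatePart.resolve("mm")` both hit the map directly after the one-time construction.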
}

@Override
public int hashCode() {
Small nitpick: generally within the code hashCode appears before equals (though not consistently).
I think it's quite a mixture currently in our classes, but can change.
}

@Override
public String getWriteableName() {
Please move the serialization methods (getWritable, doWrite) underneath the constructor for StreamInput - it helps keep the IO bit in one place (see the rest of the classes) for both reading and writing data.
@@ -148,4 +150,95 @@ public static int getNanoPrecision(Expression precisionExpression, int nano) {
    nano = nano - nano % (int) Math.pow(10, (9 - precision));
    return nano;
}

public static ZonedDateTime truncate(ZonedDateTime dateTime, DateTrunc.DatePart datePart) {
If the enum is not reused anywhere else (and it doesn't seem to be), I would move the truncate logic inside the enum itself as it looks self-contained.
error("SELECT DATE_TRUNC(int, date) FROM test")); | ||
assertEquals("1:8: second argument of [DATE_TRUNC(keyword, keyword)] must be [date or datetime], found value [keyword] " + | ||
"type [keyword]", error("SELECT DATE_TRUNC(keyword, keyword) FROM test")); | ||
assertEquals("1:8: first argument of [DATE_TRUNC('invalid', keyword)] must be one of [MILLENNIUM, CENTURY, DECADE, " + "" + |
It's worth checking whether the message can be improved by using the Levenshtein distance to suggest the best appropriate match (did you mean?), same as we do with field names.
It might not work though for short fields, so it should be applied only to properties with at least 3-4 chars.
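For context, a self-contained sketch of the did-you-mean idea (the PR ends up reusing the existing `StringUtils.findSimilar` helper, so the class name and thresholds below are only illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SimilarNames {

    // Classic dynamic-programming Levenshtein (edit) distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Only suggest for inputs of at least 4 characters, and only candidates
    // within a small edit distance of the lowercased input.
    static List<String> suggest(String input, Iterable<String> candidates) {
        List<String> matches = new ArrayList<>();
        if (input == null || input.length() < 4) {
            return matches;
        }
        String lower = input.toLowerCase(Locale.ROOT);
        for (String candidate : candidates) {
            if (levenshtein(lower, candidate.toLowerCase(Locale.ROOT)) <= 2) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```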
Nice suggestion, thx! Please check edf42ce in case you see some further improvement there.
@@ -39,8 +45,9 @@ public Combinations(int n, int k) {

@Override
public Iterator<BitSet> iterator() {
-    return new Iterator<BitSet>() {
+    return new Iterator<>() {
This auto-formatting is kind of annoying and distracting - make sure to enable auto-formatting only on the lines you actually change, not the whole file.
It's fine to improve the code, however I would do that in a separate PR.
It was done manually, on purpose - it's not auto-formatting, just a small code style improvement.
If you agree, I think I'd keep it since it's quite small.
@elasticmachine run elasticsearch-ci/1
LGTM, great work!
Though, I would like to see more tests. For example:
DATE_TRUNC(null, date)
DATE_TRUNC(field_name, date + INTERVAL 12 YEAR)
SELECT DATE_TRUNC(CAST(CHAR(123) AS VARCHAR)
And, if you feel adventurous:
SELECT date, part, code, DATE_TRUNC(CASE WHEN code IS NOT NULL THEN CAST(CHAR(code) AS VARCHAR) ELSE part.keyword END, date + INTERVAL 100 YEAR) AS x FROM test WHERE x > '2018-09-04'::date
with this test data:
{"index":{"_id":1}}
{"date":"2004-07-31T11:57:52.000Z","part":"month","code":109}
{"index":{"_id":2}}
{"date":"1992-07-14T08:16:44.444Z","part":"minute","code":110}
{"index":{"_id":3}}
{"date":"2992-02-12T23:16:33.567Z","part":"second","code":115}
{"index":{"_id":4}}
{"date":"992-03-12T23:16:33.567Z","part":"year"}
*Input*:

<1> string expression denoting the unit to which the date/datetime should be truncated
Shouldn't this phrase be "string expression denoting the unit which the date/datetime should be truncated to"?
<1> string expression denoting the unit to which the date/datetime should be truncated
<2> date/datetime expression

*Output*: date/datetime, same as datetime_exp
`datetime_exp`
@astefan Thx for your tests proposals! The
@@ -264,7 +264,7 @@ DATE_TRUNC(
<1> string expression denoting the unit to which the date/datetime should be truncated to
<2> date/datetime expression

-*Output*: date/datetime, same as `datetime_exp`
+*Output*: datetime (even if `datetime_exp` is of type date)
I wouldn't mention the "even if ..." part.
Left a few more comments.
MICROSECOND("microseconds", "mcs"), | ||
NANOSECOND("nanoseconds", "ns"); | ||
|
||
private static final Set<String> ALL_DATE_PARTS; |
I don't think there's a need for this set since RESOLVE_MAP.keySet() returns the same thing (which is a wrapper around the internal map).
NANOSECOND("nanoseconds", "ns"); | ||
|
||
private static final Set<String> ALL_DATE_PARTS; | ||
private static final Map<String, DatePart> RESOLVE_MAP; |
The name is a bit confusing - how about NAME_TO_PART or NAME_TO_FIELD?
    return StringUtils.findSimilar(match, ALL_DATE_PARTS);
}

public static ZonedDateTime truncate(ZonedDateTime dateTime, DateTrunc.DatePart datePart) {
In my previous comment I was suggesting to move this extraction inside the enum itself so each one would have its own of() implementation: DateTruncate.NANOSECOND.of(ZonedDateTime).
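A possible shape for that suggestion, sketched with illustrative names (`Part`, `of`, the lambda-based fields) and only a handful of parts, not the PR's actual code: each constant carries its own truncation function, so callers write `Part.MINUTE.of(dateTime)` instead of going through a utility method with a switch.

```java
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.function.UnaryOperator;

// Each constant owns its truncation logic; of(...) simply delegates to it.
enum Part {
    YEAR(dt -> dt.withDayOfYear(1).toLocalDate().atStartOfDay(dt.getZone())),
    MONTH(dt -> dt.withDayOfMonth(1).toLocalDate().atStartOfDay(dt.getZone())),
    DAY(dt -> dt.truncatedTo(ChronoUnit.DAYS)),
    MINUTE(dt -> dt.truncatedTo(ChronoUnit.MINUTES)),
    SECOND(dt -> dt.truncatedTo(ChronoUnit.SECONDS));

    private final UnaryOperator<ZonedDateTime> truncator;

    Part(UnaryOperator<ZonedDateTime> truncator) {
        this.truncator = truncator;
    }

    ZonedDateTime of(ZonedDateTime dateTime) {
        return truncator.apply(dateTime);
    }
}
```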
public class DateTrunc extends BinaryScalarFunction {

    public enum DatePart {
Considering the enum is bound to DateTrunc, it might make sense to drop Date from its name so it simply becomes Part or maybe Field.
DATE_TRUNC(<truncate field>, <date/datetime>) is a function that allows the user to truncate a timestamp to the specified field by zeroing out the rest of the fields. The function is implemented according to the spec from PostgreSQL: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC Closes: #46319 (cherry picked from commit b37e967)
To be on the safe side in terms of use cases also add the alias DATETRUNC to the DATE_TRUNC function. Follows: elastic#46473
To be on the safe side in terms of use cases also add the alias DATETRUNC to the DATE_TRUNC function. Follows: #46473
DATE_TRUNC(<truncate field>, <date/datetime>) is a function that allows
the user to truncate a timestamp to the specified field by zeroing out
the rest of the fields. The function is implemented according to the
spec from PostgreSQL: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
Closes: #46319