
[SPARK-1360] Add Timestamp Support for SQL #275

Closed
wants to merge 12 commits

Conversation

chenghao-intel
Contributor

This PR includes:

  1. Add the new data type Timestamp
  2. Add more data type casting based on Hive's rules
  3. Fix bugs caused by missing data types in both parsers (HiveQl & SQLParser).
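To illustrate the Hive-style casting rules in point 2, here is a hedged sketch (the object and method names are hypothetical, invented for this example, and not the PR's actual Catalyst code): numeric values are interpreted as seconds since the epoch, and unparseable strings become null.

```scala
import java.sql.Timestamp

// Hypothetical helpers sketching Hive-style timestamp casts.
// Names are made up for this example; not the PR's actual code.
object TimestampCastSketch {
  // Numeric -> Timestamp: interpret the number as (fractional) seconds since the epoch.
  def fromSeconds(seconds: Double): Timestamp =
    new Timestamp((seconds * 1000L).toLong)

  // Timestamp -> Double: seconds since the epoch, keeping the fractional part.
  def toSeconds(ts: Timestamp): Double =
    ts.getTime / 1000.0

  // String -> Timestamp: invalid input yields null rather than throwing.
  def fromString(s: String): Timestamp =
    try Timestamp.valueOf(s)
    catch { case _: IllegalArgumentException => null }
}
```

For example, `fromString("2014-04-01 12:00:00")` parses the JDBC timestamp format, while `fromString("not a timestamp")` returns null.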

@AmplabJenkins

Can one of the admins verify this patch?

@chenghao-intel
Contributor Author

Sorry, there are some bugs in the unit tests; I will look at those.

@@ -17,6 +17,9 @@

package org.apache.spark.sql.catalyst.expressions

import java.sql.Timestamp
import java.lang.{NumberFormatException => NFE}
Contributor

Can we not shorten this? I am not sure if NFE is very obvious to readers
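As context for why the exception matters here: the cast from String to numeric types catches NumberFormatException, and (per the commit log later in this thread) a failed parse should yield null rather than 0. A minimal hedged sketch, with a made-up object name rather than the PR's actual code:

```scala
// Hypothetical sketch, not the PR's actual code: a string-to-int cast
// that spells out NumberFormatException in full and maps bad input to
// null instead of 0 (matching the fix noted in the commit log).
object SafeCast {
  def stringToInt(s: String): Any =
    try s.toInt
    catch { case _: NumberFormatException => null }
}
```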

@marmbrus
Contributor

Overall, I think this looks pretty good. After you figure out the failing tests, you should also whitelist a bunch of tests that were only failing because we didn't have timestamp support.

I got the list below by running sbt/sbt -Dspark.hive.alltests hive/test (this runs all tests even if they aren't on the whitelist) and looking in sql/hive/target/HiveCompatibilitySuite.passed. You should also check out the logs in sql/hive/target/HiveCompatibilitySuite.failed and sql/hive/target/HiveCompatibilitySuite.wrong to make sure we aren't missing any edge cases regarding timestamps.

+    "input14",
+    "input21",
+    "input_testsequencefile",
+    "insert1",
+    "insert2_overwrite_partitions",
+    "join32_lessSize",
+    "join_map_ppr",
+    "join_rc",
+    "lateral_view_outer",
+    "loadpart1",
+    "mapreduce1",
+    "mapreduce2",
+    "mapreduce4",
+    "mapreduce5",
+    "mapreduce6",
+    "mapreduce8",
+    "multi_insert_gby",
+    "multi_insert_gby3",
+    "multi_insert_lateral_view",
+    "orc_dictionary_threshold",
+    "orc_empty_files",
+    "orc_ends_with_nulls",
+    "parallel",
+    "parenthesis_star_by",
+    "partcols1",
+    "partition_serde_format",
+    "partition_wise_fileformat4",
+    "partition_wise_fileformat5",
+    "partition_wise_fileformat6",
+    "partition_wise_fileformat7",
+    "partition_wise_fileformat9",
+    "ppd2",
+    "ppd_clusterby",
+    "ppd_constant_expr",
+    "ppd_transform",
+    "rcfile_columnar",
+    "rcfile_lazydecompress",
+    "rcfile_null_value",
+    "rcfile_toleratecorruptions",
+    "rcfile_union",
+    "reduce_deduplicate",
+    "reduce_deduplicate_exclude_gby",
+    "reducesink_dedup",
+    "smb_mapjoin_6",
+    "smb_mapjoin_7",
+    "stats_aggregator_error_1",
+    "stats_publisher_error_1",
+    "transform_ppr1",
+    "transform_ppr2",
+    "udaf_histogram_numeric",
+    "udf8",
+    "union3",
+    "union33",
+    "union_remove_11",

@chenghao-intel
Contributor Author

Thank you @marmbrus and @rxin. Both the code and the unit test whitelist have been updated, and the unit tests pass locally.

@chenghao-intel
Contributor Author

BTW, the whitelist has been reordered (via the Linux shell's sort command) after adding more passing cases. More cases could actually be added, such as decimal_2 / decimal_3, but the precision part of decimal still can't be matched exactly; I'll leave that for further improvement.


override def apply(input: Row): Any = {
  val evaluated = child.apply(input)
  if (evaluated == null) {
    null
  } else {
-   castingFunction(evaluated)
+   if (child.dataType == dataType) evaluated else cast(evaluated)
Contributor

There should already be a rule that eliminates casts that don't do anything, so I think this check is unnecessary.
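The kind of rule being referred to can be sketched like this (a simplified stand-in with made-up expression types, not Catalyst's actual optimizer code): a cast whose child already has the target data type is replaced by the child.

```scala
// Simplified stand-in, not Catalyst's actual rule: drop casts whose
// child already has the target data type.
sealed trait DataType
case object IntType extends DataType
case object TimestampType extends DataType

sealed trait Expr { def dataType: DataType }
case class Literal(value: Any, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr

def simplifyCast(e: Expr): Expr = e match {
  case Cast(child, dt) if child.dataType == dt => child  // no-op cast: eliminate it
  case other => other
}
```

With such a rule in the optimizer, the per-row equality check inside the cast's `apply` never fires, which is why the reviewer considers it unnecessary.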

@marmbrus
Contributor

marmbrus commented Apr 1, 2014

This is looking pretty good! A few small comments:

  • You should also add Timestamp to ScalaReflection.
  • You will need to check in any new golden files (if there are any) that were created in sql/hive/src/test/resources/.
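As a rough idea of what the ScalaReflection change involves (a hedged, simplified stand-in; Spark's real ScalaReflection works on Scala type information, not a hard-coded map like this), the point is that java.sql.Timestamp needs an entry mapping it to the SQL TimestampType:

```scala
import java.sql.Timestamp

// Hypothetical, simplified stand-in for a reflection-based schema mapping.
object SchemaSketch {
  private val typeNames: Map[Class[_], String] = Map(
    classOf[java.lang.Integer] -> "IntegerType",
    classOf[String]            -> "StringType",
    classOf[Timestamp]         -> "TimestampType"  // the entry this PR adds
  )

  def schemaFor(c: Class[_]): String =
    typeNames.getOrElse(c, "UnknownType")
}
```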

@pwendell
Contributor

pwendell commented Apr 1, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13653/

@marmbrus
Contributor

marmbrus commented Apr 1, 2014

Jenkins, test this please.

1 similar comment
@pwendell
Contributor

pwendell commented Apr 2, 2014

Jenkins, test this please.

@chenghao-intel
Contributor Author

Sorry, wait a minute; I am updating the golden files.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13665/

@chenghao-intel
Contributor Author

Can you start a new test please?

@rxin
Contributor

rxin commented Apr 2, 2014

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13677/

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

@chenghao-intel three final things and then we can merge this! It needs to be rebased, as I don't think it merges cleanly anymore (I also fixed the missing datatypes in ScalaReflection). I also added a test case for this code; please add Timestamp to that. Finally, can you roll back all the spurious changes to the machine-dependent golden files? You only need to add the ones that are new.

This should do it:

git checkout HEAD sql/hive/src/test/resources/golden
sbt hive/test
git status
<add new files>

Thanks!

@chenghao-intel
Contributor Author

Thank you @marmbrus, I've done the final things; I think it's ready to be merged.

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

Jenkins, test this please.

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

Jenkins, retest this please

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

This LGTM as soon as we can get Jenkins to agree.

@rxin
Contributor

rxin commented Apr 3, 2014

Jenkins, retest this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@rxin
Contributor

rxin commented Apr 3, 2014

merged. thanks!

@asfgit closed this in 5d1feda Apr 3, 2014
@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13737/

@chenghao-intel deleted the timestamp branch April 4, 2014 04:28
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This PR includes:
1) Add new data type Timestamp
2) Add more data type casting base on Hive's Rule
3) Fix bug missing data type in both parsers (HiveQl & SQLParser).

Author: Cheng Hao <[email protected]>

Closes apache#275 from chenghao-intel/timestamp and squashes the following commits:

df709e5 [Cheng Hao] Move orc_ends_with_nulls to blacklist
24b04b0 [Cheng Hao] Put 3 cases into the black lists(describe_pretty,describe_syntax,lateral_view_outer)
fc512c2 [Cheng Hao] remove the unnecessary data type equality check in data casting
d0d1919 [Cheng Hao] Add more data type for scala reflection
3259808 [Cheng Hao] Add the new Golden files
3823b97 [Cheng Hao] Update the UnitTest cases & add timestamp type for HiveQL
54a0489 [Cheng Hao] fix bug mapping to 0 (which is supposed to be null) when NumberFormatException occurs
9cb505c [Cheng Hao] Fix issues according to PR comments
e529168 [Cheng Hao] Fix bug of converting from String
6fc8100 [Cheng Hao] Update Unit Test & CodeStyle
8a1d4d6 [Cheng Hao] Add DataType for SqlParser
ce4385e [Cheng Hao] Add TimestampType Support