
[SPARK-1360] Add Timestamp Support for SQL #275

Closed
wants to merge 12 commits

Conversation

chenghao-intel
Contributor

This PR includes:

  1. Add the new data type Timestamp
  2. Add more data type casting based on Hive's rules
  3. Fix bugs caused by missing data types in both parsers (HiveQl & SQLParser).
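To illustrate the Hive-style casting rules in point 2, here is a hedged sketch (the object and method names are hypothetical, invented for this example, and not the PR's actual Catalyst code): numeric values are interpreted as seconds since the epoch, and unparseable strings become null.

```scala
import java.sql.Timestamp

// Hypothetical helpers sketching Hive-style timestamp casts.
// Names are made up for this example; not the PR's actual code.
object TimestampCastSketch {
  // Numeric -> Timestamp: interpret the number as (fractional) seconds since the epoch.
  def fromSeconds(seconds: Double): Timestamp =
    new Timestamp((seconds * 1000L).toLong)

  // Timestamp -> Double: seconds since the epoch, keeping the fractional part.
  def toSeconds(ts: Timestamp): Double =
    ts.getTime / 1000.0

  // String -> Timestamp: invalid input yields null rather than throwing.
  def fromString(s: String): Timestamp =
    try Timestamp.valueOf(s)
    catch { case _: IllegalArgumentException => null }
}
```

For example, `fromString("2014-04-01 12:00:00")` parses the JDBC timestamp format, while `fromString("not a timestamp")` returns null.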

@AmplabJenkins

Can one of the admins verify this patch?

@chenghao-intel
Contributor Author

Sorry, there are some bugs in the unit tests; I will look at those.

@@ -17,6 +17,9 @@

package org.apache.spark.sql.catalyst.expressions

import java.sql.Timestamp
import java.lang.{NumberFormatException => NFE}
Contributor

Can we not shorten this? I am not sure if NFE is very obvious to readers
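As context for why the exception matters here: the cast from String to numeric types catches NumberFormatException, and (per the commit log later in this thread) a failed parse should yield null rather than 0. A minimal hedged sketch, with a made-up object name rather than the PR's actual code:

```scala
// Hypothetical sketch, not the PR's actual code: a string-to-int cast
// that spells out NumberFormatException in full and maps bad input to
// null instead of 0 (matching the fix noted in the commit log).
object SafeCast {
  def stringToInt(s: String): Any =
    try s.toInt
    catch { case _: NumberFormatException => null }
}
```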

@marmbrus
Contributor

Overall, I think this looks pretty good. After you figure out the failing tests, you should also whitelist a bunch of tests that were only failing because we didn't have timestamp support.

I got the list below by running sbt/sbt -Dspark.hive.alltests hive/test (this runs all tests even if they aren't on the whitelist) and looking in sql/hive/target/HiveCompatibilitySuite.passed. You should also check out the logs in sql/hive/target/HiveCompatibilitySuite.failed and sql/hive/target/HiveCompatibilitySuite.wrong to make sure we aren't missing any edge cases regarding timestamps.

+    "input14",
+    "input21",
+    "input_testsequencefile",
+    "insert1",
+    "insert2_overwrite_partitions",
+    "join32_lessSize",
+    "join_map_ppr",
+    "join_rc",
+    "lateral_view_outer",
+    "loadpart1",
+    "mapreduce1",
+    "mapreduce2",
+    "mapreduce4",
+    "mapreduce5",
+    "mapreduce6",
+    "mapreduce8",
+    "multi_insert_gby",
+    "multi_insert_gby3",
+    "multi_insert_lateral_view",
+    "orc_dictionary_threshold",
+    "orc_empty_files",
+    "orc_ends_with_nulls",
+    "parallel",
+    "parenthesis_star_by",
+    "partcols1",
+    "partition_serde_format",
+    "partition_wise_fileformat4",
+    "partition_wise_fileformat5",
+    "partition_wise_fileformat6",
+    "partition_wise_fileformat7",
+    "partition_wise_fileformat9",
+    "ppd2",
+    "ppd_clusterby",
+    "ppd_constant_expr",
+    "ppd_transform",
+    "rcfile_columnar",
+    "rcfile_lazydecompress",
+    "rcfile_null_value",
+    "rcfile_toleratecorruptions",
+    "rcfile_union",
+    "reduce_deduplicate",
+    "reduce_deduplicate_exclude_gby",
+    "reducesink_dedup",
+    "smb_mapjoin_6",
+    "smb_mapjoin_7",
+    "stats_aggregator_error_1",
+    "stats_publisher_error_1",
+    "transform_ppr1",
+    "transform_ppr2",
+    "udaf_histogram_numeric",
+    "udf8",
+    "union3",
+    "union33",
+    "union_remove_11",

@chenghao-intel
Contributor Author

Thank you @marmbrus and @rxin. Both the code and the unit test whitelist have been updated, and the unit tests pass locally.

@chenghao-intel
Contributor Author

BTW, the whitelist has been reordered (via the Linux shell's sort command) after adding more passing cases. More cases could actually be added, such as decimal_2 / decimal_3, but the precision part of decimal still can't be matched exactly; I'll leave that for further improvement.


override def apply(input: Row): Any = {
  val evaluated = child.apply(input)
  if (evaluated == null) {
    null
  } else {
-   castingFunction(evaluated)
+   if (child.dataType == dataType) evaluated else cast(evaluated)
Contributor

There should already be a rule that eliminates casts that don't do anything, so I think this check is unnecessary.
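The kind of rule being referred to can be sketched like this (a simplified stand-in with made-up expression types, not Catalyst's actual optimizer code): a cast whose child already has the target data type is replaced by the child.

```scala
// Simplified stand-in, not Catalyst's actual rule: drop casts whose
// child already has the target data type.
sealed trait DataType
case object IntType extends DataType
case object TimestampType extends DataType

sealed trait Expr { def dataType: DataType }
case class Literal(value: Any, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr

def simplifyCast(e: Expr): Expr = e match {
  case Cast(child, dt) if child.dataType == dt => child  // no-op cast: eliminate it
  case other => other
}
```

With such a rule in the optimizer, the per-row equality check inside the cast's `apply` never fires, which is why the reviewer considers it unnecessary.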

@marmbrus
Contributor

marmbrus commented Apr 1, 2014

This is looking pretty good! A few small comments:

  • You should also add Timestamp to ScalaReflection.
  • You will need to check in any new golden files (if there are any) that were created in sql/hive/src/test/resources/.
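As a rough idea of what the ScalaReflection change involves (a hedged, simplified stand-in; Spark's real ScalaReflection works on Scala type information, not a hard-coded map like this), the point is that java.sql.Timestamp needs an entry mapping it to the SQL TimestampType:

```scala
import java.sql.Timestamp

// Hypothetical, simplified stand-in for a reflection-based schema mapping.
object SchemaSketch {
  private val typeNames: Map[Class[_], String] = Map(
    classOf[java.lang.Integer] -> "IntegerType",
    classOf[String]            -> "StringType",
    classOf[Timestamp]         -> "TimestampType"  // the entry this PR adds
  )

  def schemaFor(c: Class[_]): String =
    typeNames.getOrElse(c, "UnknownType")
}
```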

@pwendell
Contributor

pwendell commented Apr 1, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13653/

@marmbrus
Contributor

marmbrus commented Apr 1, 2014

Jenkins, test this please.

1 similar comment
@pwendell
Contributor

pwendell commented Apr 2, 2014

Jenkins, test this please.

@chenghao-intel
Contributor Author

Sorry, wait a minute; I am updating the golden files.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13665/

@chenghao-intel
Contributor Author

Can you start a new test please?

@rxin
Contributor

rxin commented Apr 2, 2014

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13677/

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

@chenghao-intel three final things and then we can merge this! It needs to be rebased, as I don't think it merges cleanly anymore (I also fixed the missing datatypes in ScalaReflection). I also added a test case for this code; please add Timestamp to that. Finally, can you roll back all the spurious changes to the machine-dependent golden files? You only need to add the ones that are new.

This should do it:

git checkout HEAD sql/hive/src/test/resources/golden
sbt hive/test
git status
<add new files>

Thanks!

@chenghao-intel
Contributor Author

Thank you @marmbrus, I've done the final things; I think it's ready to be merged.

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

Jenkins, test this please.

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

Jenkins, retest this please

@marmbrus
Contributor

marmbrus commented Apr 3, 2014

This LGTM as soon as we can get Jenkins to agree.

@rxin
Contributor

rxin commented Apr 3, 2014

Jenkins, retest this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@rxin
Contributor

rxin commented Apr 3, 2014

merged. thanks!

@asfgit closed this in 5d1feda Apr 3, 2014
@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13737/

@chenghao-intel deleted the timestamp branch April 4, 2014 04:28
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This PR includes:
1) Add new data type Timestamp
2) Add more data type casting base on Hive's Rule
3) Fix bug missing data type in both parsers (HiveQl & SQLParser).

Author: Cheng Hao <[email protected]>

Closes apache#275 from chenghao-intel/timestamp and squashes the following commits:

df709e5 [Cheng Hao] Move orc_ends_with_nulls to blacklist
24b04b0 [Cheng Hao] Put 3 cases into the black lists(describe_pretty,describe_syntax,lateral_view_outer)
fc512c2 [Cheng Hao] remove the unnecessary data type equality check in data casting
d0d1919 [Cheng Hao] Add more data type for scala reflection
3259808 [Cheng Hao] Add the new Golden files
3823b97 [Cheng Hao] Update the UnitTest cases & add timestamp type for HiveQL
54a0489 [Cheng Hao] fix bug mapping to 0 (which is supposed to be null) when NumberFormatException occurs
9cb505c [Cheng Hao] Fix issues according to PR comments
e529168 [Cheng Hao] Fix bug of converting from String
6fc8100 [Cheng Hao] Update Unit Test & CodeStyle
8a1d4d6 [Cheng Hao] Add DataType for SqlParser
ce4385e [Cheng Hao] Add TimestampType Support