
[SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect #37013

Closed
wants to merge 6 commits into master from SPARK-39339_followup

Conversation

beliefer
Contributor

@beliefer beliefer commented Jun 28, 2022

What changes were proposed in this pull request?

#36726 supports TimestampNTZ type in JDBC data source.
But the implementation is incorrect.
This PR updates a test case that fails against the current implementation.
The test case is shown below.

  test("SPARK-39339: TimestampNTZType with different local time zones") {
    val tableName = "timestamp_ntz_diff_tz_support_table"

    DateTimeTestUtils.outstandingZoneIds.foreach { zoneId =>
      DateTimeTestUtils.withDefaultTimeZone(zoneId) {
        Seq(
          "1972-07-04 03:30:00",
          "2019-01-20 12:00:00.502",
          "2019-01-20T00:00:00.123456",
          "1500-01-20T00:00:00.123456"
        ).foreach { case datetime =>
          val df = spark.sql(s"select timestamp_ntz '$datetime'")
          df.write.format("jdbc")
            .mode("overwrite")
            .option("url", urlWithUserAndPass)
            .option("dbtable", tableName)
            .save()

          DateTimeTestUtils.outstandingZoneIds.foreach { zoneId =>
            DateTimeTestUtils.withDefaultTimeZone(zoneId) {
              val res = spark.read.format("jdbc")
                .option("inferTimestampNTZType", "true")
                .option("url", urlWithUserAndPass)
                .option("dbtable", tableName)
                .load()

              checkAnswer(res, df)
            }
          }
        }
      }
    }
  }

The test failure output is shown below.

Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="Africa/Dakar",offset=0,dstSavings=0,useDaylight=false,transitions=3,lastRule=null]
Timezone Env: 

== Parsed Logical Plan ==
Relation [TIMESTAMP_NTZ '1500-01-20 00:00:00.123456'#253] JDBCRelation(timestamp_ntz_diff_tz_support_table) [numPartitions=1]

== Analyzed Logical Plan ==
TIMESTAMP_NTZ '1500-01-20 00:00:00.123456': timestamp_ntz
Relation [TIMESTAMP_NTZ '1500-01-20 00:00:00.123456'#253] JDBCRelation(timestamp_ntz_diff_tz_support_table) [numPartitions=1]

== Optimized Logical Plan ==
Relation [TIMESTAMP_NTZ '1500-01-20 00:00:00.123456'#253] JDBCRelation(timestamp_ntz_diff_tz_support_table) [numPartitions=1]

== Physical Plan ==
*(1) Scan JDBCRelation(timestamp_ntz_diff_tz_support_table) [numPartitions=1] [TIMESTAMP_NTZ '1500-01-20 00:00:00.123456'#253] PushedFilters: [], ReadSchema: struct<TIMESTAMP_NTZ '1500-01-20 00:00:00.123456':timestamp_ntz>

== Results ==

== Results ==
!== Correct Answer - 1 ==                                           == Spark Answer - 1 ==
 struct<TIMESTAMP_NTZ '1500-01-20 00:00:00.123456':timestamp_ntz>   struct<TIMESTAMP_NTZ '1500-01-20 00:00:00.123456':timestamp_ntz>
![1500-01-20T00:00:00.123456]                                       [1500-01-20T00:16:08.123456]
ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at (QueryTest.scala:243)
org.scalatest.exceptions.TestFailedException: 

Why are the changes needed?

Fix an implementation bug.
The bug comes from using `toJavaTimestamp` and `fromJavaTimestamp`:
both adjust the timestamp according to the JVM system time zone.
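The dependence on the JVM default time zone can be sketched outside Spark (illustrative only; `wallClock` is a hypothetical helper, not Spark code):

```scala
import java.sql.Timestamp
import java.util.TimeZone

// Hypothetical helper: render the same epoch instant as a wall-clock string
// via java.sql.Timestamp under a given JVM default time zone.
def wallClock(zone: String, epochMillis: Long): String = {
  TimeZone.setDefault(TimeZone.getTimeZone(zone))
  new Timestamp(epochMillis).toLocalDateTime.toString
}

// The same instant yields different wall-clock values under different
// default zones, which is exactly what a zone-less (NTZ) type must not do.
println(wallClock("UTC", 0L))           // 1970-01-01T00:00
println(wallClock("Asia/Shanghai", 0L)) // 1970-01-01T08:00
```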

Does this PR introduce any user-facing change?

No. TimestampNTZ support in JDBC is a new feature, so there is no user-facing change.

How was this patch tested?

New test case.

@github-actions github-actions bot added the SQL label Jun 28, 2022
@beliefer beliefer changed the title [SPARK-39339][SQL][FOLLOWUP] TimestampNTZ type in JDBC data source is incorrect [WIP][SPARK-39339][SQL][FOLLOWUP] TimestampNTZ type in JDBC data source is incorrect Jun 28, 2022
@beliefer beliefer changed the title [WIP][SPARK-39339][SQL][FOLLOWUP] TimestampNTZ type in JDBC data source is incorrect [SPARK-39339][SQL][FOLLOWUP] TimestampNTZ type in JDBC data source is incorrect Jun 28, 2022
@beliefer
Contributor Author

beliefer commented Jun 28, 2022

ping @gengliangwang cc @cloud-fan @sadikovi

@beliefer beliefer changed the title [SPARK-39339][SQL][FOLLOWUP] TimestampNTZ type in JDBC data source is incorrect [SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect Jun 28, 2022
@@ -599,10 +610,13 @@ object JdbcUtils extends Logging with SQLConfHelper {
stmt.setTimestamp(pos + 1, toJavaTimestamp(instantToMicros(row.getAs[Instant](pos))))
} else {
Contributor

since we are touching the code here, let's make it right. I think for ntz type, its value is always java.time.LocalDateTime, no matter datetimeJava8ApiEnabled is true or not, right? @MaxGekk @gengliangwang

Member

Yes, TimestampNTZ is independent of the configuration.

Contributor Author

Updated.

Contributor

@sadikovi sadikovi left a comment

Thanks for submitting the PR. Can you elaborate on why these changes are necessary and provide a clear explanation of the problem and solution in the PR description and/or comments? This will help us to review this PR.

The current explanation is basically non-existent and just states that it is a bug; however, previously all of the tests passed in CI and on my local machine. It is likely that the existing tests did not cover this corner case so it would be good to outline what that case was.

@@ -1939,17 +1939,22 @@ class JDBCSuite extends QueryTest
val df = spark.sql(s"select timestamp_ntz '$datetime'")
df.write.format("jdbc")
  .mode("overwrite")
  .option("inferTimestampNTZType", "true")
Contributor

inferTimestampNTZType is only applied for reads. Could you remove this option here?

val seconds = Math.floorDiv(micros, DateTimeConstants.MICROS_PER_SECOND)
val nanos = (micros - seconds * DateTimeConstants.MICROS_PER_SECOND) *
  DateTimeConstants.NANOS_PER_MICROS
val result = new java.sql.Timestamp(seconds * DateTimeConstants.MILLIS_PER_SECOND)
Contributor

Isn't it what toJavaTimestamp does already?

Contributor Author

@beliefer beliefer Jun 29, 2022

Because `toJavaTimestamp` depends on the JVM system time zone, we must keep the value free of any time zone adjustment.

Contributor

Are you saying that toJavaTimestamp rebases the original value, i.e. adds an offset, so java.sql.Timestamp ends up with that offset causing the date to be different?

Member

+1 with @sadikovi

Contributor Author

The offset is based on the JVM system time zone.

Member

@beliefer it's basically the same except for rebasing the old timestamps

  def toJavaTimestamp(micros: Long): Timestamp = {
    val rebasedMicros = rebaseGregorianToJulianMicros(micros)
    val seconds = Math.floorDiv(rebasedMicros, MICROS_PER_SECOND)
    val ts = new Timestamp(seconds * MILLIS_PER_SECOND)
    val nanos = (rebasedMicros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
    ts.setNanos(nanos.toInt)
    ts
  }

Contributor Author

Yes

@sadikovi
Contributor

Please update the PR description with the clear explanation of the bug and how the solution fixes the problem.

@gengliangwang
Member

gengliangwang commented Jun 30, 2022

This PR just modify a test case and it will be failed !
The test case output failure show below.

@beliefer could you provide the test case itself in the PR description?
It seems that only "1500-01-20T00:00:00.123456" failed? Why is that?

@sadikovi
Contributor

I think the problem the unit test is trying to address is what happens when writes are in one time zone but reads are in another. This explains why there are 2 loops: one is for writes and the other one is for reads.

I am happy with the fix, thanks for working on it but I am also curious why it fails on 1500-01-20T00:00:00.123456. Is it because of the calendar that Spark uses?

@cloud-fan
Contributor

I am also curious why it fails on 1500-01-20T00:00:00.123456. Is it because of the calendar that Spark uses?

I believe so, toJavaTimestamp takes care of the legacy calendar, which applies to old dates like the above one.
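The calendar difference can be seen with plain JDK classes (illustrative sketch, not Spark code): `java.sql.Timestamp` uses the JDK's hybrid Julian/Gregorian calendar, while `java.time` uses the proleptic Gregorian calendar, and for January 1500 the two disagree by 9 days.

```scala
import java.sql.Timestamp
import java.time.{LocalDateTime, ZoneOffset}
import java.util.TimeZone

TimeZone.setDefault(TimeZone.getTimeZone("UTC"))

// Timestamp.valueOf parses the string with the hybrid Julian/Gregorian
// calendar; java.time uses the proleptic Gregorian calendar.
val hybridMillis = Timestamp.valueOf("1500-01-20 00:00:00").getTime
val prolepticMillis =
  LocalDateTime.of(1500, 1, 20, 0, 0).toInstant(ZoneOffset.UTC).toEpochMilli

// For January 1500 the two calendars disagree by 9 days, so naive
// conversions through java.sql.Timestamp shift such old dates.
val diffDays = (hybridMillis - prolepticMillis) / 86400000L
println(diffDays) // 9
```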

(rs: ResultSet, row: InternalRow, pos: Int) =>
  val t = rs.getTimestamp(pos + 1)
  if (t != null) {
    val micros = DateTimeUtils.millisToMicros(t.getTime) +
Contributor

to make it more readable, can we add DateTimeUtils.fromJavaTimestampNoRebase?

Contributor Author

OK

(stmt: PreparedStatement, row: Row, pos: Int) =>
  val micros = localDateTimeToMicros(row.getAs[java.time.LocalDateTime](pos))
  val seconds = Math.floorDiv(micros, DateTimeConstants.MICROS_PER_SECOND)
  val nanos = (micros - seconds * DateTimeConstants.MICROS_PER_SECOND) *
Contributor

@cloud-fan cloud-fan Jun 30, 2022

ditto, add toJavaTimestampNoRebase

Contributor Author

OK

@beliefer beliefer force-pushed the SPARK-39339_followup branch from b80959b to ebe69c5 Compare June 30, 2022 06:18
* @param micros The number of microseconds since 1970-01-01T00:00:00.000000Z.
* @return A `java.sql.Timestamp` from number of micros since epoch.
*/
def toJavaTimestampNoRebase(micros: Long): Timestamp = {
Contributor

is it possible to share code with toJavaTimestamp? e.g.

def toJavaTimestamp ... {
  toJavaTimestampNoRebase(rebaseGregorianToJulianMicros(micros))
}

Contributor Author

Got it.
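A self-contained sketch of the resulting helper pair (constants inlined; the names come from the review, but the bodies are reconstructed here and are not necessarily Spark's exact code):

```scala
import java.sql.Timestamp

val MICROS_PER_SECOND = 1000000L
val MILLIS_PER_SECOND = 1000L
val NANOS_PER_MICROS = 1000L

// Epoch micros -> java.sql.Timestamp with no calendar rebase, so the
// wall-clock value of a TimestampNTZ survives unchanged.
def toJavaTimestampNoRebase(micros: Long): Timestamp = {
  val seconds = Math.floorDiv(micros, MICROS_PER_SECOND)
  val ts = new Timestamp(seconds * MILLIS_PER_SECOND)
  val nanos = (micros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
  ts.setNanos(nanos.toInt)
  ts
}

// The inverse: java.sql.Timestamp -> epoch micros, again with no rebase.
def fromJavaTimestampNoRebase(t: Timestamp): Long =
  Math.floorDiv(t.getTime, MILLIS_PER_SECOND) * MICROS_PER_SECOND +
    t.getNanos / NANOS_PER_MICROS

// Round-trips exactly, including pre-epoch (negative) micros.
println(fromJavaTimestampNoRebase(toJavaTimestampNoRebase(123456789L))) // 123456789
```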

@gengliangwang
Member

Thanks, merging to master

@beliefer
Contributor Author

beliefer commented Jul 1, 2022

@gengliangwang @cloud-fan @sadikovi Thank you !

cloud-fan pushed a commit that referenced this pull request May 5, 2023
### What changes were proposed in this pull request?

#36726 supports TimestampNTZ type in JDBC data source and #37013 applies a fix to pass more test cases with H2.

The problem is that Java Timestamp is a poorly defined class, and different JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected behaviors in mind. The general conversion implementation works with some JDBC dialects and their drivers but not others. This issue was discovered when testing against a PostgreSQL database.

This PR adds a `dialect` parameter to `makeGetter` for applying dialect specific conversions when reading a Java Timestamp into TimestampNTZType. `makeSetter` already has a `dialect` field and we will use that for converting back to Java Timestamp.

### Why are the changes needed?

Fix TimestampNTZ support for PostgreSQL, and allow other JDBC dialects to provide dialect-specific implementations for converting between Java Timestamp and Spark TimestampNTZType.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.
I added new test cases to `PostgresIntegrationSuite` to cover TimestampNTZ reads and writes.

Closes #40678 from tianhanhu/SPARK-43040_jdbc_timestamp_ntz.

Authored-by: tianhanhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
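A hypothetical sketch of the dialect-hook shape described above (the trait and method names here are illustrative; Spark's actual API may differ):

```scala
import java.sql.Timestamp
import java.time.LocalDateTime

// Illustrative only: each JDBC dialect can override how the driver's
// java.sql.Timestamp maps to a zone-less LocalDateTime and back.
trait DialectSketch {
  // Default: take the driver's wall-clock reading as-is.
  def javaTimestampToNTZ(t: Timestamp): LocalDateTime = t.toLocalDateTime
  def ntzToJavaTimestamp(ldt: LocalDateTime): Timestamp = Timestamp.valueOf(ldt)
}

object DefaultDialect extends DialectSketch

// The default round-trips a wall-clock value unchanged; a dialect like
// PostgreSQL could override either method to match its driver's behavior.
val ldt = LocalDateTime.of(2019, 1, 20, 12, 0, 0, 502000000)
println(DefaultDialect.javaTimestampToNTZ(DefaultDialect.ntzToJavaTimestamp(ldt)) == ldt) // true
```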
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023