-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-39339][SQL][FOLLOWUP] Fix bug TimestampNTZ type in JDBC data source is incorrect #37013
Conversation
ping @gengliangwang cc @cloud-fan @sadikovi |
@@ -599,10 +610,13 @@ object JdbcUtils extends Logging with SQLConfHelper { | |||
stmt.setTimestamp(pos + 1, toJavaTimestamp(instantToMicros(row.getAs[Instant](pos)))) | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are touching the code here, let's make it right. I think for ntz type, its value is always java.time.LocalDateTime
, no matter datetimeJava8ApiEnabled
is true or not, right? @MaxGekk @gengliangwang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, TimestampNTZ is independent of the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting the PR. Can you elaborate on why these changes are necessary and provide a clear explanation of the problem and solution in the PR description and/or comments? This will help us to review this PR.
The current explanation is basically non-existent and just states that it is a bug; however, previously all of the tests passed in CI and on my local machine. It is likely that the existing tests did not cover this corner case so it would be good to outline what that case was.
@@ -1939,17 +1939,22 @@ class JDBCSuite extends QueryTest | |||
val df = spark.sql(s"select timestamp_ntz '$datetime'") | |||
df.write.format("jdbc") | |||
.mode("overwrite") | |||
.option("inferTimestampNTZType", "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inferTimestampNTZType
is only applied for reads. Could you remove this option here?
val seconds = Math.floorDiv(micros, DateTimeConstants.MICROS_PER_SECOND) | ||
val nanos = (micros - seconds * DateTimeConstants.MICROS_PER_SECOND) * | ||
DateTimeConstants.NANOS_PER_MICROS | ||
val result = new java.sql.Timestamp(seconds * DateTimeConstants.MILLIS_PER_SECOND) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it what toJavaTimestamp
does already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because toJavaTimestamp
based on the JVM system time zone, we must keep it without time zone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that toJavaTimestamp
rebases the original value, i.e. adds an offset, so java.sql.Timestamp
ends up with that offset causing the date to be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 with @sadikovi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The offset based on JVM system time zone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beliefer it's basically the same except for rebasing the old timestmaps
def toJavaTimestamp(micros: Long): Timestamp = {
val rebasedMicros = rebaseGregorianToJulianMicros(micros)
val seconds = Math.floorDiv(rebasedMicros, MICROS_PER_SECOND)
val ts = new Timestamp(seconds * MILLIS_PER_SECOND)
val nanos = (rebasedMicros - seconds * MICROS_PER_SECOND) * NANOS_PER_MICROS
ts.setNanos(nanos.toInt)
ts
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Please update the PR description with the clear explanation of the bug and how the solution fixes the problem. |
@beliefer could you provide the test case itself in the PR description? |
I think the problem the unit test is trying to address is what happens when writes are in one time zone but reads are in another. This explains why there are 2 loops: one is for writes and the other one is for reads. I am happy with the fix, thanks for working on it but I am also curious why it fails on |
I believe so, |
(rs: ResultSet, row: InternalRow, pos: Int) => | ||
val t = rs.getTimestamp(pos + 1) | ||
if (t != null) { | ||
val micros = DateTimeUtils.millisToMicros(t.getTime) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it more readable, can we add DateTimeUtils.fromJavaTimestampNoRebase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
(stmt: PreparedStatement, row: Row, pos: Int) => | ||
val micros = localDateTimeToMicros(row.getAs[java.time.LocalDateTime](pos)) | ||
val seconds = Math.floorDiv(micros, DateTimeConstants.MICROS_PER_SECOND) | ||
val nanos = (micros - seconds * DateTimeConstants.MICROS_PER_SECOND) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, add toJavaTimestampNoRebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
b80959b
to
ebe69c5
Compare
* @param micros The number of microseconds since 1970-01-01T00:00:00.000000Z. | ||
* @return A `java.sql.Timestamp` from number of micros since epoch. | ||
*/ | ||
def toJavaTimestampNoRebase(micros: Long): Timestamp = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to share code with toJavaTimestamp
? e.g.
def toJavaTimestamp ... {
toJavaTimestampNoRebase(rebaseGregorianToJulianMicros(micros))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
Thanks, merging to master |
@gengliangwang @cloud-fan @sadikovi Thank you ! |
### What changes were proposed in this pull request? #36726 supports TimestampNTZ type in JDBC data source and #37013 applies a fix to pass more test cases with H2. The problem is that Java Timestamp is a poorly defined class and different JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected behaviors in mind. The general conversion implementation would work with some JDBC dialects and their drivers but not others. This issue is discovered when testing with PostgreSQL database. This PR adds a `dialect` parameter to `makeGetter` for applying dialect specific conversions when reading a Java Timestamp into TimestampNTZType. `makeSetter` already has a `dialect` field and we will use that for converting back to Java Timestamp. ### Why are the changes needed? Fix TimestampNTZ support for PostgreSQL. Allows other JDBC dialects to provide dialect specific implementation for converting between Java Timestamp and Spark TimestampNTZType. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit test. I added new test cases for `PostgresIntegrationSuite` to cover TimestampNTZ read and writes. Closes #40678 from tianhanhu/SPARK-43040_jdbc_timestamp_ntz. Authored-by: tianhanhu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? apache#36726 supports TimestampNTZ type in JDBC data source and apache#37013 applies a fix to pass more test cases with H2. The problem is that Java Timestamp is a poorly defined class and different JDBC drivers implement "getTimestamp" and "setTimestamp" with different expected behaviors in mind. The general conversion implementation would work with some JDBC dialects and their drivers but not others. This issue is discovered when testing with PostgreSQL database. This PR adds a `dialect` parameter to `makeGetter` for applying dialect specific conversions when reading a Java Timestamp into TimestampNTZType. `makeSetter` already has a `dialect` field and we will use that for converting back to Java Timestamp. ### Why are the changes needed? Fix TimestampNTZ support for PostgreSQL. Allows other JDBC dialects to provide dialect specific implementation for converting between Java Timestamp and Spark TimestampNTZType. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit test. I added new test cases for `PostgresIntegrationSuite` to cover TimestampNTZ read and writes. Closes apache#40678 from tianhanhu/SPARK-43040_jdbc_timestamp_ntz. Authored-by: tianhanhu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
#36726 supports TimestampNTZ type in JDBC data source.
But the implement is incorrect.
This PR just modify a test case and it will be failed !
The test case show below.
The test case output failure show below.
Why are the changes needed?
Fix an implement bug.
The reason of the bug is use
toJavaTimestamp
andfromJavaTimestamp
.toJavaTimestamp
andfromJavaTimestamp
lead to the timestamp with JVM system time zone.Does this PR introduce any user-facing change?
'No'.
New feature.
How was this patch tested?
New test case.