Skip to content

Commit

Permalink
[SPARK-20482][SQL] Resolving Casts is too strict on having time zone set
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?

Relax the requirement that a `TimeZoneAwareExpression` has to have its `timeZoneId` set to be considered resolved.
With this change, a `Cast` (which is a `TimeZoneAwareExpression`) can be considered resolved if the `(fromType, toType)` combination doesn't require time zone information.

Also de-relaxed test cases in `CastSuite` so Casts in that test suite don't get a default`timeZoneId = Option("GMT")`.

## How was this patch tested?

Ran the de-relaxed`CastSuite` and it's passing. Also ran the SQL unit tests and they're passing too.

Author: Kris Mok <[email protected]>

Closes #17777 from rednaxelafx/fix-catalyst-cast-timezone.
  • Loading branch information
rednaxelafx authored and gatorsmile committed Apr 27, 2017
1 parent 85c6ce6 commit 26ac2ce
Show file tree
Hide file tree
Showing 2 changed files with 34 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,31 @@ object Cast {
case _ => false
}

/**
* Return true if we need to use the `timeZone` information casting `from` type to `to` type.
* The patterns matched reflect the current implementation in the Cast node.
* c.f. usage of `timeZone` in:
* * Cast.castToString
* * Cast.castToDate
* * Cast.castToTimestamp
*/
def needsTimeZone(from: DataType, to: DataType): Boolean = (from, to) match {
case (StringType, TimestampType) => true
case (DateType, TimestampType) => true
case (TimestampType, StringType) => true
case (TimestampType, DateType) => true
case (ArrayType(fromType, _), ArrayType(toType, _)) => needsTimeZone(fromType, toType)
case (MapType(fromKey, fromValue, _), MapType(toKey, toValue, _)) =>
needsTimeZone(fromKey, toKey) || needsTimeZone(fromValue, toValue)
case (StructType(fromFields), StructType(toFields)) =>
fromFields.length == toFields.length &&
fromFields.zip(toFields).exists {
case (fromField, toField) =>
needsTimeZone(fromField.dataType, toField.dataType)
}
case _ => false
}

/**
* Return true iff we may truncate during casting `from` type to `to` type. e.g. long -> int,
* timestamp -> date.
Expand Down Expand Up @@ -165,6 +190,13 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
copy(timeZoneId = Option(timeZoneId))

// When this cast involves TimeZone, it's only resolved if the timeZoneId is set;
// Otherwise behave like Expression.resolved.
override lazy val resolved: Boolean =
childrenResolved && checkInputDataTypes().isSuccess && (!needsTimeZone || timeZoneId.isDefined)

private[this] def needsTimeZone: Boolean = Cast.needsTimeZone(child.dataType, dataType)

// [[func]] assumes the input is no longer null because eval already does the null check.
@inline private[this] def buildCast[T](a: Any, func: T => Any): Any = func(a.asInstanceOf[T])

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ import org.apache.spark.unsafe.types.UTF8String
*/
class CastSuite extends SparkFunSuite with ExpressionEvalHelper {

private def cast(v: Any, targetType: DataType, timeZoneId: Option[String] = Some("GMT")): Cast = {
private def cast(v: Any, targetType: DataType, timeZoneId: Option[String] = None): Cast = {
v match {
case lit: Expression => Cast(lit, targetType, timeZoneId)
case _ => Cast(Literal(v), targetType, timeZoneId)
Expand All @@ -47,7 +47,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
}

private def checkNullCast(from: DataType, to: DataType): Unit = {
checkEvaluation(cast(Literal.create(null, from), to), null)
checkEvaluation(cast(Literal.create(null, from), to, Option("GMT")), null)
}

test("null cast") {
Expand Down

0 comments on commit 26ac2ce

Please sign in to comment.