Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Cloudera 3.3.2 shim for handling CheckOverflowInTableInsert and orc zstd support #9463

Merged
merged 4 commits into from
Oct 19, 2023

Conversation

tgravescs
Copy link
Collaborator

fixes #9457

We had some unit tests failing on 3.3.2 CDH. It looks like there are 2 issues:

  1. Update test to skip the orc zstd because Cloudera doesn't support
  2. The shim was based on 3.3.0 CDH instead of 3.3.1. This is 3.3.2 CDH so it should have been based on 3.3.1. Spark 3.3.1 adds handling for CheckOverflowInTableInsert, so the tests for this were failing as they should have.

@tgravescs tgravescs added the bug Something isn't working label Oct 17, 2023
@tgravescs tgravescs self-assigned this Oct 17, 2023
@tgravescs
Copy link
Collaborator Author

build

Comment on lines 29 to 49
override def getDataWriteCmds: Map[Class[_ <: DataWritingCommand],
DataWritingCommandRule[_ <: DataWritingCommand]] = {
Seq(GpuOverrides.dataWriteCmd[CreateDataSourceTableAsSelectCommand](
"Create table with select command",
(a, conf, p, r) => new CreateDataSourceTableAsSelectCommandMeta(a, conf, p, r))
).map(r => (r.getClassFor.asSubclass(classOf[DataWritingCommand]), r)).toMap
}

override def getRunnableCmds: Map[Class[_ <: RunnableCommand],
RunnableCommandRule[_ <: RunnableCommand]] = {
Map.empty
}

override def getExecs: Map[Class[_ <: SparkPlan], ExecRule[_ <: SparkPlan]] = {

val newRule = org.apache.spark.sql.hive.cdh.HiveOverrides.getHiveTableExecRule()
// scala compiler gets confused if we don't give an explicit type to the class here.
// ... perhaps there is a cleaner way to do this
val ruleClass: Class[_ <: SparkPlan] = newRule.getClassFor.asSubclass(classOf[SparkPlan])
super.getExecs ++ Seq(ruleClass -> newRule).toMap
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three methods in the body of SparkShimImpl for 330cdh and 332cdh are identical.

We could have another trait

/*** spark-rapids-shim-json-lines
{"spark": "330cdh"}
{"spark": "332cdh"}
spark-rapids-shim-json-lines ***/
package com.nvidia.spark.rapids.shims

trait Spark33cdhShims extends AnsiCastRuleShims {
  override def getDataWriteCmds: Map[Class[_ <: DataWritingCommand],
      DataWritingCommandRule[_ <: DataWritingCommand]] = {
    Seq(GpuOverrides.dataWriteCmd[CreateDataSourceTableAsSelectCommand](
    "Create table with select command",
    (a, conf, p, r) => new CreateDataSourceTableAsSelectCommandMeta(a, conf, p, r))
    ).map(r => (r.getClassFor.asSubclass(classOf[DataWritingCommand]), r)).toMap
  }

  override def getRunnableCmds: Map[Class[_ <: RunnableCommand],
      RunnableCommandRule[_ <: RunnableCommand]] = {
    Map.empty
  }

  override def getExecs: Map[Class[_ <: SparkPlan], ExecRule[_ <: SparkPlan]] = {

    val newRule = org.apache.spark.sql.hive.cdh.HiveOverrides.getHiveTableExecRule()
    // scala compiler gets confused if we don't give an explicit type to the class here.
    // ... perhaps there is a cleaner way to do this
    val ruleClass: Class[_ <: SparkPlan] = newRule.getClassFor.asSubclass(classOf[SparkPlan])
    super.getExecs ++ Seq(ruleClass -> newRule).toMap
  }
}

for 330cdh object SparkShimImpl extends Spark330PlusNonDBShims with Spark33cdhShims

for 332cdh object SparkShimImpl extends Spark331PlusShims with Spark33cdhShims

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated but slightly differently to deal with calling super.getExecs from trait

@tgravescs
Copy link
Collaborator Author

build

Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pxLi pxLi merged commit 51607e9 into NVIDIA:branch-23.12 Oct 19, 2023
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] CDH 332 unit tests failing
3 participants