[BUG] format_number Spark UT failed because Type conversion is not allowed #10899

Closed
thirtiseven opened this issue May 27, 2024 · 0 comments · Fixed by #10900
Assignees
Labels
bug Something isn't working

Comments

@thirtiseven (Collaborator)

Describe the bug
The format_number / FormatNumber Spark unit test fails with a "Type conversion is not allowed" assertion error.
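
For context, the assertion is raised by GpuColumnVector.from, which checks that the cudf column's DType matches the Spark DataType of the output. Below is a minimal sketch of the failure mode (hypothetical standalone code, not the plugin's actual doColumnar path): handing an INT8 (byte) cudf column to a StringType slot trips the same check.

```scala
// Hypothetical sketch of the failure mode, not the plugin's actual code:
// GpuColumnVector.from(col, sparkType) asserts that the cudf DType matches
// the Spark DataType, so returning an intermediate INT8 column where a
// STRING column is expected raises the AssertionError seen in the trace.
import ai.rapids.cudf.ColumnVector
import org.apache.spark.sql.types.StringType
import com.nvidia.spark.rapids.GpuColumnVector

val bytes: ColumnVector = ColumnVector.fromBytes(1.toByte, 2.toByte) // DType INT8
try {
  // Throws: java.lang.AssertionError:
  //   Type conversion is not allowed from INT8 to StringType expected STRING
  GpuColumnVector.from(bytes, StringType)
} finally {
  bytes.close() // cudf columns hold off-heap memory and must be closed
}
```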

Steps/Code to reproduce bug

Run:

```
mvn test -Dbuildver=330 -DwildcardSuites=org.apache.spark.sql.rapids.suites.RapidsStringExpressionsSuite
```

The test fails with:

```
- format_number / FormatNumber *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 334.0 failed 1 times, most recent failure: Lost task 1.0 in stage 334.0 (TID 669) (spark-haoyang executor driver): java.lang.AssertionError: Type conversion is not allowed from INT8 to StringType expected STRING
	at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:711)
	at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:791)
	at org.apache.spark.sql.rapids.GpuFormatNumber.doColumnar(stringFunctions.scala:2387)
	at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$3(GpuExpressions.scala:320)
	at com.nvidia.spark.rapids.Arm$.withResourceIfAllowed(Arm.scala:84)
	at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$2(GpuExpressions.scala:311)
	at com.nvidia.spark.rapids.Arm$.withResourceIfAllowed(Arm.scala:84)
	at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval(GpuExpressions.scala:310)
	at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval$(GpuExpressions.scala:309)
	at org.apache.spark.sql.rapids.GpuFormatNumber.columnarEval(stringFunctions.scala:2140)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:35)
	at com.nvidia.spark.rapids.GpuAlias.columnarEval(namedExpressions.scala:110)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:35)
	at com.nvidia.spark.rapids.GpuProjectExec$.$anonfun$project$1(basicPhysicalOperators.scala:110)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1(implicits.scala:221)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1$adapted(implicits.scala:218)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.safeMap(implicits.scala:218)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$AutoCloseableProducingSeq.safeMap(implicits.scala:253)
	at com.nvidia.spark.rapids.GpuProjectExec$.project(basicPhysicalOperators.scala:110)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$project$2(basicPhysicalOperators.scala:619)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
	at com.nvidia.spark.rapids.GpuTieredProject.recurse$2(basicPhysicalOperators.scala:618)
	at com.nvidia.spark.rapids.GpuTieredProject.project(basicPhysicalOperators.scala:631)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$5(basicPhysicalOperators.scala:567)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRestoreOnRetry(RmmRapidsRetryIterator.scala:272)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$4(basicPhysicalOperators.scala:567)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$3(basicPhysicalOperators.scala:565)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$NoInputSpliterator.next(RmmRapidsRetryIterator.scala:395)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:613)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:517)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.drainSingleWithVerification(RmmRapidsRetryIterator.scala:291)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRetryNoSplit(RmmRapidsRetryIterator.scala:185)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$1(basicPhysicalOperators.scala:565)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:39)
	at com.nvidia.spark.rapids.GpuTieredProject.projectWithRetrySingleBatchInternal(basicPhysicalOperators.scala:562)
	at com.nvidia.spark.rapids.GpuTieredProject.projectAndCloseWithRetrySingleBatch(basicPhysicalOperators.scala:601)
	at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$internalDoExecuteColumnar$2(basicPhysicalOperators.scala:384)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
	at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$internalDoExecuteColumnar$1(basicPhysicalOperators.scala:380)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$3(GpuColumnarToRowExec.scala:290)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:287)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:257)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:304)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
  at scala.Option.foreach(Option.scala:407)
  ...
  Cause: java.lang.AssertionError: Type conversion is not allowed from INT8 to StringType expected STRING
  at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:711)
  at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:791)
  at org.apache.spark.sql.rapids.GpuFormatNumber.doColumnar(stringFunctions.scala:2387)
  at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$3(GpuExpressions.scala:320)
  at com.nvidia.spark.rapids.Arm$.withResourceIfAllowed(Arm.scala:84)
  at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$2(GpuExpressions.scala:311)
  at com.nvidia.spark.rapids.Arm$.withResourceIfAllowed(Arm.scala:84)
  at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval(GpuExpressions.scala:310)
  at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval$(GpuExpressions.scala:309)
  at org.apache.spark.sql.rapids.GpuFormatNumber.columnarEval(stringFunctions.scala:2140)
  ...
24/05/27 10:15:27.928 task-result-getter-1 ERROR TaskSetManager: Task 1 in stage 334.0 failed 1 times; aborting job
```
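
For a quicker check outside the Maven harness, a plain format_number query through the plugin should exercise the same GpuFormatNumber path. An untested sketch follows; whether this exact input reaches the INT8 intermediate is an assumption, since the failing UT covers many format_number cases.

```scala
// Untested sketch: standalone repro candidate, assuming a SparkSession
// with the RAPIDS plugin enabled (spark.plugins=com.nvidia.spark.SQLPlugin).
val df = spark.sql(
  "SELECT format_number(x, 4) FROM VALUES (12332.123456D) AS t(x)")
df.show() // CPU Spark prints 12,332.1235; the failing UT hit the
          // AssertionError above on the equivalent GPU code path
```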

Expected behavior
The test should pass.

Environment details (please complete the following information)
Local Spark 3.3.0 (buildver=330).

@thirtiseven thirtiseven added bug Something isn't working ? - Needs Triage Need team to review and classify labels May 27, 2024
@thirtiseven thirtiseven self-assigned this May 27, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label May 28, 2024