Describe the bug
A customer job failed during shuffle with the 23.08 pre-release jar. It looks like we don't support shuffling a list of structs, and we aren't falling back to the CPU properly.
Job aborted due to stage failure: Task 9 in stage 515.0 failed 4 times, most recent failure: Lost task 9.3 in stage 515.0 (TID 2804) (10.13.13.147 executor 4): ai.rapids.cudf.CudfException: CUDF failure at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-170-cuda11/thirdparty/cudf/cpp/src/hash/spark_murmurhash3_x86_32.cu:381: Cannot compute hash of a table with a LIST of STRUCT columns.
at ai.rapids.cudf.ColumnVector.hash(Native Method)
at ai.rapids.cudf.ColumnVector.spark32BitMurmurHash3(ColumnVector.java:795)
at org.apache.spark.sql.rapids.GpuMurmur3Hash$.$anonfun$compute$3(HashFunctions.scala:83)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:56)
at org.apache.spark.sql.rapids.GpuMurmur3Hash$.$anonfun$compute$1(HashFunctions.scala:82)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at org.apache.spark.sql.rapids.GpuMurmur3Hash$.compute(HashFunctions.scala:77)
at org.apache.spark.sql.rapids.GpuMurmur3Hash.columnarEval(HashFunctions.scala:95)
at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
at com.nvidia.spark.rapids.GpuAlias.columnarEval(namedExpressions.scala:110)
at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
at com.nvidia.spark.rapids.GpuProjectExec$.$anonfun$project$1(basicPhysicalOperators.scala:108)
at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1(implicits.scala:220)
at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1$adapted(implicits.scala:217)
at scala.collection.immutable.List.foreach(List.scala:431)
at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.safeMap(implicits.scala:217)
at com.nvidia.spark.rapids.RapidsPluginImplicits$AutoCloseableProducingSeq.safeMap(implicits.scala:252)
at com.nvidia.spark.rapids.GpuProjectExec$.project(basicPhysicalOperators.scala:108)
at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$project$2(basicPhysicalOperators.scala:595)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at com.nvidia.spark.rapids.GpuTieredProject.recurse$2(basicPhysicalOperators.scala:594)
at com.nvidia.spark.rapids.GpuTieredProject.project(basicPhysicalOperators.scala:607)
at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$2(basicPhysicalOperators.scala:538)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$1(basicPhysicalOperators.scala:537)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.next(RmmRapidsRetryIterator.scala:431)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:542)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:468)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.drainSingleWithVerification(RmmRapidsRetryIterator.scala:275)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRetryNoSplit(RmmRapidsRetryIterator.scala:128)
at com.nvidia.spark.rapids.GpuTieredProject.projectWithRetrySingleBatchInternal(basicPhysicalOperators.scala:536)
at com.nvidia.spark.rapids.GpuTieredProject.projectAndCloseWithRetrySingleBatch(basicPhysicalOperators.scala:577)
at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$internalDoExecuteColumnar$2(basicPhysicalOperators.scala:377)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$internalDoExecuteColumnar$1(basicPhysicalOperators.scala:373)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:318)
at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:340)
at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2(RapidsShuffleInternalManagerBase.scala:281)
at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2$adapted(RapidsShuffleInternalManagerBase.scala:274)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1(RapidsShuffleInternalManagerBase.scala:274)
at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1$adapted(RapidsShuffleInternalManagerBase.scala:273)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.write(RapidsShuffleInternalManagerBase.scala:273)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
This turns out to be a bug where the unsupported-type check was implemented for hash partitioning but not for murmur3 directly. In this case murmur3 is invoked directly as part of a projection, so the partitioning-level check is bypassed.
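The shape of the fix can be sketched as follows. This is an illustrative stand-alone example, not the actual plugin code: the type names mirror Spark SQL's but are minimal stand-ins, and `Murmur3HashCheck` is a hypothetical helper. The idea is that the same "no LIST of STRUCT" restriction that cuDF enforces (see the `spark_murmurhash3_x86_32.cu` error above) would be checked when the hash expression itself is tagged for GPU execution, not only when hash partitioning is planned.

```scala
// Minimal stand-ins for Spark SQL data types, for illustration only.
sealed trait DataType
case object IntegerType extends DataType
final case class StructType(fields: Seq[DataType]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// Hypothetical tag-time check for a GPU murmur3 hash expression.
object Murmur3HashCheck {
  // cuDF cannot hash a LIST of STRUCT column, so reject any input type
  // that contains an array whose element type is (or nests) a struct.
  def hasListOfStruct(dt: DataType): Boolean = dt match {
    case ArrayType(_: StructType) => true
    case ArrayType(elem)          => hasListOfStruct(elem)
    case StructType(fields)       => fields.exists(hasListOfStruct)
    case _                        => false
  }

  // If any hash input contains LIST of STRUCT, fall back to the CPU
  // instead of letting the native kernel throw at runtime.
  def canRunOnGpu(inputs: Seq[DataType]): Boolean =
    !inputs.exists(hasListOfStruct)
}
```

With a check like this applied where the expression is tagged, the projection feeding the shuffle would be planned on the CPU rather than failing mid-stage with a `CudfException`.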