
[BUG] Integration tests FAILED on: "nvCOMP 2.3/2.4 or newer is required for Zstandard compression" #10627

Closed
NvTimLiu opened this issue Mar 25, 2024 · 1 comment · Fixed by NVIDIA/spark-rapids-jni#1895
Assignees
Labels
bug Something isn't working build Related to CI / CD or cleanly building

Comments

@NvTimLiu
Collaborator

NvTimLiu commented Mar 25, 2024

Describe the bug
Many pytest cases FAILED with the same error: "nvCOMP 2.3/2.4 or newer is required for Zstandard compression"

FAILED ../../src/main/python/orc_write_test.py::test_compress_write_round_trip[zstd][DATAGEN_SEED=1711292952, TZ=UTC]
FAILED ../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[-reader_confs0-zstd][DATAGEN_SEED=1711292952, TZ=UTC]
FAILED ../../src/main/python/parquet_test.py::test_parquet_compress_read_round_trip[-reader_confs1-zstd][DATAGEN_SEED=1711292952, TZ=UTC]
FAILED ../../src/main/python/parquet_write_test.py::test_write_ts_millis[CORRECTED-TIMESTAMP_MICROS][DATAGEN_SEED=1711300812, TZ=UTC, IGNORE_ORDER] - py4j.protocol.Py4JJavaError: An error occurred while calling o52950.collect...
FAILED ../../src/main/python/orc_test.py::test_mixed_compress_read[{'spark.rapids.sql.format.orc.reader.type': 'PERFILE'}-][DATAGEN_SEED=1711300164, TZ=UTC, IGNORE_ORDER] - py4j.protocol.Py4JJavaError: An error occurred while calling o196070.collec...
FAILED ../../src/main/python/orc_test.py::test_mixed_compress_read[{'spark.rapids.sql.format.orc.reader.type': 'PERFILE'}-orc][DATAGEN_SEED=1711300164, ...
FAILED ../../src/main/python/iceberg_test.py::test_iceberg_read_parquet_compression_codec[COALESCING-('zstd', None)] 
.....
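All of the failing cases read or write files with Zstandard compression through the RAPIDS plugin. A minimal standalone repro (not part of the test suite; jar name, flags, and `repro.py` are illustrative, and it requires a GPU node with the plugin jar available) would look like:

```shell
# Hypothetical repro sketch: any write with zstd through the RAPIDS plugin
# reaches the nvCOMP version check in libcudf's nvcomp_adapter.
# `repro.py` stands for any job that writes Parquet, e.g.:
#   spark.range(1000).write.parquet("/tmp/out")
spark-submit \
  --jars rapids-4-spark.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.sql.parquet.compression.codec=zstd \
  repro.py
```

With the broken nightly jar, the job fails in `Table.writeParquetChunk` with the "nvCOMP 2.4 or newer is required for Zstandard compression" error shown in the trace below.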

E py4j.protocol.Py4JJavaError: An error occurred while calling o6480642.parquet.
E : org.apache.spark.SparkException: Job aborted.
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.write(GpuFileFormatWriter.scala:288)
E at org.apache.spark.sql.rapids.GpuInsertIntoHadoopFsRelationCommand.runColumnar(GpuInsertIntoHadoopFsRelationCommand.scala:184)
E at com.nvidia.spark.rapids.GpuDataWritingCommandExec.sideEffectResult$lzycompute(GpuDataWritingCommandExec.scala:117)
E at com.nvidia.spark.rapids.GpuDataWritingCommandExec.sideEffectResult(GpuDataWritingCommandExec.scala:116)
E at com.nvidia.spark.rapids.GpuDataWritingCommandExec.internalDoExecuteColumnar(GpuDataWritingCommandExec.scala:140)
E at com.nvidia.spark.rapids.GpuExec.doExecuteColumnar(GpuExec.scala:366)
E at com.nvidia.spark.rapids.GpuExec.doExecuteColumnar$(GpuExec.scala:365)
E at com.nvidia.spark.rapids.GpuDataWritingCommandExec.doExecuteColumnar(GpuDataWritingCommandExec.scala:112)
E at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:221)
E at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
E at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

......

E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E at py4j.Gateway.invoke(Gateway.java:282)
E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E at py4j.commands.CallCommand.execute(CallCommand.java:79)
E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E at java.base/java.lang.Thread.run(Thread.java:833)
E Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 99032.0 failed 1 times, most recent failure: Lost task 0.0 in stage 99032.0 (TID 2570550) (10.136.6.4 executor 3): org.apache.spark.SparkException: Task failed while writing rows.
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.executeTask(GpuFileFormatWriter.scala:354)
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.$anonfun$write$15(GpuFileFormatWriter.scala:267)
E at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
E at org.apache.spark.scheduler.Task.run(Task.scala:136)
E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
E at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
E at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
E at java.base/java.lang.Thread.run(Thread.java:833)
E Caused by: ai.rapids.cudf.CudfException: CUDF failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-708-cuda11/thirdparty/cudf/cpp/src/io/comp/nvcomp_adapter.cpp:688: Compression error: nvCOMP 2.4 or newer is required for Zstandard compression
E at ai.rapids.cudf.Table.writeParquetChunk(Native Method)
E at ai.rapids.cudf.Table.access$700(Table.java:41)
E at ai.rapids.cudf.Table$ParquetTableWriter.write(Table.java:1791)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$encodeAndBufferToHost$1(ColumnarOutputWriter.scala:205)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$encodeAndBufferToHost$1$adapted(ColumnarOutputWriter.scala:196)
E at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.encodeAndBufferToHost(ColumnarOutputWriter.scala:196)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$2(ColumnarOutputWriter.scala:180)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$2$adapted(ColumnarOutputWriter.scala:179)
E at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$1(ColumnarOutputWriter.scala:179)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$1$adapted(ColumnarOutputWriter.scala:178)
E at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.bufferBatchAndClose(ColumnarOutputWriter.scala:178)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$writeSpillableAndClose$6(ColumnarOutputWriter.scala:159)
E at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
E at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRestoreOnRetry(RmmRapidsRetryIterator.scala:245)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$writeSpillableAndClose$5(ColumnarOutputWriter.scala:159)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$writeSpillableAndClose$5$adapted(ColumnarOutputWriter.scala:157)
E at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.next(RmmRapidsRetryIterator.scala:477)
E at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:613)
E at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:517)
E at scala.collection.Iterator.foreach(Iterator.scala:943)
E at scala.collection.Iterator.foreach$(Iterator.scala:943)
E at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.foreach(RmmRapidsRetryIterator.scala:536)
E at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
E at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
E at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.foldLeft(RmmRapidsRetryIterator.scala:536)
E at scala.collection.TraversableOnce.sum(TraversableOnce.scala:262)
E at scala.collection.TraversableOnce.sum$(TraversableOnce.scala:262)
E at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.sum(RmmRapidsRetryIterator.scala:536)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.writeSpillableAndClose(ColumnarOutputWriter.scala:161)
E at org.apache.spark.sql.rapids.GpuSingleDirectoryDataWriter.writeUpdateMetricsAndClose(GpuFileFormatDataWriter.scala:241)
E at org.apache.spark.sql.rapids.GpuSingleDirectoryDataWriter.write(GpuFileFormatDataWriter.scala:249)
E at org.apache.spark.sql.rapids.GpuFileFormatDataWriter.writeWithIterator(GpuFileFormatDataWriter.scala:161)
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.$anonfun$executeTask$1(GpuFileFormatWriter.scala:341)
E at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.executeTask(GpuFileFormatWriter.scala:348)
E ... 9 more
E Suppressed: com.nvidia.spark.rapids.jni.GpuRetryOOM: injected RetryOOM
E ... 47 more
E
E Driver stacktrace:
E at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
E at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
E at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
E at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
E at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
E at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
E at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
E at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
E at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
E at scala.Option.foreach(Option.scala:407)
E at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
E at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
E at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.write(GpuFileFormatWriter.scala:256)
E ... 57 more
E Caused by: org.apache.spark.SparkException: Task failed while writing rows.
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.executeTask(GpuFileFormatWriter.scala:354)
E at org.apache.spark.sql.rapids.GpuFileFormatWriter$.$anonfun$write$15(GpuFileFormatWriter.scala:267)
E at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
E at org.apache.spark.scheduler.Task.run(Task.scala:136)
E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
E at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
E at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
E ... 1 more
E Caused by: ai.rapids.cudf.CudfException: CUDF failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-708-cuda11/thirdparty/cudf/cpp/src/io/comp/nvcomp_adapter.cpp:688: Compression error: nvCOMP 2.4 or newer is required for Zstandard compression
E at ai.rapids.cudf.Table.writeParquetChunk(Native Method)
E at ai.rapids.cudf.Table.access$700(Table.java:41)
E at ai.rapids.cudf.Table$ParquetTableWriter.write(Table.java:1791)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$encodeAndBufferToHost$1(ColumnarOutputWriter.scala:205)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$encodeAndBufferToHost$1$adapted(ColumnarOutputWriter.scala:196)
E at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.encodeAndBufferToHost(ColumnarOutputWriter.scala:196)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$2(ColumnarOutputWriter.scala:180)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$2$adapted(ColumnarOutputWriter.scala:179)
E at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$1(ColumnarOutputWriter.scala:179)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$1$adapted(ColumnarOutputWriter.scala:178)
E at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.bufferBatchAndClose(ColumnarOutputWriter.scala:178)
E at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$writeSpillableAndClose$6(ColumnarOutputWriter.scala:159)
E at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)

@NvTimLiu NvTimLiu added bug Something isn't working ? - Needs Triage Need team to review and classify labels Mar 25, 2024
@NvTimLiu NvTimLiu changed the title [BUG] Integration tests FAILED on: "nvCOMP 2.4 or newer is required for Zstandard compression" [BUG] Integration tests FAILED on: "nvCOMP 2.3/2.4 or newer is required for Zstandard compression" Mar 25, 2024
@jbrennan333
Collaborator

I suspect this may be related to NVIDIA/spark-rapids-jni#1877
rapids-cmake is responsible for identifying which version of nvcomp we pull in.
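One way to confirm which nvcomp version rapids-cmake pins is to look at its CPM version manifest. This is a hypothetical check against a local rapids-cmake checkout (the file layout may differ by branch):

```shell
# rapids-cmake records pinned third-party versions, including nvcomp,
# in its CPM versions.json manifest. Path assumes a local checkout.
grep -n -A 6 '"nvcomp"' rapids-cmake/cpm/versions.json
```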
