[BUG] Databricks 11.3 exception - Multithreaded parquet reader - Missing Credential Scope #8210

Open
tgravescs opened this issue May 1, 2023 · 6 comments
Labels: bug (Something isn't working)

tgravescs commented May 1, 2023

Describe the bug

Seeing an exception when running on Databricks 11.3 and reading parquet files. I tried both the 23.04 released jar and a 23.06 snapshot jar; the trace below is from the multithreaded reader. It also fails with the coalescing reader, but the PERFILE reader works (see the config sketch after the stack trace).

org.apache.spark.util.TaskCompletionListenerException: org.apache.spark.SparkException: Missing Credential Scope.

Previous exception in task: org.apache.spark.SparkException: Missing Credential Scope.
	java.util.concurrent.FutureTask.report(FutureTask.java:122)
	java.util.concurrent.FutureTask.get(FutureTask.java:192)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.readReadyFiles(GpuMultiFileReader.scala:658)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.getNextBuffersAndMetaAndCombine(GpuMultiFileReader.scala:670)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.getNextBuffersAndMeta(GpuMultiFileReader.scala:706)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1(GpuMultiFileReader.scala:765)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1$adapted(GpuMultiFileReader.scala:740)
	com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:375)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.next(GpuMultiFileReader.scala:740)
	com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
	com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1(GpuDataSourceRDD.scala:61)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(GpuDataSourceRDD.scala:61)
	scala.Option.exists(Option.scala:376)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:61)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.advanceToNextIter(GpuDataSourceRDD.scala:86)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:61)
	org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:469)
	org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:317)
	org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:340)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2(RapidsShuffleInternalManagerBase.scala:281)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2$adapted(RapidsShuffleInternalManagerBase.scala:274)
	com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.withResource(RapidsShuffleInternalManagerBase.scala:234)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1(RapidsShuffleInternalManagerBase.scala:274)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1$adapted(RapidsShuffleInternalManagerBase.scala:273)
	com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.withResource(RapidsShuffleInternalManagerBase.scala:234)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.write(RapidsShuffleInternalManagerBase.scala:273)
	org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	org.apache.spark.scheduler.Task.doRunTask(Task.scala:169)
	org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
	com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
	scala.util.Using$.resource(Using.scala:269)
	com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
	org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.scheduler.Task.run(Task.scala:96)
	org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
	org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
	org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
	scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	java.lang.Thread.run(Thread.java:750)
	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:233)
	at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:169)
	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:162)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
	at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
	at scala.util.Using$.resource(Using.scala:269)
	at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:96)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
	Suppressed: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Missing Credential Scope.
		at java.util.concurrent.FutureTask.report(FutureTask.java:122)
		at java.util.concurrent.FutureTask.get(FutureTask.java:192)
		at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$close$1(GpuMultiFileReader.scala:844)
		at scala.collection.Iterator.foreach(Iterator.scala:943)
		at scala.collection.Iterator.foreach$(Iterator.scala:943)
		at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
		at scala.collection.IterableLike.foreach(IterableLike.scala:74)
		at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
		at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
		at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.close(GpuMultiFileReader.scala:842)
		at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$advanceToNextIter$1(GpuDataSourceRDD.scala:83)
		at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$advanceToNextIter$1$adapted(GpuDataSourceRDD.scala:82)
		at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:162)
		at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:169)
		at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:169)
		at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:224)
		... 21 more
	Caused by: org.apache.spark.SparkException: Missing Credential Scope.
		at com.databricks.unity.UCSExecutor.$anonfun$currentScope$1(UCSExecutor.scala:68)
		at scala.Option.getOrElse(Option.scala:189)
		at com.databricks.unity.UCSExecutor.currentScope(UCSExecutor.scala:68)
		at com.databricks.unity.UCSExecutor.currentScope$(UCSExecutor.scala:66)
		at com.databricks.unity.UCSExecutor$.currentScope(UCSExecutor.scala:76)
		at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:96)
		at com.databricks.unity.UnityCredentialScope$.getSAMRegistry(UnityCredentialScope.scala:120)
		at com.databricks.unity.SAMRegistry$.getSAM(SAMRegistry.scala:343)
		at com.databricks.sql.acl.fs.CredentialScopeFileSystem.setDelegates(CredentialScopeFileSystem.scala:133)
		at com.databricks.sql.acl.fs.CredentialScopeFileSystem.initialize(CredentialScopeFileSystem.scala:178)
		at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
		at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
		at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
		at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:43)
		at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:481)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$1(GpuParquetScan.scala:616)
		at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
		at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.withResource(GpuParquetScan.scala:479)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.readAndSimpleFilterFooter(GpuParquetScan.scala:614)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:663)
		at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
		at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.withResource(GpuParquetScan.scala:479)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.filterBlocks(GpuParquetScan.scala:627)
		at com.nvidia.spark.rapids.GpuParquetMultiFilePartitionReaderFactory.$anonfun$buildBaseColumnarReaderForCloud$1(GpuParquetScan.scala:995)
		at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader$ReadBatchRunner.doRead(GpuParquetScan.scala:2131)
		at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader$ReadBatchRunner.call(GpuParquetScan.scala:2106)
		at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader$ReadBatchRunner.call(GpuParquetScan.scala:2087)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		... 3 more
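
For reference, a minimal way to force the per-file reader (the one that worked here) is shown below. This is only a hedged workaround sketch, assuming a SparkSession named `spark` (e.g. in a Databricks notebook) and that the plugin build in use exposes the standard `spark.rapids.sql.format.parquet.reader.type` config; the input path is a placeholder.

```scala
// Hedged workaround sketch: force the PERFILE reader, which parses footers on the task
// thread instead of a separate thread pool. Assumes the plugin exposes
// spark.rapids.sql.format.parquet.reader.type (AUTO/PERFILE/MULTITHREADED/COALESCING).
spark.conf.set("spark.rapids.sql.format.parquet.reader.type", "PERFILE")

// Placeholder path: reading the same parquet data with PERFILE avoided the failure here.
val df = spark.read.parquet("/path/to/parquet")
df.count()
```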
tgravescs added the bug and ? - Needs Triage labels on May 1, 2023

tgravescs commented May 1, 2023

Note the exception is related to Unity Catalog; the job that hit this failure had it enabled.

mattahrens removed the ? - Needs Triage label on May 2, 2023
@tgravescs

Note that the location where the exception is thrown changes in the stack trace when using the coalescing reader without parallel footer filtering. So it appears to happen only when we do the footer parsing in separate threads, which suggests it is something thread-local, or some context we are not passing along to those threads (see the illustrative sketch below).
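
For illustration only, the general pattern for carrying that kind of thread-local context along would look like the sketch below. `CredentialContext` is a hypothetical stand-in, not the real Databricks Unity Catalog API; the point is just that whatever scope is visible on the task thread has to be captured there and re-established inside the work submitted to the reader thread pool.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for thread-local credential-scope state; not a Databricks API.
object CredentialContext {
  private val holder = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }
  // Capture whatever scope the current (task) thread has.
  def capture(): Option[String] = holder.get()
  // Run `body` with the captured scope installed, restoring the previous value afterwards.
  def runWith[T](ctx: Option[String])(body: => T): T = {
    val prev = holder.get()
    holder.set(ctx)
    try body finally holder.set(prev)
  }
}

// Wrap a footer-reading task so it runs on the pool thread with the caller's scope.
def wrap[T](task: Callable[T]): Callable[T] = {
  val ctx = CredentialContext.capture() // captured on the submitting (task) thread
  new Callable[T] {
    override def call(): T = CredentialContext.runWith(ctx)(task.call())
  }
}

val pool = Executors.newFixedThreadPool(2)
val work: Callable[Int] = () => 42 // stand-in for parsing one parquet footer
val result = pool.submit(wrap(work)).get()
pool.shutdown()
```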

@tgravescs

The only way I've been able to reproduce this so far is to do a Delta write to one table and then a read from another table. My options are limited since it's a customer environment. My guess is that the write sets the credential scope to one thing, and then when we go to do the read the scope is wrong, so we would have to explicitly set it for the read to work. A hedged repro sketch follows.
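
A minimal repro attempt would look something like the following. This is a hedged sketch assuming a Unity Catalog enabled workspace and a SparkSession named `spark`; the catalog/schema/table names are placeholders, and the second table is assumed to already exist.

```scala
// Hedged repro sketch with placeholder table names (main.demo.table_a / main.demo.table_b).
// The Delta write appears to establish a credential scope for the write path...
val src = spark.range(0, 1000000L).selectExpr("id", "id % 100 AS key")
src.write.format("delta").mode("overwrite").saveAsTable("main.demo.table_a")

// ...and the subsequent read of a different, pre-existing table is where the
// "Missing Credential Scope" exception showed up with the multithreaded/coalescing readers.
spark.table("main.demo.table_b").count()
```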

@tgravescs

Parquet was handled under PR #8296. Also filed #8242 to find a good way to run integration tests against Unity Catalog.

@tgravescs

We also need to make sure to test Unity Catalog together with Alluxio.

@tgravescs

I believe the remaining work here is to find a reproducible case so we can add a specific test for this fix.
