[BUG] Databricks 11.3 exception - Multithreaded parquet reader - Missing Credential Scope #8210

Open
tgravescs opened this issue May 1, 2023 · 6 comments
Labels: bug (Something isn't working)

tgravescs commented May 1, 2023

Describe the bug

Seeing an exception when running on Databricks 11.3 and reading parquet files. I tried both the 23.04 released jar and a 23.06 snapshot jar; the trace below is from the multithreaded reader. It also fails with the coalescing reader, but the PERFILE reader works (see the config sketch after the stack trace).

org.apache.spark.util.TaskCompletionListenerException: org.apache.spark.SparkException: Missing Credential Scope.

Previous exception in task: org.apache.spark.SparkException: Missing Credential Scope.
	java.util.concurrent.FutureTask.report(FutureTask.java:122)
	java.util.concurrent.FutureTask.get(FutureTask.java:192)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.readReadyFiles(GpuMultiFileReader.scala:658)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.getNextBuffersAndMetaAndCombine(GpuMultiFileReader.scala:670)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.getNextBuffersAndMeta(GpuMultiFileReader.scala:706)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1(GpuMultiFileReader.scala:765)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$next$1$adapted(GpuMultiFileReader.scala:740)
	com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:375)
	com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.next(GpuMultiFileReader.scala:740)
	com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)
	com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1(GpuDataSourceRDD.scala:61)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(GpuDataSourceRDD.scala:61)
	scala.Option.exists(Option.scala:376)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:61)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.advanceToNextIter(GpuDataSourceRDD.scala:86)
	com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.hasNext(GpuDataSourceRDD.scala:61)
	org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:469)
	org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:317)
	org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:340)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2(RapidsShuffleInternalManagerBase.scala:281)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2$adapted(RapidsShuffleInternalManagerBase.scala:274)
	com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.withResource(RapidsShuffleInternalManagerBase.scala:234)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1(RapidsShuffleInternalManagerBase.scala:274)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1$adapted(RapidsShuffleInternalManagerBase.scala:273)
	com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.withResource(RapidsShuffleInternalManagerBase.scala:234)
	org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.write(RapidsShuffleInternalManagerBase.scala:273)
	org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	org.apache.spark.scheduler.Task.doRunTask(Task.scala:169)
	org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
	com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
	scala.util.Using$.resource(Using.scala:269)
	com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
	org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.scheduler.Task.run(Task.scala:96)
	org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
	org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
	org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
	scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	java.lang.Thread.run(Thread.java:750)
	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:233)
	at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:169)
	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:162)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
	at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
	at scala.util.Using$.resource(Using.scala:269)
	at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:96)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
	Suppressed: java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Missing Credential Scope.
		at java.util.concurrent.FutureTask.report(FutureTask.java:122)
		at java.util.concurrent.FutureTask.get(FutureTask.java:192)
		at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.$anonfun$close$1(GpuMultiFileReader.scala:844)
		at scala.collection.Iterator.foreach(Iterator.scala:943)
		at scala.collection.Iterator.foreach$(Iterator.scala:943)
		at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
		at scala.collection.IterableLike.foreach(IterableLike.scala:74)
		at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
		at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
		at com.nvidia.spark.rapids.MultiFileCloudPartitionReaderBase.close(GpuMultiFileReader.scala:842)
		at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$advanceToNextIter$1(GpuDataSourceRDD.scala:83)
		at com.nvidia.spark.rapids.shims.GpuDataSourceRDD$$anon$1.$anonfun$advanceToNextIter$1$adapted(GpuDataSourceRDD.scala:82)
		at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:162)
		at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:169)
		at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:169)
		at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:224)
		... 21 more
	Caused by: org.apache.spark.SparkException: Missing Credential Scope.
		at com.databricks.unity.UCSExecutor.$anonfun$currentScope$1(UCSExecutor.scala:68)
		at scala.Option.getOrElse(Option.scala:189)
		at com.databricks.unity.UCSExecutor.currentScope(UCSExecutor.scala:68)
		at com.databricks.unity.UCSExecutor.currentScope$(UCSExecutor.scala:66)
		at com.databricks.unity.UCSExecutor$.currentScope(UCSExecutor.scala:76)
		at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:96)
		at com.databricks.unity.UnityCredentialScope$.getSAMRegistry(UnityCredentialScope.scala:120)
		at com.databricks.unity.SAMRegistry$.getSAM(SAMRegistry.scala:343)
		at com.databricks.sql.acl.fs.CredentialScopeFileSystem.setDelegates(CredentialScopeFileSystem.scala:133)
		at com.databricks.sql.acl.fs.CredentialScopeFileSystem.initialize(CredentialScopeFileSystem.scala:178)
		at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
		at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
		at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
		at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:43)
		at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:481)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$1(GpuParquetScan.scala:616)
		at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
		at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.withResource(GpuParquetScan.scala:479)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.readAndSimpleFilterFooter(GpuParquetScan.scala:614)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:663)
		at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
		at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.withResource(GpuParquetScan.scala:479)
		at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.filterBlocks(GpuParquetScan.scala:627)
		at com.nvidia.spark.rapids.GpuParquetMultiFilePartitionReaderFactory.$anonfun$buildBaseColumnarReaderForCloud$1(GpuParquetScan.scala:995)
		at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader$ReadBatchRunner.doRead(GpuParquetScan.scala:2131)
		at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader$ReadBatchRunner.call(GpuParquetScan.scala:2106)
		at com.nvidia.spark.rapids.MultiFileCloudParquetPartitionReader$ReadBatchRunner.call(GpuParquetScan.scala:2087)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		... 3 more
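
For reference, a minimal way to force the per-file reader (the one that worked here) is shown below. This is only a hedged workaround sketch, assuming a SparkSession named `spark` (e.g. in a Databricks notebook) and that the plugin build in use exposes the standard `spark.rapids.sql.format.parquet.reader.type` config; the input path is a placeholder.

```scala
// Hedged workaround sketch: force the PERFILE reader, which parses footers on the task
// thread instead of a separate thread pool. Assumes the plugin exposes
// spark.rapids.sql.format.parquet.reader.type (AUTO/PERFILE/MULTITHREADED/COALESCING).
spark.conf.set("spark.rapids.sql.format.parquet.reader.type", "PERFILE")

// Placeholder path: reading the same parquet data with PERFILE avoided the failure here.
val df = spark.read.parquet("/path/to/parquet")
df.count()
```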
tgravescs added the bug and ? - Needs Triage labels on May 1, 2023

tgravescs commented May 1, 2023

Note the exception is related to Unity Catalog; the job that hit this failure had it enabled.

mattahrens removed the ? - Needs Triage label on May 2, 2023
@tgravescs

Note that the location where the exception is thrown changes in the stack trace when using the coalescing reader without parallel footer filtering. So it appears to happen only when we do the footer parsing in separate threads, which suggests it is something thread-local, or some context we are not passing along to those threads (see the illustrative sketch below).
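
For illustration only, the general pattern for carrying that kind of thread-local context along would look like the sketch below. `CredentialContext` is a hypothetical stand-in, not the real Databricks Unity Catalog API; the point is just that whatever scope is visible on the task thread has to be captured there and re-established inside the work submitted to the reader thread pool.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for thread-local credential-scope state; not a Databricks API.
object CredentialContext {
  private val holder = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }
  // Capture whatever scope the current (task) thread has.
  def capture(): Option[String] = holder.get()
  // Run `body` with the captured scope installed, restoring the previous value afterwards.
  def runWith[T](ctx: Option[String])(body: => T): T = {
    val prev = holder.get()
    holder.set(ctx)
    try body finally holder.set(prev)
  }
}

// Wrap a footer-reading task so it runs on the pool thread with the caller's scope.
def wrap[T](task: Callable[T]): Callable[T] = {
  val ctx = CredentialContext.capture() // captured on the submitting (task) thread
  new Callable[T] {
    override def call(): T = CredentialContext.runWith(ctx)(task.call())
  }
}

val pool = Executors.newFixedThreadPool(2)
val work: Callable[Int] = () => 42 // stand-in for parsing one parquet footer
val result = pool.submit(wrap(work)).get()
pool.shutdown()
```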

@tgravescs

The only way I've been able to reproduce this so far is to do a Delta write to one table and then a read from another table. My options are limited since it's a customer environment. My guess is that the write sets the credential scope to one thing, and then when we go to do the read the scope is wrong, so we would have to explicitly set it for the read to work. A hedged repro sketch follows.
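
A minimal repro attempt would look something like the following. This is a hedged sketch assuming a Unity Catalog enabled workspace and a SparkSession named `spark`; the catalog/schema/table names are placeholders, and the second table is assumed to already exist.

```scala
// Hedged repro sketch with placeholder table names (main.demo.table_a / main.demo.table_b).
// The Delta write appears to establish a credential scope for the write path...
val src = spark.range(0, 1000000L).selectExpr("id", "id % 100 AS key")
src.write.format("delta").mode("overwrite").saveAsTable("main.demo.table_a")

// ...and the subsequent read of a different, pre-existing table is where the
// "Missing Credential Scope" exception showed up with the multithreaded/coalescing readers.
spark.table("main.demo.table_b").count()
```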

@tgravescs

Parquet was handled under PR #8296. Also filed #8242 to find a good way to run integration tests against Unity Catalog.

@tgravescs

We also need to make sure to test Unity Catalog together with Alluxio.

@tgravescs

I believe the remaining work here is to find a reproducible case so we can add a specific test for this fix.
