
[BUG] fs.azure.account.keyInvalid configuration issue while reading from Unity Catalog Tables on Azure DB #10318

Closed
SurajAralihalli opened this issue Jan 29, 2024 · 4 comments · Fixed by #10756
Labels: bug (Something isn't working)

Comments


SurajAralihalli commented Jan 29, 2024

Describe the bug
While using Azure Databricks and attempting to read a Managed Table from the Unity Catalog Metastore with the RAPIDS Accelerator, I encountered an invalid-credentials error with the following message: Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key. This error does not occur when the RAPIDS Accelerator is disabled.

Notes
Adding the credentials of the storage container to the Spark configuration properties can serve as an interim solution. However, this approach is not scalable when there are multiple storage containers.
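To illustrate why the interim workaround scales poorly: the ABFS driver expects one `fs.azure.account.key.<account>.dfs.core.windows.net` property per storage account, so every additional account means another key in the Spark configuration. A minimal sketch of that shape (the account name and key below are placeholders, not real values):

```python
def abfs_key_conf(account_keys):
    """Build the per-account fs.azure.account.key properties that the
    ABFS driver looks up, one entry per storage account."""
    return {
        f"fs.azure.account.key.{account}.dfs.core.windows.net": key
        for account, key in account_keys.items()
    }

# Hypothetical account/key pair, for illustration only.
conf = abfs_key_conf({"mystorageacct": "base64key=="})
for prop, value in conf.items():
    print(prop, "->", value)
```

Each entry would be set as a cluster Spark property (or via `spark.conf.set`), which is what makes the approach unwieldy once many containers and accounts are involved.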

Environment details
Managed Tables on Azure Databricks with Unity Catalog and RAPIDS Accelerator

SurajAralihalli added the "bug" and "? - Needs Triage" labels on Jan 29, 2024.
mattahrens removed the "? - Needs Triage" label on Jan 30, 2024.

jlowe commented Feb 1, 2024

What is the table format -- is it a Delta Lake table, a raw Parquet table, or something else? A stacktrace of the error would help.

Assuming this is with a table that's ultimately comprised of Parquet files, does this happen even with the config spark.rapids.sql.format.parquet.reader.type=PERFILE? If it works with the PERFILE reader, then that tells us the issue is with setting up the proper context for the multithreaded readers.
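For anyone reproducing this check, the config jlowe mentions is an ordinary Spark property, e.g. in the plain `key value` form of a cluster Spark config (a sketch; not verified on Databricks specifically):

```
spark.rapids.sql.format.parquet.reader.type PERFILE
```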

@SurajAralihalli (Author):

Yes, it's a Delta Lake table. It didn't work with spark.rapids.sql.format.parquet.reader.type=PERFILE either. However, it works when I explicitly configure fs.azure.account.key in the Spark properties.

Stack Trace:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 28) (10.9.4.10 executor 0): Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:670)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2055)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:225)
	at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:64)
	at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)
	at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:61)
	at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:87)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:43)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:482)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$11(GpuParquetScan.scala:676)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$6(GpuParquetScan.scala:675)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$1(GpuParquetScan.scala:652)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.readAndSimpleFilterFooter(GpuParquetScan.scala:643)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:728)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.filterBlocks(GpuParquetScan.scala:689)
	at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildBaseColumnarParquetReader(GpuParquetScan.scala:1338)
	at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildColumnarReader(GpuParquetScan.scala:1328)
	at com.nvidia.spark.rapids.PartitionReaderIterator$.$anonfun$buildReader$1(PartitionReaderIterator.scala:66)
	at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.org$apache$spark$sql$rapids$shims$GpuFileScanRDD$$anon$$readCurrentFile(GpuFileScanRDD.scala:97)
	at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.nextIterator(GpuFileScanRDD.scala:151)
	at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.hasNext(GpuFileScanRDD.scala:74)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(GpuAggregateExec.scala:1930)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(GpuAggregateExec.scala:1930)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
	at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:142)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
	at scala.util.Using$.resource(Using.scala:269)
	at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:142)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:97)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:904)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1740)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:907)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:761)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: Invalid configuration value detected for fs.azure.account.key
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.ConfigurationBasicValidator.validate(ConfigurationBasicValidator.java:49)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.Base64StringConfigurationBasicValidator.validate(Base64StringConfigurationBasicValidator.java:40)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.validateStorageAccountKey(SimpleKeyProvider.java:71)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:49)
	... 63 more

@razajafri

@mattahrens @sameerz Is this related to #8242?


sameerz commented May 1, 2024

> @mattahrens @sameerz Is this related to #8242?

Yes, it is related. We do not need to test with Alluxio, but with the filecache.
