
[BUG] fs.azure.account.keyInvalid configuration issue while reading from Unity Catalog Tables on Azure DB #10318

Closed
SurajAralihalli opened this issue Jan 29, 2024 · 4 comments · Fixed by #10756
Labels: bug (Something isn't working)

Comments


SurajAralihalli commented Jan 29, 2024

Describe the bug
While using Azure Databricks and attempting to read a Managed Table from the Unity Catalog Metastore with the RAPIDS Accelerator, I encountered an invalid-credentials error with the following message: Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key. This error does not occur when the RAPIDS Accelerator is disabled.

Notes
Adding the credentials of the storage container to the Spark configuration properties can serve as an interim solution. However, this approach is not scalable when there are multiple storage containers.
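To illustrate why the interim workaround scales poorly: the ABFS driver expects one `fs.azure.account.key.<account>.dfs.core.windows.net` property per storage account, so every additional account means another key in the Spark configuration. A minimal sketch of that shape (the account name and key below are placeholders, not real values):

```python
def abfs_key_conf(account_keys):
    """Build the per-account fs.azure.account.key properties that the
    ABFS driver looks up, one entry per storage account."""
    return {
        f"fs.azure.account.key.{account}.dfs.core.windows.net": key
        for account, key in account_keys.items()
    }

# Hypothetical account/key pair, for illustration only.
conf = abfs_key_conf({"mystorageacct": "base64key=="})
for prop, value in conf.items():
    print(prop, "->", value)
```

Each entry would be set as a cluster Spark property (or via `spark.conf.set`), which is what makes the approach unwieldy once many containers and accounts are involved.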

Environment details
Managed Tables on Azure Databricks with Unity Catalog and RAPIDS Accelerator

SurajAralihalli added the "bug" and "? - Needs Triage" labels on Jan 29, 2024.
mattahrens removed the "? - Needs Triage" label on Jan 30, 2024.

jlowe commented Feb 1, 2024

What is the table format -- is it a Delta Lake table, a raw Parquet table, or something else? A stacktrace of the error would help.

Assuming this is with a table that's ultimately comprised of Parquet files, does this happen even with the config spark.rapids.sql.format.parquet.reader.type=PERFILE? If it works with the PERFILE reader, then that tells us the issue is with setting up the proper context for the multithreaded readers.
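For anyone reproducing this check, the config jlowe mentions is an ordinary Spark property, e.g. in the plain `key value` form of a cluster Spark config (a sketch; not verified on Databricks specifically):

```
spark.rapids.sql.format.parquet.reader.type PERFILE
```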

@SurajAralihalli (Author):

Yes, it's a Delta Lake table. It didn't work with spark.rapids.sql.format.parquet.reader.type=PERFILE either. However, it works when I explicitly configure fs.azure.account.key in the Spark properties.

Stack Trace:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 28) (10.9.4.10 executor 0): Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:670)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2055)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:225)
	at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:64)
	at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)
	at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:61)
	at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:87)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:43)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:482)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$11(GpuParquetScan.scala:676)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$6(GpuParquetScan.scala:675)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$1(GpuParquetScan.scala:652)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.readAndSimpleFilterFooter(GpuParquetScan.scala:643)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:728)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.filterBlocks(GpuParquetScan.scala:689)
	at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildBaseColumnarParquetReader(GpuParquetScan.scala:1338)
	at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildColumnarReader(GpuParquetScan.scala:1328)
	at com.nvidia.spark.rapids.PartitionReaderIterator$.$anonfun$buildReader$1(PartitionReaderIterator.scala:66)
	at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.org$apache$spark$sql$rapids$shims$GpuFileScanRDD$$anon$$readCurrentFile(GpuFileScanRDD.scala:97)
	at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.nextIterator(GpuFileScanRDD.scala:151)
	at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.hasNext(GpuFileScanRDD.scala:74)
	at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(GpuAggregateExec.scala:1930)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at scala.Option.getOrElse(Option.scala:189)
	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(GpuAggregateExec.scala:1930)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
	at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:142)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
	at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
	at scala.util.Using$.resource(Using.scala:269)
	at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:142)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:97)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:904)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1740)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:907)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:761)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: Invalid configuration value detected for fs.azure.account.key
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.ConfigurationBasicValidator.validate(ConfigurationBasicValidator.java:49)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.Base64StringConfigurationBasicValidator.validate(Base64StringConfigurationBasicValidator.java:40)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.validateStorageAccountKey(SimpleKeyProvider.java:71)
	at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:49)
	... 63 more

@razajafri

@mattahrens @sameerz Is this related to #8242?


sameerz commented May 1, 2024

> @mattahrens @sameerz Is this related to #8242?

Yes, it is related. We do not need to test with Alluxio, but with the filecache.
