The following is a summary of the various ways to connect to Blob Storage and Azure Data Lake Gen2 from Azure Databricks.
To download all sample notebooks, here is the DBC archive that you can import into your workspace.
How to connect | Scope of connection | Authentication | Authorization Requirements | Code Sample | Docs/Supported Storage |
---|---|---|---|---|---|
Direct connect | Typically SparkSession* | Storage Key | All rights | Python, SQL | Blob |
Direct connect | Typically SparkSession* | OAuth via Service Principal (SP) | **SP has correct RBAC role assigned OR ACL permissions to files/folders in ADLS Gen2 | Python, SQL | ADLS Gen2 |
Direct connect | Typically SparkSession* | AD Passthrough | **User has correct RBAC role assigned OR ACL permissions to files/folders in ADLS Gen2 | Python, SQL | ADLS Gen2 |
Mount on DBFS | Databricks Workspace | Storage Key | All rights | Python | Blob, ADLS Gen2 |
Mount on DBFS | Databricks Workspace | OAuth via Service Principal (SP) | **SP has correct RBAC role assigned OR ACL permissions to files/folders in ADLS Gen2 | Python | ADLS Gen2 |
*This depends on where the Spark configuration is set. It is typically set on the SparkSession of the running notebook and is therefore scoped to that SparkSession only.
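For example, a direct connection with a storage key can look like the following minimal sketch from a Python notebook. The storage account, container, key, and file path are hypothetical placeholders to replace with your own values:

```python
# Hypothetical placeholder values; substitute your own storage account, container, and key.
storage_account = "<storage-account>"
container = "<container>"
storage_key = "<storage-account-access-key>"  # better retrieved from a secret scope (see below)

# Setting the key on spark.conf scopes it to this notebook's SparkSession only.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    storage_key,
)

# Read directly from the Blob container over wasbs://.
df = spark.read.csv(
    f"wasbs://{container}@{storage_account}.blob.core.windows.net/path/to/data.csv",
    header=True,
)
```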
**IMPORTANT NOTE on Authorization requirements
You need to assign one of the following RBAC roles specifically to the Service Principal or User. See here for more information.
- Storage Blob Data Owner
- Storage Blob Data Contributor
- Storage Blob Data Reader
NOTE: The Owner and Contributor roles are insufficient.
For more granular access control, you can use ACLs on folders/files in the ADLS Gen2 filesystem.
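As a rough illustration of the OAuth via Service Principal option, the sketch below sets the ABFS OAuth configuration on the SparkSession and reads from an ADLS Gen2 filesystem. The application ID, client secret, tenant ID, storage account, and filesystem names are placeholders, and the SP is assumed to hold one of the roles above or the appropriate ACLs:

```python
# Hypothetical placeholders: <application-id>, <client-secret>, <tenant-id>,
# <storage-account>, <filesystem>; replace with your own values.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Apply the OAuth settings to the current SparkSession (direct connect scope).
for key, value in configs.items():
    spark.conf.set(key, value)

# This read succeeds only if the SP has Storage Blob Data Reader/Contributor/Owner
# on the account, or ACL permissions on the specific folders/files.
df = spark.read.parquet(
    "abfss://<filesystem>@<storage-account>.dfs.core.windows.net/path/to/data"
)
```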
For simplicity, the examples do not make use of Azure Databricks secrets.
Azure Databricks Secrets is the recommended way to store sensitive information in Azure Databricks. Essentially, you create Secret Scopes in which you store secrets. Permissions are managed at the Secret Scope level, and users with the correct permission on a particular scope can retrieve the secrets within it.
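For instance, retrieving a secret in a notebook typically looks like the sketch below. The scope name "adls-scope" and key name "sp-client-secret" are hypothetical and must be created beforehand (via the Databricks CLI or as an Azure Key Vault-backed scope):

```python
# Hypothetical scope and key names; create the scope and secret before running this.
client_secret = dbutils.secrets.get(scope="adls-scope", key="sp-client-secret")

# Secret values are redacted in notebook output, so they can be passed into
# Spark configuration without being exposed in the notebook itself.
spark.conf.set("fs.azure.account.oauth2.client.secret", client_secret)

# List the secret names (never the values) available in the scope.
print([s.key for s in dbutils.secrets.list("adls-scope")])
```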
There are two types of Secret Scopes: