From 7b469bc680f4b104a3673b2d146d706a602e12f9 Mon Sep 17 00:00:00 2001
From: Srinivasulu Punuru
Date: Mon, 19 Jul 2021 14:04:43 -0700
Subject: [PATCH] Documentation for ABFS

---
 .../docs/deployment/filesystems/azure.md | 41 +++++++++++++++++--
 .../docs/deployment/filesystems/overview.md | 2 +-
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/docs/content/docs/deployment/filesystems/azure.md b/docs/content/docs/deployment/filesystems/azure.md
index dcb20387ab0ca..b26c22ee381f4 100644
--- a/docs/content/docs/deployment/filesystems/azure.md
+++ b/docs/content/docs/deployment/filesystems/azure.md
@@ -30,13 +30,31 @@ under the License.
 [Azure Blob Storage](https://docs.microsoft.com/en-us/azure/storage/) is a Microsoft-managed service providing cloud storage for a variety of use cases. You can use Azure Blob Storage with Flink for **reading** and **writing data** as well in conjunction with the [streaming **state backends**]({{< ref "docs/ops/state/state_backends" >}})
+Flink supports accessing Azure Blob Storage using both [wasb://](https://hadoop.apache.org/docs/stable/hadoop-azure/index.html) and [abfs://](https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html).
+
+{{< hint info >}}
+Azure recommends using abfs:// for accessing ADLS Gen2 storage accounts even though wasb:// works through backward compatibility.
+{{< /hint >}}
+
+{{< hint warning >}}
+abfs:// can only be used to access ADLS Gen2 storage accounts. See the Azure documentation for how to identify an ADLS Gen2 storage account.
+{{< /hint >}}
+
+
 You can use Azure Blob Storage objects like regular files by specifying paths in the following format:
 ```plain
+// WASB unencrypted access
 wasb://@$.blob.core.windows.net/
-// SSL encrypted access
+// WASB SSL encrypted access
 wasbs://@$.blob.core.windows.net/
+
+// ABFS unencrypted access
+abfs://@$.dfs.core.windows.net/
+
+// ABFS SSL encrypted access
+abfss://@$.dfs.core.windows.net/
 ```
 See below for how to use Azure Blob Storage in a Flink job:
@@ -63,9 +81,11 @@ cp ./opt/flink-azure-fs-hadoop-{{< version >}}.jar ./plugins/azure-fs-hadoop/
 `flink-azure-fs-hadoop` registers default FileSystem wrappers for URIs with the *wasb://* and *wasbs://* (SSL encrypted access) scheme.
-### Credentials Configuration
+## Credentials Configuration
+
+### WASB
-Hadoop's Azure Filesystem supports configuration of credentials via the Hadoop configuration as
+Hadoop's WASB Azure Filesystem supports configuration of credentials via the Hadoop configuration as
 outlined in the [Hadoop Azure Blob Storage documentation](https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Configuring_Credentials). For convenience Flink forwards all Flink configurations with a key prefix of `fs.azure` to the Hadoop configuration of the filesystem. Consequentially, the azure blob storage key can be configured
@@ -83,4 +103,19 @@ environment variable `AZURE_STORAGE_KEY` by setting the following configuration
 fs.azure.account.keyprovider..blob.core.windows.net: org.apache.flink.fs.azurefs.EnvironmentVariableKeyProvider
 ```
+### ABFS
+
+Hadoop's ABFS Azure Filesystem supports several ways of configuring authentication. See the [Hadoop ABFS documentation](https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Authentication) for how to configure it.
+
+{{< hint info >}}
+Azure recommends using an Azure managed identity to access ADLS Gen2 storage accounts via abfs. Details on how to do this are beyond the scope of this documentation; refer to the Azure documentation.
+{{< /hint >}}
+
+##### Accessing ABFS Using Storage Keys (Discouraged)
+The Azure Blob Storage key can be configured in `flink-conf.yaml` via:
+
+```yaml
+fs.azure.account.key..dfs.core.windows.net:
+```
+
 {{< top >}}
diff --git a/docs/content/docs/deployment/filesystems/overview.md b/docs/content/docs/deployment/filesystems/overview.md
index aec8b539c9d89..82ecaf9f28287 100644
--- a/docs/content/docs/deployment/filesystems/overview.md
+++ b/docs/content/docs/deployment/filesystems/overview.md
@@ -56,7 +56,7 @@ The Apache Flink project supports the following file systems:
  - **[Aliyun Object Storage Service]({{< ref "docs/deployment/filesystems/oss" >}})** is supported by `flink-oss-fs-hadoop` and registered under the *oss://* URI scheme.
    The implementation is based on the [Hadoop Project](https://hadoop.apache.org/) but is self-contained with no dependency footprint.
- - **[Azure Blob Storage]({{< ref "docs/deployment/filesystems/azure" >}})** is supported by `flink-azure-fs-hadoop` and registered under the *wasb(s)://* URI schemes.
+ - **[Azure Blob Storage]({{< ref "docs/deployment/filesystems/azure" >}})** is supported by `flink-azure-fs-hadoop` and registered under the *abfs(s)://* and *wasb(s)://* URI schemes.
    The implementation is based on the [Hadoop Project](https://hadoop.apache.org/) but is self-contained with no dependency footprint.
  - **[Google Cloud Storage]({{< ref "docs/deployment/filesystems/gcs" >}})** is supported by `gcs-connector` and registered under the *gs://* URI scheme.