AzDatalake Support #21387

Merged Aug 16, 2023 (33 commits)
ebf6f91
[AzDatalake] Generated Layer + mod file (#20725)
tasherif-msft May 3, 2023
8fd8eda
Merge branch 'main' into feature/azdatalake
tasherif-msft May 11, 2023
a93ed52
Merge main into datalake feature branch (#20833)
tasherif-msft May 11, 2023
c94fa00
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft May 11, 2023
bcd2b48
AzDatalake APIView (#20757)
tasherif-msft Jun 12, 2023
fc0b2b5
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jun 12, 2023
c09de0b
[Datalake] SAS Support (#21019)
tasherif-msft Jun 19, 2023
6fb1694
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jun 19, 2023
47082af
[Datalake] Client Constructors (#21034)
tasherif-msft Jun 26, 2023
4f7fe43
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jun 26, 2023
6fe421e
[AzDatalake] Filesystem Client Implementation + Test Framework (#21067)
tasherif-msft Jul 4, 2023
3dac9d0
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jul 4, 2023
79f06e1
Feedback + Service Client (#21096)
tasherif-msft Jul 7, 2023
a0a861b
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jul 7, 2023
9a351b9
[AzDatalake] File Client Support (#21141)
tasherif-msft Jul 19, 2023
124e27e
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jul 19, 2023
68d465f
[AzDatalake] Cleanup + Improvements (#21222)
tasherif-msft Jul 24, 2023
0f5a52c
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jul 24, 2023
ad288fb
[AzDatalake] File Client Upload/Download Support (#21261)
tasherif-msft Jul 27, 2023
81dabb1
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jul 27, 2023
757507a
[AzDatalake] Directory Client Implementation (#21283)
tasherif-msft Jul 31, 2023
d87e78b
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Jul 31, 2023
58726b9
[AzDatalake] Lease Clients Implementation (#21297)
tasherif-msft Aug 1, 2023
1628f26
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Aug 1, 2023
8fd0022
[AzDatalake] Pipelines + cleanup (#21298)
tasherif-msft Aug 10, 2023
e043b9b
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Aug 10, 2023
40f7a7f
Refactor handling of setting file expiry policy (#21378)
jhendrixMSFT Aug 16, 2023
8aac485
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Aug 16, 2023
d366659
[AzDatalake] APIView Feedback + Samples + Doc + CI/Live issues cleanu…
tasherif-msft Aug 16, 2023
be2950b
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Aug 16, 2023
513b43a
Merge branch 'main' into feature/azdatalake
tasherif-msft Aug 16, 2023
fd63ad0
Merge main into Datalake branch (#21386)
tasherif-msft Aug 16, 2023
ac1cc5a
Merge remote-tracking branch 'upstream/feature/azdatalake' into featu…
tasherif-msft Aug 16, 2023
4 changes: 4 additions & 0 deletions eng/config.json
@@ -32,6 +32,10 @@
"Name": "azqueue",
"CoverageGoal": 0.60
},
{
"Name": "azdatalake",
"CoverageGoal": 0.60
},
{
"Name": "azfile",
"CoverageGoal": 0.75
7 changes: 7 additions & 0 deletions sdk/storage/azdatalake/CHANGELOG.md
@@ -0,0 +1,7 @@
# Release History

## 0.1.0-beta.1 (2023-08-16)

### Features Added

* This is the initial preview release of the `azdatalake` library
21 changes: 21 additions & 0 deletions sdk/storage/azdatalake/LICENSE.txt
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE
280 changes: 280 additions & 0 deletions sdk/storage/azdatalake/README.md
@@ -0,0 +1,280 @@
# ADLS Gen2 Storage SDK for Go

> Service Version: 2020-10-02

Azure Data Lake Storage Gen2 (ADLS Gen2) is Microsoft's hierarchical object storage solution for the cloud, built on top of Azure Blob Storage.
It adds file system semantics, file-level security, and scale to Blob storage.
Because these capabilities are built on Blob storage, you also get low-cost, tiered storage with high availability and disaster recovery capabilities.
ADLS Gen2 makes Azure Storage the foundation for building enterprise data lakes on Azure.
Designed from the start to service multiple petabytes of information while sustaining hundreds of gigabits of throughput, ADLS Gen2 allows you to easily manage massive amounts of data.

[Source code][source] | [API reference documentation][docs] | [REST API documentation][rest_docs]

## Getting started

### Install the package

Install the ADLS Gen2 Storage SDK for Go with [go get][goget]:

```Powershell
go get github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake
```

If you're going to authenticate with Azure Active Directory (recommended), install the [azidentity][azidentity] module.
```Powershell
go get github.com/Azure/azure-sdk-for-go/sdk/azidentity
```

### Prerequisites

A supported [Go][godevdl] version (the Azure SDK supports the two most recent Go releases).

You need an [Azure subscription][azure_sub] and a
[Storage Account][storage_account_docs] to use this package.

To create a new Storage Account, you can use the [Azure Portal][storage_account_create_portal],
[Azure PowerShell][storage_account_create_ps], or the [Azure CLI][storage_account_create_cli].
Here's an example using the Azure CLI:

```Powershell
az storage account create --name MyStorageAccount --resource-group MyResourceGroup --location westus --sku Standard_LRS
```

### Authenticate the client

In order to interact with the ADLS Gen2 Storage service, you'll need to create an instance of the `Client` type. The [azidentity][azidentity] module makes it easy to add Azure Active Directory support for authenticating Azure SDK clients with their corresponding Azure services.

```go
// create a credential for authenticating with Azure Active Directory
cred, err := azidentity.NewDefaultAzureCredential(nil)
// TODO: handle err

// create a service.Client for the specified storage account that uses the above credential
client, err := service.NewClient("https://MYSTORAGEACCOUNT.dfs.core.windows.net/", cred, nil)
// TODO: handle err
// you can also create filesystem, file and directory clients
```

Learn more about enabling Azure Active Directory for authentication with Azure Storage in [our documentation][storage_ad] and [our samples](#next-steps).
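The client constructors in this README all take a URL on the `dfs.core.windows.net` endpoint: the account alone for a service client, plus a filesystem name for a filesystem client, plus a path for file and directory clients. The following is a minimal sketch of assembling those URLs with the standard library only. `dfsURL` is a hypothetical helper, not part of `azdatalake`, and the account and path names are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// dfsURL is a hypothetical helper (not part of azdatalake). It builds the
// endpoint URL for an account, an optional filesystem, and an optional path.
func dfsURL(account, filesystem, filePath string) string {
	u := url.URL{
		Scheme: "https",
		Host:   account + ".dfs.core.windows.net",
		// path.Join drops empty elements and collapses duplicate slashes.
		Path: path.Join("/", filesystem, filePath),
	}
	// String percent-encodes characters that are not valid in a URL path.
	return u.String()
}

func main() {
	fmt.Println(dfsURL("mystorageaccount", "", ""))                            // service URL
	fmt.Println(dfsURL("mystorageaccount", "sample-fs", ""))                   // filesystem URL
	fmt.Println(dfsURL("mystorageaccount", "sample-fs", "dir/sample file.txt")) // file URL
}
```

Building the URL through `net/url` rather than string concatenation means spaces and other reserved characters in a path are escaped consistently.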

## Key concepts

ADLS Gen2 provides:
- Hadoop-compatible access
- Hierarchical directory structure
- Optimized cost and performance
- Finer-grained security model
- Massive scalability

ADLS Gen2 storage is designed for:

- Serving images or documents directly to a browser.
- Storing files for distributed access.
- Streaming video and audio.
- Writing to log files.
- Storing data for backup and restore, disaster recovery, and archiving.
- Storing data for analysis by an on-premises or Azure-hosted service.

ADLS Gen2 storage offers three types of resources:

- The _storage account_
- One or more _filesystems_ in a storage account
- One or more _files_ or _directories_ in a filesystem

Instances of the `Client` type provide methods for manipulating filesystems and paths within a storage account.
The storage account is specified when the `Client` is constructed. The clients available are referenced below.
Use the appropriate client constructor function for the authentication mechanism you wish to use.

### Goroutine safety
We guarantee that all client instance methods are goroutine-safe and independent of each other ([guideline](https://azure.github.io/azure-sdk/golang_introduction.html#thread-safety)). This ensures that the recommendation of reusing client instances is always safe, even across goroutines.
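As a rough illustration of what reusing one client across goroutines looks like, the sketch below shares a single value between several goroutines. `fakeClient` is a stand-in defined here for the example, not an `azdatalake` type, and its method only formats a string; with the real SDK the shared value would be one of the clients described in this README.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient is a stand-in for an SDK client (an assumption for this sketch).
// Like a real client, it holds only immutable configuration after construction.
type fakeClient struct {
	endpoint string
}

// pathURL reads the client's state but never mutates it.
func (c *fakeClient) pathURL(name string) string {
	return c.endpoint + "/" + name
}

// fanOut calls the shared client from one goroutine per name.
func fanOut(c *fakeClient, names []string) []string {
	results := make([]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			results[i] = c.pathURL(name) // each goroutine writes only its own slot
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	c := &fakeClient{endpoint: "https://MYSTORAGEACCOUNT.dfs.core.windows.net/sample-fs"}
	for _, u := range fanOut(c, []string{"a.txt", "b.txt", "c.txt"}) {
		fmt.Println(u)
	}
}
```

The point of the pattern is to construct the client once and pass the same pointer to every worker instead of building a new client per goroutine.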

### About metadata
ADLS Gen2 metadata name/value pairs are valid HTTP headers and should adhere to all restrictions governing HTTP headers. Metadata names must be valid HTTP header names, may contain only ASCII characters, and should be treated as case-insensitive. Base64-encode or URL-encode metadata values containing non-ASCII characters.

### Additional concepts
<!-- CLIENT COMMON BAR -->
[Client options](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azcore/policy#ClientOptions) |
[Accessing the response](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azcore/runtime#WithCaptureResponse) |
[Handling failures](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azcore#ResponseError) |
[Logging](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azcore/log)
<!-- CLIENT COMMON BAR -->

## Examples

### Creating and uploading a file (assuming filesystem exists)

```go
const (
	path = "https://MYSTORAGEACCOUNT.dfs.core.windows.net/sample-fs/sample-file"
	// sampleFile is the local file to upload; the name is a placeholder
	sampleFile = "sample-file.txt"
)

// authenticate with Azure Active Directory
cred, err := azidentity.NewDefaultAzureCredential(nil)
// TODO: handle error

// create a file client for the specified path
client, err := file.NewClient(path, cred, nil)
// TODO: handle error

// create the (empty) file in the filesystem
_, err = client.Create(context.TODO(), nil)
// TODO: handle error

// open the local file for reading
fh, err := os.OpenFile(sampleFile, os.O_RDONLY, 0)
// TODO: handle error
defer fh.Close()

// upload the local file's contents to the file created above
_, err = client.UploadFile(context.TODO(), fh, nil)
// TODO: handle error
```

### Downloading a file

```go
const (
	path = "https://MYSTORAGEACCOUNT.dfs.core.windows.net/sample-fs/cloud.jpg"
)

// authenticate with Azure Active Directory
cred, err := azidentity.NewDefaultAzureCredential(nil)
// TODO: handle error

// create a file client for the specified path
client, err := file.NewClient(path, cred, nil)
// TODO: handle error

// create or open a local file to download into
// (named localFile so it does not shadow the file package)
localFile, err := os.Create("cloud.jpg")
// TODO: handle error
defer localFile.Close()

// download the file
_, err = client.DownloadFile(context.TODO(), localFile, nil)
// TODO: handle error
```

### Creating and deleting a filesystem

```go
const (
	fs = "https://MYSTORAGEACCOUNT.dfs.core.windows.net/sample-fs"
)

// authenticate with Azure Active Directory
cred, err := azidentity.NewDefaultAzureCredential(nil)
// TODO: handle error

// create a filesystem client for the specified filesystem URL
client, err := filesystem.NewClient(fs, cred, nil)
// TODO: handle error

_, err = client.Create(context.TODO(), nil)
// TODO: handle error

_, err = client.Delete(context.TODO(), nil)
// TODO: handle error
```

### Enumerating paths (assuming filesystem exists)

```go
const (
	fs = "https://MYSTORAGEACCOUNT.dfs.core.windows.net/sample-fs"
)

// authenticate with Azure Active Directory
cred, err := azidentity.NewDefaultAzureCredential(nil)
// TODO: handle error

// create a filesystem client for the specified filesystem URL
client, err := filesystem.NewClient(fs, cred, nil)
// TODO: handle error

// path listings are returned across multiple pages
pager := client.NewListPathsPager(true, nil)

// continue fetching pages until no more remain
for pager.More() {
	// advance to the next page
	page, err := pager.NextPage(context.TODO())
	// TODO: handle error

	// print the path names for this page
	for _, path := range page.PathList.Paths {
		fmt.Println(*path.Name)
		// IsDirectory is a pointer and may be nil, so check before dereferencing
		if path.IsDirectory != nil {
			fmt.Println(*path.IsDirectory)
		}
	}
}
```

## Troubleshooting

All Datalake service operations will return an
[*azcore.ResponseError][azcore_response_error] on failure with a
populated `ErrorCode` field. Many of these errors are recoverable.
The [datalakeerror][datalake_error] package provides the possible Storage error codes
along with various helper facilities for error handling.


### Specialized clients

The ADLS Gen2 Storage SDK for Go provides specialized clients in various subpackages.

The [file][file] package contains APIs related to file path types.

The [directory][directory] package contains APIs related to directory path types.

The [lease][lease] package contains clients for managing leases on paths (paths represent both directory and file paths) and filesystems. Please see the [reference docs](https://docs.microsoft.com/rest/api/storageservices/lease-blob#remarks) for general information on leases.

The [filesystem][filesystem] package contains APIs specific to filesystems. This includes APIs for setting access policies or properties, and more.

The [service][service] package contains APIs specific to the Datalake service. This includes APIs for manipulating filesystems, retrieving account information, and more.

The [sas][sas] package contains utilities to aid in the creation and manipulation of Shared Access Signature tokens.
See the package's documentation for more information.


You can find additional context and examples in our samples for each subpackage (named examples_test.go).

## Contributing

See the [Storage CONTRIBUTING.md][storage_contrib] for details on building,
testing, and contributing to this library.

This project welcomes contributions and suggestions. Most contributions require
you to agree to a Contributor License Agreement (CLA) declaring that you have
the right to, and actually do, grant us the rights to use your contribution. For
details, visit [cla.microsoft.com][cla].

This project has adopted the [Microsoft Open Source Code of Conduct][coc].
For more information see the [Code of Conduct FAQ][coc_faq]
or contact [opencode@microsoft.com][coc_contact] with any
additional questions or comments.

<!-- LINKS -->
[source]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake
[docs]: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azdatalake
[rest_docs]: https://docs.microsoft.com/rest/api/storageservices/data-lake-storage-gen2
[godevdl]: https://go.dev/dl/
[goget]: https://pkg.go.dev/cmd/go#hdr-Add_dependencies_to_current_module_and_install_them
[storage_account_docs]: https://docs.microsoft.com/azure/storage/common/storage-account-overview
[storage_account_create_ps]: https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-powershell
[storage_account_create_cli]: https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-cli
[storage_account_create_portal]: https://docs.microsoft.com/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal
[azure_sub]: https://azure.microsoft.com/free/
[azidentity]: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity
[storage_ad]: https://docs.microsoft.com/azure/storage/common/storage-auth-aad
[azcore_response_error]: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azcore#ResponseError
[datalake_error]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake/datalakeerror/error_codes.go
[filesystem]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake/filesystem/client.go
[lease]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake/lease
[file]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake/file/client.go
[directory]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake/directory/client.go
[sas]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake/sas
[service]: https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azdatalake/service/client.go
[storage_contrib]: https://github.com/Azure/azure-sdk-for-go/blob/main/CONTRIBUTING.md
[cla]: https://cla.microsoft.com
[coc]: https://opensource.microsoft.com/codeofconduct/
[coc_faq]: https://opensource.microsoft.com/codeofconduct/faq/
[coc_contact]: mailto:opencode@microsoft.com
6 changes: 6 additions & 0 deletions sdk/storage/azdatalake/assets.json
@@ -0,0 +1,6 @@
{
  "AssetsRepo": "Azure/azure-sdk-assets",
  "AssetsRepoPrefixPath": "go",
  "TagPrefix": "go/storage/azdatalake",
  "Tag": "go/storage/azdatalake_c3c16cffab"
}
28 changes: 28 additions & 0 deletions sdk/storage/azdatalake/ci.yml
@@ -0,0 +1,28 @@
trigger:
  branches:
    include:
      - main
      - feature/*
      - hotfix/*
      - release/*
  paths:
    include:
      - sdk/storage/azdatalake

pr:
  branches:
    include:
      - main
      - feature/*
      - hotfix/*
      - release/*
  paths:
    include:
      - sdk/storage/azdatalake


stages:
  - template: /eng/pipelines/templates/jobs/archetype-sdk-client.yml
    parameters:
      ServiceDirectory: 'storage/azdatalake'
      RunLiveTests: true