Merge pull request #1472 from Azure/dev
Release 10.11
zezha-msft authored Jun 15, 2021
2 parents 067b2fc + 151eec3 commit 87cba06
Showing 52 changed files with 1,798 additions and 336 deletions.
16 changes: 16 additions & 0 deletions ChangeLog.md
@@ -1,6 +1,22 @@

# Change Log

## Version 10.11.0

### New features
1. Improved performance for copying small blobs (with size less than `256MiB`) with [Put Blob from URL](https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob-from-url).
1. Added mirror mode support to the sync operation via the `mirror-mode` flag. When this flag is set to true, the new mode disables last-modified-time based comparisons and overwrites conflicting files and blobs at the destination.
1. Added the `disable-auto-decoding` flag to avoid automatic decoding of URL-encoded illegal characters when uploading from Windows (see the example after this list). These illegal characters could have been encoded as a result of downloading them onto Windows, which does not support them.
1. Support custom MIME type mappings via the environment variable `AZCOPY_CONTENT_TYPE_MAP`.
1. Output a message on the CLI when AzCopy detects a proxy for each domain.
1. Interpret DFS endpoints as Blob endpoints automatically when performing service-to-service copy.
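
For instance, a hedged sketch of the new upload flag in use (the local path, account, container, and SAS token are placeholders):

```
# Upload from Windows without automatically decoding %-encoded characters in file names
azcopy copy "C:\local\data" "https://<account>.blob.core.windows.net/<container>?<SAS>" --recursive=true --disable-auto-decoding=true
```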

### Bug fixes
1. Tolerate enumeration errors for Azure Files and do not fail the entire job when a directory is deleted or modified during scanning.
1. Log skipped transfers to the scanning log.
1. Fixed pipe upload by adding missing fields such as Metadata, Blob Index Tags, Client Provided Key, Blob Access Tier, etc.
1. Fixed a cleanup issue in the benchmark command.

## Version 10.10.0

### New features
107 changes: 88 additions & 19 deletions README.md
@@ -1,6 +1,7 @@
# AzCopy v10

AzCopy v10 is a command-line utility that you can use to copy data to and from containers and file shares in Azure Storage accounts. AzCopy V10 presents easy-to-use commands that are optimized for performance.
AzCopy v10 is a command-line utility that you can use to copy data to and from containers and file shares in Azure Storage accounts.
AzCopy V10 presents easy-to-use commands that are optimized for high performance and throughput.

## Features and capabilities

@@ -12,33 +13,93 @@ AzCopy v10 is a command-line utility that you can use to copy data to and from c

:white_check_mark: Download files and directories.

:white_check_mark: Copy containers, directories and blobs between storage accounts (Blobs only).
:white_check_mark: Copy containers, directories and blobs between storage accounts (Service to Service).

:white_check_mark: Synchronize containers with local file systems and visa versa (Blobs only).
:white_check_mark: Synchronize data between Local <=> Blob Storage, Blob Storage <=> File Storage, and Local <=> File Storage.

:white_check_mark: Copy objects, directories, and buckets from Amazon Web Services (AWS) (Blobs only).
:white_check_mark: Delete blobs or files from an Azure storage account

:white_check_mark: List files in a container (Blobs only).
:white_check_mark: Copy objects, directories, and buckets from Amazon Web Services (AWS) to Azure Blob Storage (Blobs only).

:white_check_mark: Remove files from a container (Blobs only).
:white_check_mark: Copy objects, directories, and buckets from Google Cloud Platform (GCP) to Azure Blob Storage (Blobs only).

:white_check_mark: List files in a container.

:white_check_mark: Recover from failures by restarting previous jobs.

## Download AzCopy
The latest binary for AzCopy along with installation instructions may be found
[here](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10).

## Find help

For complete guidance, visit any of these articles on the docs.microsoft.com website.

:eight_spoked_asterisk: [Get started with AzCopy (download links here)](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-v10)

:eight_spoked_asterisk: [Transfer data with AzCopy and blob storage](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-blobs)
:eight_spoked_asterisk: [Upload files to Azure Blob storage by using AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-upload)

:eight_spoked_asterisk: [Download blobs from Azure Blob storage by using AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-download)

:eight_spoked_asterisk: [Copy blobs between Azure storage accounts by using AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-copy)

:eight_spoked_asterisk: [Synchronize between Local File System/Azure Blob Storage (Gen1)/Azure File Storage by using AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-synchronize)

:eight_spoked_asterisk: [Transfer data with AzCopy and file storage](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-files)

:eight_spoked_asterisk: [Transfer data with AzCopy and file storage](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-files)
:eight_spoked_asterisk: [Transfer data with AzCopy and Amazon S3 buckets](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-s3)

:eight_spoked_asterisk: [Transfer data with AzCopy and Amazon S3 buckets](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-s3)
:eight_spoked_asterisk: [Transfer data with AzCopy and Google GCP buckets](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-google-cloud)

:eight_spoked_asterisk: [Use data transfer tools in Azure Stack Hub Storage](https://docs.microsoft.com/en-us/azure-stack/user/azure-stack-storage-transfer)

:eight_spoked_asterisk: [Configure, optimize, and troubleshoot AzCopy](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-configure)

### Find help from your command prompt
:eight_spoked_asterisk: [AzCopy WiKi](https://github.com/Azure/azure-storage-azcopy/wiki)

## Supported Operations

The general format of the AzCopy commands is: `azcopy [command] [arguments] --[flag-name]=[flag-value]`
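
For example, a minimal sketch of that format (the local path, account, container, and SAS token are placeholders):

```
azcopy copy "/data/photos" "https://<account>.blob.core.windows.net/<container>?<SAS>" --recursive=true
```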

* `bench` - Runs a performance benchmark by uploading or downloading test data to or from a specified destination

* `copy` - Copies source data to a destination location. The supported directions are:
- Local File System <-> Azure Blob (SAS or OAuth authentication)
- Local File System <-> Azure Files (Share/directory SAS authentication)
- Local File System <-> Azure Data Lake Storage (ADLS Gen2) (SAS, OAuth, or SharedKey authentication)
- Azure Blob (SAS or public) -> Azure Blob (SAS or OAuth authentication)
- Azure Blob (SAS or public) -> Azure Files (SAS)
- Azure Files (SAS) -> Azure Files (SAS)
- Azure Files (SAS) -> Azure Blob (SAS or OAuth authentication)
- AWS S3 (Access Key) -> Azure Block Blob (SAS or OAuth authentication)
- Google Cloud Storage (Service Account Key) -> Azure Block Blob (SAS or OAuth authentication) [Preview]

* `sync` - Replicate source to the destination location. The supported directions are:
- Local File System <-> Azure Blob (SAS or OAuth authentication)
- Local File System <-> Azure Files (Share/directory SAS authentication)
- Azure Blob (SAS or public) -> Azure Files (SAS)

* `login` - Log in to Azure Active Directory (AD) to access Azure Storage resources.

* `logout` - Log out to terminate access to Azure Storage resources.

* `list` - List the entities in a given resource

* `doc` - Generates documentation for the tool in Markdown format

* `env` - Shows the environment variables that you can use to configure the behavior of AzCopy.

* `help` - Help about any command

* `jobs` - Sub-commands related to managing jobs

* `load` - Sub-commands related to transferring data in specific formats

* `make` - Create a container or file share.

* `remove` - Delete blobs or files from an Azure storage account
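
For illustration, here are hedged sketches of a few of the commands above (all account names, containers, and SAS tokens are placeholders):

```
# Service-to-service copy between two storage accounts
azcopy copy "https://<src-account>.blob.core.windows.net/<container>?<SAS>" "https://<dst-account>.blob.core.windows.net/<container>?<SAS>" --recursive=true

# Incrementally synchronize a local directory with a blob container
azcopy sync "/data/logs" "https://<account>.blob.core.windows.net/<container>?<SAS>"

# List the entities in a container
azcopy list "https://<account>.blob.core.windows.net/<container>?<SAS>"

# Create a container
azcopy make "https://<account>.blob.core.windows.net/<container>?<SAS>"
```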

## Find help from your command prompt

For convenience, consider adding the AzCopy directory location to your system path. That way you can type `azcopy` from any directory on your system.
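
On Linux or macOS, for instance, that might look like the following (the install location is only an assumed example):

```
# Assumes the azcopy binary was extracted to ~/azcopy; adjust to your actual install location
export PATH=$PATH:$HOME/azcopy
```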

@@ -54,22 +115,30 @@ If you choose not to add AzCopy to your path, you'll have to change directories

### What is the difference between `sync` and `copy`?

The `copy` command is a simple transferring operation, it scans the source and attempts to transfer every single file/blob. The supported source/destination pairs are listed in the help message of the tool. On the other hand, `sync` makes sure that whatever is present in the source will be replicated to the destination. If your goal is to simply move some files, then `copy` is definitely the right command, since it offers much better performance.
* The `copy` command is a simple transfer operation. It scans/enumerates the source and attempts to transfer every single file/blob present at the source to the destination.
The supported source/destination pairs are listed in the help message of the tool.

* On the other hand, `sync` scans/enumerates both the source and the destination to find the incremental change.
It makes sure that whatever is present in the source will be replicated to the destination. For `sync`, last-modified times are used to determine whether to transfer a file that is present at both the source and the destination.

* If your goal is to simply move some files, then `copy` is definitely the right command, since it offers much better performance.
If the use case is to incrementally transfer data (files present only on the source), then `sync` is the better choice, since only the modified/missing files will be transferred.
Since `sync` enumerates both source and destination to find the incremental change, it is relatively slower than `copy`.

### Will `copy` overwrite my files?

By default, AzCopy will overwrite the files at the destination if they already exist. To avoid this behavior, please use the flag `--overwrite=false`.
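
For example (account, container, and SAS token are placeholders):

```
# Skip files that already exist at the destination instead of overwriting them
azcopy copy "/data/logs" "https://<account>.blob.core.windows.net/<container>?<SAS>" --recursive=true --overwrite=false
```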

### Will 'sync' delete files in the destination if they no longer exist in the source location?
### Will `sync` overwrite my files?

By default, the 'sync' command doesn't delete files in the destination unless you use an optional flag with the command. To learn more, see [Synchronize files](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-blobs#synchronize-files).
By default, AzCopy `sync` uses the last-modified time to determine whether to transfer a file that is present at both the source and the destination;
that is, if the source file is newer than the destination file, the destination is overwritten.
You can change this default behaviour and always overwrite files at the destination by using the flag `--mirror-mode=true`.
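
For instance (account, container, and SAS token are placeholders):

```
# Overwrite conflicting blobs at the destination regardless of last-modified times
azcopy sync "/data/logs" "https://<account>.blob.core.windows.net/<container>?<SAS>" --mirror-mode=true
```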

## Download AzCopy
The latest binary install for AzCopy along with installation instructions may be found
[here](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10).
### Will 'sync' delete files in the destination if they no longer exist in the source location?

By default, the 'sync' command doesn't delete files in the destination unless you use an optional flag with the command.
To learn more, see [Synchronize files](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-synchronize).
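
A hedged sketch (the `--delete-destination` flag name is assumed from current AzCopy releases; check `azcopy sync --help` for the exact flag and values):

```
# Also delete destination blobs that no longer exist at the source
azcopy sync "/data/logs" "https://<account>.blob.core.windows.net/<container>?<SAS>" --delete-destination=true
```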

## How to contribute to AzCopy v10

@@ -83,4 +152,4 @@ provided by the bot. You will only need to do this once across all repos using o

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [[email protected]](mailto:[email protected]) with any additional questions or comments.
contact [[email protected]](mailto:[email protected]) with any additional questions or comments.
2 changes: 2 additions & 0 deletions azure-pipelines.yml
@@ -101,6 +101,8 @@ jobs:
env:
AZCOPY_E2E_ACCOUNT_KEY: $(AZCOPY_E2E_ACCOUNT_KEY)
AZCOPY_E2E_ACCOUNT_NAME: $(AZCOPY_E2E_ACCOUNT_NAME)
CPK_ENCRYPTION_KEY: $(CPK_ENCRYPTION_KEY)
CPK_ENCRYPTION_KEY_SHA256: $(CPK_ENCRYPTION_KEY_SHA256)
- script: |
go build -o test-validator ./testSuite/
mkdir test-temp
50 changes: 48 additions & 2 deletions cmd/copy.go
@@ -147,6 +147,9 @@ type rawCopyCmdArgs struct {
// whether to include blobs that have metadata 'hdi_isfolder = true'
includeDirectoryStubs bool

// whether to disable automatic decoding of illegal chars on Windows
disableAutoDecoding bool

// Optional flag to encrypt user data with user provided key.
// Key is provided in the REST request itself
// Provided key (EncryptionKey and EncryptionKeySHA256) and its hash will be fetched from environment variables
@@ -252,6 +255,20 @@ func (raw rawCopyCmdArgs) cook() (cookedCopyCmdArgs, error) {
azcopyScanningLogger.CloseLog()
})

/* We support DFS by using blob end-point of the account. We replace dfs by blob in src and dst */
if src, dst := inferArgumentLocation(raw.src), inferArgumentLocation(raw.dst); src == common.ELocation.BlobFS() || dst == common.ELocation.BlobFS() {
if src == common.ELocation.BlobFS() && dst != common.ELocation.Local() {
raw.src = strings.Replace(raw.src, ".dfs", ".blob", 1)
glcm.Info("Switching to use blob endpoint on source account.")
}

if dst == common.ELocation.BlobFS() && src != common.ELocation.Local() {
raw.dst = strings.Replace(raw.dst, ".dfs", ".blob", 1)
glcm.Info("Switching to use blob endpoint on destination account.")
}
}

fromTo, err := validateFromTo(raw.src, raw.dst, raw.fromTo) // TODO: src/dst
if err != nil {
return cooked, err
@@ -510,6 +527,7 @@ func (raw rawCopyCmdArgs) cook() (cookedCopyCmdArgs, error) {
cooked.noGuessMimeType = raw.noGuessMimeType
cooked.preserveLastModifiedTime = raw.preserveLastModifiedTime
cooked.includeDirectoryStubs = raw.includeDirectoryStubs
cooked.disableAutoDecoding = raw.disableAutoDecoding

if cooked.fromTo.To() != common.ELocation.Blob() && raw.blobTags != "" {
return cooked, errors.New("blob tags can only be set when transferring to blob storage")
@@ -1054,6 +1072,9 @@ type cookedCopyCmdArgs struct {
// whether to include blobs that have metadata 'hdi_isfolder = true'
includeDirectoryStubs bool

// whether to disable automatic decoding of illegal chars on Windows
disableAutoDecoding bool

cpkOptions common.CpkOptions
}

@@ -1182,9 +1203,33 @@ func (cca *cookedCopyCmdArgs) processRedirectionUpload(blobResource common.Resou

// step 2: leverage high-level call in Blob SDK to upload stdin in parallel
blockBlobUrl := azblob.NewBlockBlobURL(*u, p)
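// Parse the user-supplied metadata string (formatted as "key1=value1;key2=value2")
// into a map so it can be attached to the block blob created from stdin.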
metadataString := cca.metadata
metadataMap := common.Metadata{}
if len(metadataString) > 0 {
for _, keyAndValue := range strings.Split(metadataString, ";") { // key/value pairs are separated by ';'
kv := strings.Split(keyAndValue, "=") // key/value are separated by '='
metadataMap[kv[0]] = kv[1]
}
}
blobTags := cca.blobTags
bbAccessTier := azblob.DefaultAccessTier
if cca.blockBlobTier != common.EBlockBlobTier.None() {
bbAccessTier = azblob.AccessTierType(cca.blockBlobTier.String())
}
_, err = azblob.UploadStreamToBlockBlob(ctx, os.Stdin, blockBlobUrl, azblob.UploadStreamToBlockBlobOptions{
BufferSize: int(blockSize),
MaxBuffers: pipingUploadParallelism,
BufferSize: int(blockSize),
MaxBuffers: pipingUploadParallelism,
Metadata: metadataMap.ToAzBlobMetadata(),
BlobTagsMap: blobTags.ToAzBlobTagsMap(),
BlobHTTPHeaders: azblob.BlobHTTPHeaders{
ContentType: cca.contentType,
ContentLanguage: cca.contentLanguage,
ContentEncoding: cca.contentEncoding,
ContentDisposition: cca.contentDisposition,
CacheControl: cca.cacheControl,
},
BlobAccessTier: bbAccessTier,
ClientProvidedKeyOptions: common.GetClientProvidedKey(cca.cpkOptions),
})

return err
@@ -1755,6 +1800,7 @@ func init() {
cpCmd.PersistentFlags().StringVar(&raw.blobTags, "blob-tags", "", "Set tags on blobs to categorize data in your storage account")
cpCmd.PersistentFlags().BoolVar(&raw.s2sPreserveBlobTags, "s2s-preserve-blob-tags", false, "Preserve index tags during service to service transfer from one blob storage to another")
cpCmd.PersistentFlags().BoolVar(&raw.includeDirectoryStubs, "include-directory-stub", false, "False by default to ignore directory stubs. Directory stubs are blobs with metadata 'hdi_isfolder:true'. Setting value to true will preserve directory stubs during transfers.")
cpCmd.PersistentFlags().BoolVar(&raw.disableAutoDecoding, "disable-auto-decoding", false, "False by default to enable automatic decoding of illegal chars on Windows. Can be set to true to disable automatic decoding.")
// s2sGetPropertiesInBackend is an optional flag for controlling whether S3 object's or Azure file's full properties are get during enumerating in frontend or
// right before transferring in ste(backend).
// The traditional behavior of all existing enumerator is to get full properties during enumerating(more specifically listing),
14 changes: 7 additions & 7 deletions cmd/copyEnumeratorInit.go
@@ -232,7 +232,7 @@ func (cca *cookedCopyCmdArgs) initEnumerator(jobPartOrder common.CopyJobPartOrde

if err != nil {
if _, ok := seenFailedContainers[object.containerName]; !ok {
WarnStdoutAndJobLog(fmt.Sprintf("failed to add transfers from container %s as it has an invalid name. Please manually transfer from this container to one with a valid name.", object.containerName))
WarnStdoutAndScanningLog(fmt.Sprintf("failed to add transfers from container %s as it has an invalid name. Please manually transfer from this container to one with a valid name.", object.containerName))
seenFailedContainers[object.containerName] = true
}
return nil
@@ -370,7 +370,7 @@ func (cca *cookedCopyCmdArgs) createDstContainer(containerName string, dstWithSA
dstCredInfo := common.CredentialInfo{}

// 3minutes is enough time to list properties of a container, and create new if it does not exist.
ctx, _ := context.WithTimeout(parentCtx, time.Minute * 3)
ctx, _ := context.WithTimeout(parentCtx, time.Minute*3)
if dstCredInfo, _, err = getCredentialInfoForLocation(ctx, cca.fromTo.To(), cca.destination.Value, cca.destination.SAS, false, cca.cpkOptions); err != nil {
return err
}
@@ -481,7 +481,7 @@ var reverseEncodedChars = map[string]rune{
"%2A": '*',
}

func pathEncodeRules(path string, fromTo common.FromTo, source bool) string {
func pathEncodeRules(path string, fromTo common.FromTo, disableAutoDecoding bool, source bool) string {
loc := common.ELocation.Unknown()

if source {
@@ -501,8 +501,8 @@ func pathEncodeRules(path string, fromTo common.FromTo, source bool) string {
}
}

// If uploading from Windows or downloading from files, decode unsafe chars
} else if (!source && fromTo.From() == common.ELocation.Local() && runtime.GOOS == "windows") || (!source && fromTo.From() == common.ELocation.File()) {
// If uploading from Windows or downloading from files, decode unsafe chars if user enables decoding
} else if ((!source && fromTo.From() == common.ELocation.Local() && runtime.GOOS == "windows") || (!source && fromTo.From() == common.ELocation.File())) && !disableAutoDecoding {

for encoded, c := range reverseEncodedChars {
for k, p := range pathParts {
@@ -548,7 +548,7 @@ func (cca *cookedCopyCmdArgs) makeEscapedRelativePath(source bool, dstIsDir bool
}
}

return pathEncodeRules(relativePath, cca.fromTo, source)
return pathEncodeRules(relativePath, cca.fromTo, cca.disableAutoDecoding, source)
}

// If it's out here, the object is contained in a folder, or was found via a wildcard, or object.isSourceRootFolder == true
Expand Down Expand Up @@ -590,7 +590,7 @@ func (cca *cookedCopyCmdArgs) makeEscapedRelativePath(source bool, dstIsDir bool
relativePath = "/" + rootDir + relativePath
}

return pathEncodeRules(relativePath, cca.fromTo, source)
return pathEncodeRules(relativePath, cca.fromTo, cca.disableAutoDecoding, source)
}

// we assume that preserveSmbPermissions and preserveSmbInfo have already been validated, such that they are only true if both resource types support them
