Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Azure Batch worker pool supports managed identity #5670

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

adamrtalbot
Copy link
Collaborator

@adamrtalbot adamrtalbot commented Jan 14, 2025

This feature adds an option to tell Azure Batch the worker pool has a managed identity attached.

azcopy needs to be updated, since the current version isn't recent enough.

When using this feature, you can tell Nextflow that a node pool has a managed identity attached:

azure {

    storage {
        accountName = 'seqeralabs'
    }

    batch {
        location = 'eastus'
        accountName = 'seqeralabs'
        copyToolInstallMode = "off"
        pools {
            "hello-world-entra-mi" {
                vmType = "standard_e2ds_v5"
                managedIdentityId = "myManagedIdentityId"
            }

        }
    }
}

it will then use the native azcopy managed identity authentication to download and upload files.

I haven't handled all the old code yet, so Nextflow will generate a SAS which it only uses sometimes.

Importantly, this 'fixes' #5669, because it doesn't need a SAS token to access the files (tested).

To use:

  1. Create a node pool in Azure Batch and attach a user-assigned managed identity. Follow the instructions here or try this if you want to use Terraform
  2. Change the start task to have the following settings to update azcopy:
    • Command line: bash -c "tar -xzvf azcopy.tar.gz && chmod +x azcopy*/azcopy && mkdir -p $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy*/azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/"
    • Resource file URL: https://aka.ms/downloadazcopy-v10-linux, path azcopy.tar.gz
  3. Retrieve the managed identity client ID
  4. Use the following Nextflow config, where the pool name is my-pool:
process.queue = 'my-pool'
azure {

    storage {
        accountName = 'seqeralabs'
    }

    batch {
        location = 'eastus'
        accountName = 'seqeralabs'
        copyToolInstallMode = "off"
        pools {
            "my-pool" {
                managedIdentityId = "myManagedIdentityId"
            }

        }
    }
}
  1. Launch pipeline and 🤞

To do:

  • See if we can apply the managed ID to pools created by Nextflow. This might need an update from Microsoft
  • Documentation
  • Remove creation of SAS when using managed identity to make everything more secure and allow this to work if we switch off SAS tokens

This feature adds an option to tell Azure Batch the worker pool has a managed identity attached.

Currently, the only feature is it passes the managed identity client id to the task as an environment variable but this could be extended to support file staging in or out.

Signed-off-by: adamrtalbot <[email protected]>
Copy link

netlify bot commented Jan 14, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 8d28def
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/6786adb2c42d0600088f4d81

Signed-off-by: adamrtalbot <[email protected]>
No longer requires SAS tokens for file staging, it can use the managed identity to authenticate.

Signed-off-by: adamrtalbot <[email protected]>
@adamrtalbot adamrtalbot requested a review from a team as a code owner January 14, 2025 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants