[BUG]Resource Processor error running porter install when creating Base Workspace #1067

Closed
AidenQueensU opened this issue Nov 4, 2021 · 5 comments · Fixed by Azure/azure-cli#20219 or #1071

@AidenQueensU

Fresh TRE deploy from commit a23b19c

Describe the bug
After deploying TRE, and publishing and registering the included base workspace bundle, attempting to create a workspace from the base bundle fails. The API says the deployment failed, and returns this error:

e51bc50c-7b3f-453a-bcec-30a3266982e8: Error context message = Error: could not save the pending action's status, the bundle was not executed:could not read storage schema document:Failed to login with Azure CLI:Error parsing Token Expiration Date \"\":Error parsing expiration date \"\".  

CloudShell Error: parsing time \"\" as \"2006-01-02T15:04:05Z07:00\": cannot parse \"\" as \"2006\"

CLI Error: parsing time \"\" as \"2006-01-02 15:04:05.999999\": cannot parse \"\" as \"2006\"

azureconfig.Config{EnvConnectionString:\"\", StorageAccount:\"treprotostorage\", StorageAccountResourceGroup:\"it-treproto-tst-rg\", StorageAccountSubscriptionId:\"\", EnvAzurePrefix:\"\", Vault:\"\"}

az login --identity -u df...94
&& az acr login --name treprotoacr
&& porter install \"e51bc50c-7b3f-453a-bcec-30a3266982e8\"  --reference treprotoacr.azurecr.io/tre-workspace-base:v0.1.5
	--param address_space=\"10.1.4.0/24\" --param arm_use_msi=\"true\" --param azure_location=\"canadacentral\" --param id=\"e51bc50c-7b3f-453a-bcec-30a3266982e8\"
	--param tfstate_container_name=\"tfstate\" --param tfstate_resource_group_name=\"it-treproto-tst-rg\" --param tfstate_storage_account_name=\"treprotostorage\"
	--param tre_id=\"treproto\" --cred ./vmss_porter/azure.json --allow-docker-host-access --force 
&& porter show e51bc50c-7b3f-453a-bcec-30a3266982e8"

I have run the commands listed in the error individually from the resource processor container, and it is the porter install that yields this error.
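
The two layouts quoted in the error (`2006-01-02T15:04:05Z07:00` and `2006-01-02 15:04:05.999999`) are Go reference-time layouts, and both failures report an empty input string. As a rough illustration of that failure mode, here is a Python analogue. It is a sketch only: the format strings approximate the Go layouts, and `parse_expiry` is a hypothetical helper, not code from Porter or the Azure CLI.

```python
from datetime import datetime
from typing import Optional

# Python approximations of the two layouts named in the error message.
# (Assumption: these are close equivalents, not exact translations.)
LAYOUTS = (
    "%Y-%m-%dT%H:%M:%S%z",      # roughly 2006-01-02T15:04:05Z07:00
    "%Y-%m-%d %H:%M:%S.%f",     # roughly 2006-01-02 15:04:05.999999
)


def parse_expiry(value: str) -> Optional[datetime]:
    """Try each layout in turn; return None if none of them matches."""
    for layout in LAYOUTS:
        try:
            return datetime.strptime(value, layout)
        except ValueError:
            continue
    return None


# An empty expiration string matches neither layout, which mirrors the
# 'cannot parse "" as "2006"' failures reported above.
empty_result = parse_expiry("")
valid_result = parse_expiry("2021-11-04 12:30:00.000000")
```

This suggests the token expiration field reached the parser empty, rather than arriving in an unexpected format.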

For what it's worth, here's the JSON used to register the bundle:

{
  "name": "tre-workspace-base",
  "description": "A base Azure TRE workspace",
  "version": "0.1.5",
  "porterVersion": "v0.38.7",
  "parameters": [
    {
      "name": "address_space",
      "type": "string",
      "default": null,
      "applyTo": "All Actions",
      "description": "VNet address space for the workspace services",
      "required": true
    },
    {
      "name": "arm_use_msi",
      "type": "string",
      "default": false,
      "applyTo": "All Actions",
      "description": "",
      "required": false
    },
    {
      "name": "azure_location",
      "type": "string",
      "default": null,
      "applyTo": "All Actions",
      "description": "Azure location (region) to deploy to",
      "required": true
    },
    {
      "name": "id",
      "type": "string",
      "default": null,
      "applyTo": "All Actions",
      "description": "the resource ID for this installation",
      "required": true
    },
    {
      "name": "tfstate_container_name",
      "type": "string",
      "default": "tfstate",
      "applyTo": "All Actions",
      "description": "The name of the Terraform state storage container",
      "required": false
    },
    {
      "name": "tfstate_resource_group_name",
      "type": "string",
      "default": null,
      "applyTo": "All Actions",
      "description": "Resource group containing the Terraform state storage account",
      "required": true
    },
    {
      "name": "tfstate_storage_account_name",
      "type": "string",
      "default": null,
      "applyTo": "All Actions",
      "description": "The name of the Terraform state storage account",
      "required": true
    },
    {
      "name": "tre_id",
      "type": "string",
      "default": null,
      "applyTo": "All Actions",
      "description": "The ID of the parent TRE instance e.g., mytre-dev-3142",
      "required": true
    }
  ],
  "credentials": [
    {
      "name": "azure_client_id",
      "description": "",
      "required": true,
      "applyTo": "All Actions"
    },
    {
      "name": "azure_client_secret",
      "description": "",
      "required": true,
      "applyTo": "All Actions"
    },
    {
      "name": "azure_subscription_id",
      "description": "",
      "required": true,
      "applyTo": "All Actions"
    },
    {
      "name": "azure_tenant_id",
      "description": "",
      "required": true,
      "applyTo": "All Actions"
    }
  ],
  "json_schema": {
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://github.com/microsoft/AzureTRE/templates/workspaces/base/template_schema.json",
    "type": "object",
    "title": "Base Workspace",
    "description": "This workspace template is the foundation for TRE workspaces and workspace services.",
    "required": [],
    "properties": {}
  },
  "resourceType": "workspace",
  "current": "true"
}
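
To rule out the bundle registration as the cause, one can check that every required parameter without a default is passed on the `porter install` command line. The following is a minimal sketch assuming the registration JSON shape shown above; `missing_required` is a hypothetical helper, not part of TRE or Porter.

```python
import re
from typing import List


def missing_required(bundle: dict, command: str) -> List[str]:
    """List required parameters with no default that the command omits."""
    # Collect the names passed as `--param name=value` in the command string.
    supplied = set(re.findall(r"--param\s+(\w+)=", command))
    return [
        param["name"]
        for param in bundle.get("parameters", [])
        if param.get("required")
        and param.get("default") is None
        and param["name"] not in supplied
    ]


# Small excerpt of the registration JSON and the install command above.
example_bundle = {
    "parameters": [
        {"name": "address_space", "default": None, "required": True},
        {"name": "tfstate_container_name", "default": "tfstate", "required": False},
        {"name": "tre_id", "default": None, "required": True},
    ]
}
example_command = 'porter install --param address_space="10.1.4.0/24" --param tre_id="treproto"'
example_missing = missing_required(example_bundle, example_command)
```

Run against the full JSON and the command in the error, a check like this would report nothing missing, since the command supplies all eight parameters.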

Steps to reproduce
Follow the "installing-base-workspace" guide in the docs: https://github.com/microsoft/AzureTRE/blob/main/docs/tre-admins/setup-instructions/installing-base-workspace.md

To reproduce in the resource processor container, connect via Bastion to the Resource Processor VMSS instance, exec into the container, and run the commands listed in the error one at a time. Note that when using MSI auth you'll first need to set an empty environment variable in the container, as runner.py does (line 303): export ARM_CLIENT_SECRET=""
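
For anyone scripting the reproduction rather than typing the export by hand, the environment tweak could look roughly like this in Python. This is a sketch under assumptions: `porter_env` is a hypothetical name, and it does not reproduce what runner.py actually does at that line.

```python
import os
from typing import Dict, Optional


def porter_env(use_msi: bool, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment to use when invoking porter."""
    env = dict(os.environ if base_env is None else base_env)
    if use_msi:
        # Equivalent of `export ARM_CLIENT_SECRET=""` for MSI auth.
        env["ARM_CLIENT_SECRET"] = ""
    return env


# The returned dict can be passed as `env=` to subprocess.run.
msi_env = porter_env(use_msi=True, base_env={"PATH": "/usr/bin"})
```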

@AidenQueensU AidenQueensU added the bug Something isn't working label Nov 4, 2021
@marrobi marrobi self-assigned this Nov 4, 2021
@marrobi
Member

marrobi commented Nov 4, 2021

Thanks @AidenQueensU .

What happens if you run just az login from the resource processor VM using its MSI?

So similar format to:

    Log in using a VM's user assigned identity. Client or object ids of the service identity also work
        az login --identity -u /subscriptions/<subscriptionId>/resourcegroups/myRG/providers/Microsoft.ManagedIdentity/userAssignedIdentities/myID

Can you also give the output of az --version?

I've not seen the issue and am running the same code. However, there was a new Azure CLI release not long ago (https://github.com/Azure/azure-cli/releases/tag/azure-cli-2.30.0) that might have broken something.

If that is the case, we will have to try downgrading the Azure CLI version.

@AidenQueensU
Author

@marrobi
az --version

root@c633dac7563c:/app# az --version
azure-cli                         2.30.0

core                              2.30.0
telemetry                          1.0.6

Python location '/opt/az/bin/python3'
Extensions directory '/root/.azure/cliextensions'

Python (Linux) 3.6.10 (default, Oct 29 2021, 10:11:23) 
[GCC 8.3.0]

Legal docs and information: aka.ms/AzureCliLegal


Your CLI is up-to-date.

Please let us know how we are doing: https://aka.ms/azureclihats
and let us know if you're interested in trying out our newest features: https://aka.ms/CLIUXstudy

Login with identity

root@c633dac7563c:/app# az login --identity -u /subscriptions/02...71/resourceGroups/rg-treproto/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-vmss-treproto
[
  {
    "environmentName": "AzureCloud",
    "homeTenantId": "d6...5c",
    "id": "02...71",
    "isDefault": true,
    "managedByTenants": [],
    "name": "Queen's - ITS Testing",
    "state": "Enabled",
    "tenantId": "d6...5c",
    "user": {
      "assignedIdentityInfo": "MSIResource-/subscriptions/02...71/resourceGroups/rg-treproto/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-vmss-treproto",
      "name": "userAssignedIdentity",
      "type": "servicePrincipal"
    }
  }
]

Logging in with az login --identity -u <MSI_Client_ID> gives the same result as above. The error only occurs when I manually run the porter install command from the error message in the container.

@marrobi
Member

marrobi commented Nov 4, 2021

@AidenQueensU ok, I have managed to reproduce.

I upgraded the az cli to 2.30 in my resource processor docker image and got the same error.

I downgraded using:

apt-get -y install azure-cli=2.29.2-1~buster --allow-downgrades

After that, everything works again.

I will look to pin the Azure CLI version in the docker image, and get in touch with the porter team to see if we can resolve this longer term.
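
Until the version is pinned, a deployment could guard against the affected release by inspecting the `az --version` output. The following is a minimal sketch; the helper names and the `KNOWN_BAD` set are assumptions for illustration, and only 2.30.0 is listed because it is the only release reported as failing in this thread.

```python
import re
from typing import Optional

# Assumption: only the release reported in this thread is treated as affected.
KNOWN_BAD = {"2.30.0"}


def azure_cli_version(version_output: str) -> Optional[str]:
    """Extract the azure-cli version from `az --version` text output."""
    match = re.search(r"^azure-cli\s+(\d+\.\d+\.\d+)", version_output, re.MULTILINE)
    return match.group(1) if match else None


def is_affected(version_output: str) -> bool:
    """True when the installed CLI is a release known to trigger the error."""
    return azure_cli_version(version_output) in KNOWN_BAD


# First lines of the `az --version` output posted earlier in the thread.
sample_output = "azure-cli                         2.30.0\n\ncore                              2.30.0\n"
sample_affected = is_affected(sample_output)
```

The output text could be captured with `subprocess.run(["az", "--version"], capture_output=True, text=True).stdout` before being passed to `is_affected`.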

Thanks for reporting this.

@AidenQueensU
Author

Confirmed working with the CLI downgrade 👍 Thank you!

@squillace

The issue is in the azure cli: Azure/azure-cli#20211 (comment). There is already a PR to fix it: Azure/azure-cli#20219. I would pin, as you've suggested, until the CLI fix rolls in.
