Internal Server Error on what-if analysis after upgrading to 2.28.0 from 2.26.0 #19850

Open
Gordonby opened this issue Oct 11, 2021 · 19 comments
Labels
ARM az resource/group/lock/tag/deployment/policy/managementapp/account management-group Service Attention This issue is responsible by Azure service team.

Comments

@Gordonby

az feedback auto-generates most of the information requested below, as of CLI version 2.0.62

Describe the bug
Performing a what-if analysis results in an internal server error, and I'm unable to decipher the problem. The actual deployment then continues to work fine on 2.28.0.

Reverting to AZ 2.26.0 results in the what-if and the deployment both working.

See the full What-If debug log here: https://github.com/Azure/Aks-Construction/runs/3863296553?check_suite_focus=true#step:11:146

DEBUG: cli.azure.cli.core.sdk.policies: ***"status":"Failed","error":***"code":"InternalServerError","message":"Encountered internal server error while processing the deployment what-if request. Diagnostic information: timestamp '20211011T200722Z', scope '***', tracking id '74f6c7cc-d330-4264-ad96-77c3a8ae7f55', request correlation id '2fd86693-ec6c-4931-9c2a-7e8b413496d5'."***
DEBUG: cli.azure.cli.core.util: azure.cli.core.util.handle_exception is called with an exception:
DEBUG: cli.azure.cli.core.util: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/azure/cli/core/commands/__init__.py", line 691, in _run_job
    result = cmd_copy(params)
  File "/usr/local/lib/python3.9/site-packages/azure/cli/core/commands/__init__.py", line 328, in __call__
    return self.handler(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
    return op(**command_args)
  File "/usr/local/lib/python3.9/site-packages/azure/cli/command_modules/resource/custom.py", line 772, in what_if_deploy_arm_template_at_resource_group
    return _what_if_deploy_arm_template_at_resource_group_core(cmd, resource_group_name,
  File "/usr/local/lib/python3.9/site-packages/azure/cli/command_modules/resource/custom.py", line 795, in _what_if_deploy_arm_template_at_resource_group_core
    what_if_result = _what_if_deploy_arm_template_core(cmd.cli_ctx, what_if_poller, no_pretty_print, exclude_change_types)
  File "/usr/local/lib/python3.9/site-packages/azure/cli/command_modules/resource/custom.py", line 897, in _what_if_deploy_arm_template_core
    raise CLIError(err_message)
knack.util.CLIError: InternalServerError - Encountered internal server error while processing the deployment what-if request. Diagnostic information: timestamp '20211011T200722Z', scope '***', tracking id '74f6c7cc-d330-4264-ad96-77c3a8ae7f55', request correlation id '2fd86693-ec6c-4931-9c2a-7e8b413496d5'.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/azure/cli/core/commands/arm.py", line 109, in handle_template_based_exception
    raise CLIError(ex.inner_exception.error.message)
AttributeError: 'CLIError' object has no attribute 'inner_exception'
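
The second traceback suggests the handler failed while reading `ex.inner_exception`, which replaces the service error with an AttributeError. Below is a minimal sketch of a more defensive pattern. It is not the actual azure-cli code: `CLIError` here is a local stand-in for `knack.util.CLIError`, and `extract_error_message` is a hypothetical helper name.

```python
class CLIError(Exception):
    """Local stand-in for knack.util.CLIError so the sketch runs without azure-cli."""


def extract_error_message(ex):
    """Return the most specific message available without raising AttributeError.

    Hypothetical helper: the attribute names mirror the traceback above
    (ex.inner_exception.error.message), but any of them may be missing.
    """
    inner = getattr(ex, "inner_exception", None)
    error = getattr(inner, "error", None)
    message = getattr(error, "message", None)
    # Fall back to the exception's own text when no nested message exists.
    return message if message else str(ex)


def handle_template_based_exception(ex):
    """Re-raise as CLIError, keeping the original exception chained for debugging."""
    raise CLIError(extract_error_message(ex)) from ex
```

With this shape, a plain `CLIError` that carries only the "InternalServerError - ..." text would surface that text instead of the AttributeError.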

To Reproduce
Use the bicep files and the parameter file.
Run the following command, substituting two subnet resource IDs and an arbitrary resource name.

az deployment group what-if --debug -f bicep/main.bicep -g $RG -p .github/workflows_dep/AksDeploy-ByoVnet.parameters.json -p resourceName=$RESNAME byoAKSSubnetId=*** byoAGWSubnetId=*** 

Expected behavior
I'd like a clear error message of what's actually failed, or for it just to work as it did in the previous CLI version.

Environment summary
GitHub Az CLI Action.

Additional context
https://github.com/Azure/Aks-Construction/runs/3863296553?check_suite_focus=true#step:11:146

@ghost ghost added the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Oct 11, 2021
@yonzhan yonzhan added the ARM az resource/group/lock/tag/deployment/policy/managementapp/account management-group label Oct 11, 2021
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Oct 11, 2021
@yonzhan yonzhan added needs-triage This is a new issue that needs to be triaged to the appropriate team. Service Attention This issue is responsible by Azure service team. labels Oct 11, 2021
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Oct 11, 2021
@ghost

ghost commented Oct 11, 2021

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @armleads-azure.

@yonzhan
Collaborator

yonzhan commented Oct 11, 2021

route to service team

@zhoxing-ms
Contributor

@shenglol Could you please help to have a look at this issue?

@Gordonby
Author

I can confirm this is also an issue on 2.29.0

@shenglol
Contributor

I'm still trying to figure out where the internal error comes from. It doesn't seem to be returned by the What-If service.

@shenglol
Contributor

Found the root cause. This is indeed a bug in the What-If engine. We added a normalization process for Azure Key Vault access policies in API version 2021-01-01, but there's a case we failed to handle. I'm going to fix it.

@Gordonby
Author

Hi @shenglol - my workaround of using an older version of the Az CLI is broken because of Azure/cli#56, so I'm forced to use v2.30.0 of the Az CLI until either

  1. This feature is implemented: "Support azcliversion" (login#164)
  2. I stop using the Azure Login action, decompose the JSON secret myself, and run the az login command directly.

Do you have a view of what CLI version your fix will make it into?

@shenglol
Contributor

This is a bug in our service with API version 2021-01-01 and later. We have checked in a fix and it should be rolled out in about 2 weeks. Once it's rolled out, any CLI version should work.

@centur

centur commented Dec 13, 2021

Hi, any update on the rollout progress? I'm getting the same error with az version:

{
  "azure-cli": "2.31.0",
  "azure-cli-core": "2.31.0",
  "azure-cli-telemetry": "1.0.6",
  "extensions": {}
}

Is there any workaround I can apply while the fix is rolling out, e.g. switching specific Bicep resources to a different API version, or something else?

Interestingly, the --what-if flag in az deployment sub create --what-if works fine for me, but I get the error message below when trying to deploy my Bicep with az deployment sub create --confirm-with-what-if:

InternalServerError - Encountered internal server error while processing the deployment what-if request. Diagnostic information: timestamp '20211213T025031Z', scope '/subscriptions/***', tracking id 'de477509-e614-4795-8cdc-91d5e082f640', request correlation id '4500c47a-afa4-4215-948d-a7078e308894'.

Any hints for workaround are greatly appreciated.

@centur

centur commented Dec 13, 2021

Additional note: when I run az deployment with the --debug (or --verbose) flag, the Bicep code works fine and everything is deployed.
If I run it without --debug (which is preferable, as it's less noisy in CI logs), this issue strikes back:

InternalServerError - Encountered internal server error while processing the deployment what-if request. Diagnostic information: timestamp '20211213T030820Z', scope '/subscriptions/***', tracking id '2bc75173-fc1b-4c6c-9949-dda4dddb1a89', request correlation id '719c792b-e545-42ef-89b7-66521e78376d'.

UPD: Getting some inconsistent results - sometimes --verbose helps, sometimes it doesn't. Feels like a subtle race condition...

@shenglol
Contributor

@centur This is most likely because the requests were picked up by worker jobs in different regions. So far, the fix has been rolled out to 5 regions, and it still needs more time to be fully deployed.

@centur

centur commented Dec 14, 2021

@shenglol Can I somehow influence which workers pick up my jobs? We have resources in the AU East and Southeast regions, and this is affecting our productivity a lot: deployments are failing randomly and there is nothing we can do to work around it.

Upd: also, please ignore my guesses about getting Bicep code working with the --debug or --verbose flags. It ended up being mere coincidence that it worked for me on the day of testing; since then I've been getting all kinds of random results. Those flags were more of a cargo-cult practice I inferred while trying to find some predictability in the results on my end. They don't work the way I thought they did.

@Gordonby
Author

I'm now seeing

InternalServerError - Encountered internal server error while processing the deployment what-if request.

This is on previously working actions that were using 2.30.0, with no template/parameter changes.
Moving to 2.31.0 has not fixed the issue.

@shenglol
Contributor

@Gordonby I just realized you might be using the MSFT tenant, which is onboarded to deployments preview features... and this appears to be a new bug we just identified in a recently added preview feature that enables reference function evaluation in What-If.

The bug is basically an unhandled edge case: a null reference exception is thrown if the referenced resource in the template does not contain the properties property. Unfortunately, that happens to be the case in your generated ARM template file, which contains two Microsoft.ManagedIdentity/userAssignedIdentities resources whose properties are not emitted by Bicep because they are read-only.
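
To check whether a compiled template contains resources of this shape, something like the following sketch could help. It assumes the template has already been compiled to JSON (for example with `az bicep build`), that nested deployments keep their inner template under `properties.template`, and that `resources` may be either a list or an object keyed by symbolic name. The function name is made up for illustration.

```python
import json


def resources_without_properties(template):
    """Return (type, name) pairs for resources that have no `properties` key.

    Hypothetical helper. Recurses into nested deployment templates, which is
    where Bicep modules end up after compilation.
    """
    resources = template.get("resources", [])
    if isinstance(resources, dict):
        # Assumption: symbolic-name templates store resources as an object.
        resources = list(resources.values())

    found = []
    for resource in resources:
        if "properties" not in resource:
            found.append((resource.get("type"), resource.get("name")))
        nested = resource.get("properties", {}).get("template")
        if isinstance(nested, dict):
            found.extend(resources_without_properties(nested))
    return found


def scan_file(path):
    """Load a compiled ARM template from disk and scan it."""
    with open(path, encoding="utf-8") as handle:
        return resources_without_properties(json.load(handle))
```

Calling `scan_file("main.json")` on the compiled output would list candidates such as the user assigned identities mentioned above; the file name is only an example.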

The current workaround would be to replace all user-assigned identity property accesses in main.bicep with full-mode reference functions to opt out of reference function evaluation in What-If. For example:

appGwIdentity.properties.principalId
=>
reference(appGwIdentity.id, appGwIdentity.apiVersion, 'Full').properties.principalId
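
If there are many such accesses, the substitution could be scripted. The sketch below is a plain text rewrite, not a Bicep parser: `to_full_reference` is a hypothetical helper, and the caller is assumed to pass the symbolic names of the user assigned identity resources explicitly. Review the resulting diff by hand before committing it.

```python
import re


def to_full_reference(source, symbols):
    """Rewrite `<symbol>.properties.` into the full-mode reference() form.

    Hypothetical helper; `symbols` are the Bicep symbolic names to rewrite.
    Text that is already in the reference(...) form is left untouched,
    because there the symbol is followed by `.id` or `.apiVersion`.
    """
    if not symbols:
        return source
    names = "|".join(re.escape(name) for name in symbols)
    pattern = re.compile(r"\b(" + names + r")\.properties\.")

    def replace(match):
        name = match.group(1)
        return f"reference({name}.id, {name}.apiVersion, 'Full').properties."

    return pattern.sub(replace, source)


if __name__ == "__main__":
    print(to_full_reference("appGwIdentity.properties.principalId", ["appGwIdentity"]))
```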

We have committed a fix for this, but given the upcoming holiday deployment freeze, it might take an extended time for the fix to be rolled out. My apologies for any inconvenience caused!

@shenglol
Contributor

@centur There's no way to control that. Would you mind sharing your ARM template by emailing it to me at [email protected]? I'm curious to see if I can provide a workaround, but I won't be able to tell without seeing the contents of the template.

@Gordonby
Author

Yes - I'm using the Microsoft tenant. Thanks for the workaround note.

@harrchen88

I got the exact same error when running with what-if, but it works fine without it. The weird thing is that this only happens in one subscription and not in the other. The error is below. My az CLI version is 2.32.0 and my Bicep version is 0.4.1318.
InternalServerError - Encountered internal server error while processing the deployment what-if request. Diagnostic information: timestamp '20220511T184620Z', scope '/subscriptions/...' tracking id 'fa4a34cc-0121-42b3-a833-da0ad799b803', request correlation id 'a8a944cf-b20b-4199-a310-7205e8b437ac'.
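
When reporting failures like this, it may help to pull the diagnostic fields out of the message so they can be pasted into the issue. A minimal sketch, assuming the message keeps the `name 'value'` layout seen in the errors quoted in this thread; `parse_diagnostics` is a hypothetical helper, not part of the Azure CLI.

```python
import re
from datetime import datetime, timezone

# Field names as they appear in the error text quoted in this thread.
_FIELD = re.compile(r"(timestamp|scope|tracking id|request correlation id) '([^']*)'")


def parse_diagnostics(message):
    """Extract the quoted diagnostic fields from a what-if error message."""
    fields = dict(_FIELD.findall(message))
    if "timestamp" in fields:
        # Timestamps in the quoted errors look like 20220511T184620Z (UTC).
        parsed = datetime.strptime(fields["timestamp"], "%Y%m%dT%H%M%SZ")
        fields["timestamp"] = parsed.replace(tzinfo=timezone.utc)
    return fields
```

The result is a dictionary keyed by field name, so `parse_diagnostics(message)["tracking id"]` gives the ID the service team would likely need.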

@shenglol
Contributor

@harrchen88 Could you share your Bicep template?
