Change In API behavior? #7207

Closed
dbilleci-lightstream opened this issue Sep 12, 2019 · 14 comments
@dbilleci-lightstream

I'm using Pulumi, which is based on the Terraform provider for Azure. I've checked their bugs page and don't see anything related.

I'm running into an issue where if I create a simple VM which gets linked to an existing load balancer, and then try to destroy it, I get a new error message which I believe is from the API:

Created resources:

azure:core:ResourceGroup
azure:network:NetworkInterface
azure:network:NetworkInterfaceBackendAddressPoolAssociation
azure:compute:VirtualMachine

Then, on teardown of resources, I get this failure:

  pulumi:pulumi:Stack (compute-api-compute-api-b-stage-eus2):
    error: update failed

  azure:network:NetworkInterfaceBackendAddressPoolAssociation (api-B-P-0-Stage-Eus2-Assoc):
    error: Plan apply failed: deleting urn:pulumi:compute-api-b-stage-eus2::compute-api::azure:network/networkInterfaceBackendAddressPoolAssociation:NetworkInterfaceBackendAddressPoolAssociation::api-B-P-0-Stage-Eus2-Assoc: Error waiting for removal of Backend Address Pool Association for NIC "api-B-Vm-0-Stage-Eus2-Nic" (Resource Group "api-B-Stage-Eus2-Rg"): Code="OperationNotAllowed" Message="Operation 'startTenantUpdate' is not allowed on VM 'api-B-0-Stage-Eus2-Vm' since the VM is marked for deletion. You can only retry the Delete operation (or wait for an ongoing one to complete)." Details=[]

I haven't changed my infra code, and I get the same error on two different provider versions: one from about 2 months ago (which had been working) and the newest one as of today.

Any ideas?

@visokoo

visokoo commented Sep 13, 2019

I, too, am getting the same error message, but with NetworkInterfaceApplicationSecurityGroupAssociation instead.

azure:network:NetworkInterfaceApplicationSecurityGroupAssociation (justcoyote-wo-rg-0-asg-association):
error: Plan apply failed: deleting urn:pulumi:test-idca_pulumi-bastion-no-rg-import-23d7xc2e::bastion::idca:modules:bastion:Bastion$azure:network/networkInterfaceApplicationSecurityGroupAssociation:NetworkInterfaceApplicationSecurityGroupAssociation::justcoyote-wo-rg-0-asg-association: Error waiting for removal of Application Security Group for NIC "justcoyote-wo-rg-0-nic" (Resource Group "justcoyote-wo-rg-rg"): Code="OperationNotAllowed" Message="Operation 'startTenantUpdate' is not allowed on VM 'justcoyote-wo-rg-0-vm' since the VM is marked for deletion. You can only retry the Delete operation (or wait for an ongoing one to complete)." Details=[]

This was working fine for me about a month ago. Did some API behavior change? Feels like a race condition. Also only happens to me on destroy.

@dbilleci-lightstream
Author

Ok cool, then I'm not crazy.. good to know. We deploy pretty much 5x a day, and the first day this seemed to hit us was within the last 72 hours.

@weidongxu-microsoft
Member

@dbilleci-lightstream

I think this should be raised with the Terraform or Pulumi SDK. Is it this GitHub repository: https://github.com/terraform-providers/terraform-provider-azurerm ?

This API spec covers everything from the Azure portal to the CLI, PowerShell, Go, Java, and .NET clients. If it had a problem, everyone would be in trouble, so it is unlikely to be the cause of your issue (personally, I just deleted a few resources in the portal and saw no issue).

@dbilleci-lightstream
Author

Alrighty, I'll raise it with Pulumi first. Honestly, I think it's something on the API side: just because the portal doesn't throw an error doesn't mean there wasn't a change in the API. But I'll go ahead and do that too and let the finger pointing begin 👇 👈 👉

@dbilleci-lightstream
Author

> I, too am getting the same error message but with NetworkInterfaceApplicationSecurityGroupAssociation instead.
> This was working fine for me about a month ago. Did some API behavior change? Feels like a race condition. Also only happens to me on destroy.

@visokoo are you using Terraform or Pulumi?

@weidongxu-microsoft
Member

weidongxu-microsoft commented Sep 13, 2019

@dbilleci-lightstream

This spec basically just dictates the endpoints, and what the request JSON and the response JSON should look like. It does not care how the backend is implemented.

The spec is more like an interface in a programming language. If you cannot call something, it is a spec problem. If you call something and it sometimes fails at runtime, it is more likely an implementation problem.

In your case (and @visokoo's), it seems "startTenantUpdate" is getting involved, though no one seems to be calling it explicitly. So it looks to me like either the SDK calls it unwittingly, or the backend is doing this to your resource without you calling it. Either way, it is a runtime problem.

@dbilleci-lightstream
Author

Is there a better Microsoft github where I should post this?

Ok so if it is a backend runtime problem, where is the appropriate place to report to Microsoft a backend runtime problem?

@weidongxu-microsoft
Member

weidongxu-microsoft commented Sep 13, 2019

Calling Azure customer support is usually the easiest way (if it does not cost anything :-)).

If you are using a tool provided by the Azure team, raising an issue there would also help (e.g. https://github.com/Azure/azure-cli if you are using the Azure CLI). The team maintaining the SDK would try to figure out which service to contact.
However, I could not find a Microsoft-provided SDK (if any) underlying Pulumi or Terraform.

I will try to involve people I know who work on the AME Terraform project, but I am not sure whether they are related or not.

@mybayern1974
Is there any GitHub repository for Pulumi or Terraform users?

abhinavdahiya added a commit to abhinavdahiya/installer that referenced this issue Sep 13, 2019
…is terminated

It looks like an internal Azure race is causing a cryptic message like

```
Error: Error waiting for removal of Backend Address Pool Association for NIC \"ci-op-8qv3w054-282fe-2222c-bootstrap-nic\" (Resource Group \"ci-op-8qv3w054-282fe-2222c-rg\"): Code=\"OperationNotAllowed\" Message=\"Operation 'startTenantUpdate' is not allowed on VM 'ci-op-8qv3w054-282fe-2222c-bootstrap' since the VM is marked for deletion. You can only retry the Delete operation (or wait for an ongoing one to complete).\" Details=[]
```

when we update the NIC and the machine attached to it.

`azurerm_network_interface_backend_address_pool_association` depends on the NIC but is not related to the machine the NIC is attached to, so the VM might be shutting down while this update is happening. This depends_on makes sure that the VM is destroyed before we try to delete this association, preventing the race.

A similar error has been reported in Azure/azure-rest-api-specs#7207
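
The ordering argument in the commit message above can be illustrated with a small, self-contained sketch. This is not Terraform or Pulumi code: the resource names and both helper functions are hypothetical, and it assumes the extra dependency is declared on the VM and points at the association (the commit message does not say which resource carries the `depends_on`). It models destroy as the reverse of a dependency-respecting create order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def destroy_order(depends_on):
    """Return one valid destroy order: the reverse of a valid create order."""
    create_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(create_order))


def destroyed_before(depends_on, a, b):
    """True if `a` depends on `b` (directly or indirectly), so that `a`
    is guaranteed to be destroyed before `b`."""
    stack, seen = list(depends_on.get(a, [])), set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return False


# Hypothetical resources; each key maps to the resources it depends on.
without_fix = {"nic": [], "vm": ["nic"], "assoc": ["nic"]}
# Assumed fix: the VM additionally depends on the association.
with_fix = {"nic": [], "vm": ["nic", "assoc"], "assoc": ["nic"]}

# Without the extra edge, neither resource is ordered relative to the other,
# so a tool is free to delete the VM and the association at the same time.
unordered = not destroyed_before(without_fix, "vm", "assoc") and not destroyed_before(
    without_fix, "assoc", "vm"
)
print("race possible without fix:", unordered)
print("destroy order with fix:", destroy_order(with_fix))
```

With the extra edge in place, the VM has to be fully removed before the association delete starts, which is the behavior the commit message describes.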
@dbilleci-lightstream
Author

dbilleci-lightstream commented Sep 13, 2019

Calling customer support.. seriously? That's the best you can do? Have you ever tried to call customer support?

Edit: sorry, that sounded a bit harsh. My point is that if you have 3+ people here reporting the issue, can't you escalate on your side? I'm assuming you are employed by Microsoft.

@visokoo

visokoo commented Sep 13, 2019

> > I, too am getting the same error message but with NetworkInterfaceApplicationSecurityGroupAssociation instead.
> > This was working fine for me about a month ago. Did some API behavior change? Feels like a race condition. Also only happens to me on destroy.
>
> @visokoo are you using terraform or pulumi ?

Pulumi. It honestly feels like an intermittent failure. The destroy doesn't fail every time, which leads me to suspect a flaky API. @dbilleci-lightstream, if you post on Pulumi's GitHub issues, can you link back to here?
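
Because the failure is intermittent and the error text itself says the Delete operation can be retried, one possible workaround is to retry the delete with a backoff. The following is a minimal sketch under stated assumptions, not Pulumi, Terraform, or Azure SDK code: `delete_with_retry`, `is_retryable`, and `flaky_delete` are hypothetical names, and matching on the `Code="OperationNotAllowed"` substring is an assumption about how the error would surface to the caller.

```python
import time

# Substring taken from the error text quoted in this thread.
RETRYABLE_MARKER = 'Code="OperationNotAllowed"'


def is_retryable(error):
    """Treat only errors that carry the OperationNotAllowed code as transient."""
    return RETRYABLE_MARKER in str(error)


def delete_with_retry(delete, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `delete()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return delete()
        except Exception as error:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_retryable(error):
                raise
            sleep(base_delay * 2 ** attempt)


# Usage with a fake delete that fails twice before succeeding.
calls = []


def flaky_delete():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError('Code="OperationNotAllowed" Message="VM is marked for deletion"')
    return "deleted"


waits = []  # record the waits instead of actually sleeping
result = delete_with_retry(flaky_delete, sleep=waits.append)
print(result, len(calls), waits)
```

A retry only hides the race; it does not remove it. Ordering the deletes explicitly is the more robust option when the tool allows it.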

@dbilleci-lightstream
Author

Yes, I did post there: pulumi/pulumi-azure#365

He asked for code to reproduce it, but I'll have to write special code to do so as it's in the middle of a big chunk of "other stuff" in our automation pipeline.

If you have a simpler example in the meantime, please feel free to post it there!

@mybayern1974

@dbilleci-lightstream, @ArcturusZhang can help take a look at the two Terraform issues, 2491 and 4330, which were motivated by Pulumi issue 365. I cannot tell whether this is REST API related before investigating the Terraform behavior, so I suggest closing the issue here and using the Terraform issues for tracking instead. We can reopen this issue if we find that the root cause is here. Do you agree? If so, we may close this issue in 3 days.

@dbilleci-lightstream
Author

dbilleci-lightstream commented Sep 18, 2019 via email

@mybayern1974

Closing, as the author agrees. I will reopen this if the root cause turns out to be in the REST API here.

jhixson74 pushed a commit to jhixson74/installer that referenced this issue Dec 6, 2019
…is terminated
