azurerm_network_security_group_association - prevent deadlock between association and network interface creation #4501
Conversation
Update from master TF repo
…_interface, caused by them locking the same resources, but in a different order.
Fantastic spot @nitzanm!
Thank you very much for this fix. LGTM, and I'll review all the other lock orders in the provider now.
…ndle private_dns zone

Using the upstream azurerm provider is not possible for now, for the following reasons:

1) There is no SRV record resource for private DNS zones.
2) The version of the provider that has the private DNS zone resources, `1.34.0`, has a lot of bugs, such as:
   * hashicorp/terraform-provider-azurerm#4452
   * hashicorp/terraform-provider-azurerm#4453
   * hashicorp/terraform-provider-azurerm#4501

Some of these bugs are fixed, and some are in flight. Another obstacle to moving to `1.36.0`, which might have all the fixes we need, is that the provider has moved to the standalone Terraform plugin SDK `v1.1.1` [1]. Because we vendor both Terraform and the providers, this causes errors like `panic: gob: registering duplicate types for "github.com/zclconf/go-cty/cty.primitiveType": cty.primitiveType != cty.primitiveType`. We would therefore have to move towards a single vendor tree for Terraform and its plugins for correct interoperation, which is trickier due to conflicts elsewhere.

A simple four-resource plugin that reuses the already-vendored azurerm provider as a library and carries over the required resources seems like an easy fix for now.

[1]: hashicorp/terraform-provider-azurerm#4474
This has been released in version 1.36.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
  version = "~> 1.36.0"
}

# ... other configuration ...
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
I believe we have found a bug in the Azure Resource Manager provider for Terraform, having to do with the azurerm_subnet_network_security_group_association resource and its interaction with azurerm_network_interface.
The issue is that the azurerm_subnet_network_security_group_association resource, when being created, locks the following resources, in the following order:
However, the azurerm_network_interface resource, when being created, locks the following resources, in the following order:
You will notice that both lock the virtual network and subnet, but in the opposite order. This means that if both resources happen to be created at the same time in two threads, the following could happen:
In this situation, the two resources will wait indefinitely for each other until Terraform is terminated - a deadlock. Our trigger for finding this bug was that Terraform would be trying to create interfaces and associations forever. Of course, this only happens intermittently, because it depends on the exact order above happening. For example (resource names redacted):
To fix this, we simply need to lock the three resources in the same order in both create operations. As a workaround, we are currently making all network interfaces depend on our NSG association, since that ensures the two are never created at the same time.
As an aside, is there any way to review all locks across the provider, to ensure this situation doesn't happen elsewhere? Perhaps there could be a policy to always lock resources in alphabetical order so that in the future this doesn't happen again?