Can create 3 subnets with NSG and Route Table associations, but no more #2489
Comments
I was experiencing the same issue. I was able to work around it by adding depends_on to both the NSG and route table associations. For the example above, I would make it: resource "azurerm_subnet_route_table_association" "routetableassociation3" { I did this for all my associations, and it no longer got stuck. |
I'm currently running into this behavior with tf v0.11.13 and provider.azurerm v1.23.0. Adding depends_on to the associations did not help in my case. The trace log never shows tf actually sending a PUT request to create the subnet. Adding --parallelism=1 does fix it for me though. Looking closer at the logs, it looks like a deadlock is happening with |
I was able to get this to work by removing @tombuildsstuff are those vnet locks necessary in |
Since the change that made this work for me was removing Maybe it's not even the vnet lock that causes the problem, but the nsg/route table locks (or a combination of both the vnet lock and the other locks?) I would expect the dag to figure this all out and prevent the deadlocks entirely, but that doesn't seem to be happening here. @tombuildsstuff I noticed you're the original committer for those |
I'm facing a similar issue where creating 3 subnets and their network security group associations works, but as soon as I add a 4th set, it gets stuck in an infinite waiting loop. Both the subnet creation and the NSG association creation lock on the vnet and the NSG, but they request the two locks in opposite order. This appears to result in a deadlock where a subnet creation step holds the vnet lock and is waiting for the NSG lock to be released. In the meantime, an NSG association creation step holds the NSG lock and is waiting for the vnet lock to be released. Corresponding code in the provider: I wonder if this deadlock could be similar to what you're seeing @Moeser - hypothetically, setting parallelism=1 and/or removing the locks removes the deadlock condition. I'm seeing this on terraform 0.11.11, with terraform-provider-azurerm_v1.30.1_x4. Interestingly, this seems to happen frequently (approx. 80% of the time) when running on Linux. I've tried to reproduce with the same versions and scripts on darwin (Mac), but haven't been able to. Here are the corresponding lock logs (with indexes added at the end of each line to indicate count) Lock requests for the vnet:
Lock requests for the NSG:
|
Yes, you appear to be running into the same deadlock as me. If you'd like, you could try building the azurerm module with the changes from my pull request #3673 to verify that it fixes the bug for you.
This definitely happens to me on darwin too. You might want to try forcing parallelism back to default (10) on your mac. There are combinations of parallelism that will avoid this bug, such as 1, or larger than subnets * 3 (which is why the default of 10 triggers on 4 subnets but not 3). Parallelism values that are exact multiples of 3 (such as 9, 12, etc.) may be less likely to trigger the bug as well, but I haven't verified that. |
Thank you very much for the fix in provider version 1.33.1... I'm not sure if there is still a lingering issue. |
Hi @ewierschke, the deprecation notices and cycling behavior you are seeing are part of a separate issue, which is documented more clearly in issue #3054. The summary is that the next major (2.0) release of the azurerm provider will change how subnets are associated with NSGs and route tables. The warnings could probably use a link or some better info on the expected way to define those resources until 2.0 comes out, but again, that's a separate issue from the deadlocks discussed here. |
@tombuildsstuff we should re-open this since the locking change was rolled back. |
I tried removing the
|
I am facing something similar, as I described in the issue #4471, I have 4 |
I have found that pinning to |
@pedrohdz have just tested on 1.33.0 with 4 subnets and it is still the same - the resources are still stuck being modified/created. |
I have exactly the same issue when I create new subnets with a security group association. Will it help to use the (deprecated) UPDATE: I have now tested with the deprecated property and it works much better 🎉 When will we be unable to use the property |
Issue still open. As we experience this issue in all our environments, can you please work on a fix? |
@ClaudiaBaur I have temporary "fixed" it with the deprecated property: |
@sorenhansendk, yes, we already use the workaround in all environments. However, this functionality is marked as deprecated and our landscapes are live, so please make sure that it remains available for some more time. Might we run into a problem when the Azure SDKs are updated in the near future, or will you make sure that nothing breaks? Do you have a roadmap for the deprecation and/or a fix? |
I am not sure if we're facing this issue as well. |
We seem to also be hit by this when we try to use any version >1.33.0. Currently using terraform 0.11.14 |
@dubuc have you tried v1.36.0 ? There was a change that should fix some of the deadlocks in pull 4501 |
Anyone still running into this issue, try 1.36.0 or newer. The change in #4501 should have reduced the deadlocks |
I believe we hit the same issue with 1.36.0. I will use the latest release and try again tomorrow when I get to the office. @Moeser sorry about the delay, and thank you for the reply. |
I've found a work-around, inspired by the comment by @Moeser that it works with parallelism turned off. If you use depends_on to avoid parallelism, it doesn't hang. You make each subnet dependent on the previous subnet and each network security group association dependent on the previous one. For example, the following snippet shows dependencies that avoid parallelism. There are 4 subnets: web, management, netscaler and default. Based on the dependencies, they will be created in that order.
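A minimal sketch of this kind of chaining, shown for the first two subnets only. It assumes Terraform 0.12+ syntax and hypothetical variable names (`resource_group_name`, `vnet_name`, `nsg_id`) and address ranges that are not taken from this thread. Making the first association wait for the last subnet is an extra assumption added here so that nothing runs concurrently; the comment above only describes subnet-to-subnet and association-to-association dependencies.

```hcl
# Hypothetical inputs; names and values are illustrative only.
variable "resource_group_name" {}
variable "vnet_name" {}
variable "nsg_id" {}

resource "azurerm_subnet" "web" {
  name                 = "web"
  resource_group_name  = var.resource_group_name
  virtual_network_name = var.vnet_name
  # address_prefix matches the provider versions discussed here;
  # newer provider versions use address_prefixes instead.
  address_prefix = "10.0.1.0/24"
}

resource "azurerm_subnet" "management" {
  name                 = "management"
  resource_group_name  = var.resource_group_name
  virtual_network_name = var.vnet_name
  address_prefix       = "10.0.2.0/24"

  # Force this subnet to wait for the previous one.
  depends_on = [azurerm_subnet.web]
}

resource "azurerm_subnet_network_security_group_association" "web" {
  subnet_id                 = azurerm_subnet.web.id
  network_security_group_id = var.nsg_id

  # Assumption: start associations only after the last subnet in the chain.
  depends_on = [azurerm_subnet.management]
}

resource "azurerm_subnet_network_security_group_association" "management" {
  subnet_id                 = azurerm_subnet.management.id
  network_security_group_id = var.nsg_id

  # Force this association to wait for the previous one.
  depends_on = [azurerm_subnet_network_security_group_association.web]
}
```

The remaining subnets (netscaler, default) would continue the same pattern, each pointing at its predecessor.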
|
@embik we should try this approach |
Forcing the resources to serialize via The locks were introduced to work around a bug in the azure API, where it will return a 409 error if multiple subnet/nsg/route changes are made at the same time. The locks work around the issue by forcing the vnet related resources to serialize. I still think the long term solution is to reduce the locks to one or remove them entirely, but that can't happen until the 409 errors are handled in a retryable way. |
The workaround is also difficult to apply when you use multiple modules in your code to create subnets with an NSG and UDR. If these modules run in parallel, you are in trouble, as depends_on does not work with modules. |
I was experiencing this hang as well, upgrading to 1.40.0 seems to have resolved my deadlocking issue. https://github.com/terraform-providers/terraform-provider-azurerm/releases |
Hi there, I'm with 1.40.0 as well and I have similar issue. I figured it always happens when I I had to add this crazy
into
Is it known yet what is causing such deadlocks? Is it too much parallelism? |
The version 1.40.0 doesn't solve our deadlock issue. :( |
Same issue exists in |
Sadly, I have experienced this in AzureRM 2.0.0. It loops forever during a destroy of 1 NIC and 1 NSG association. Interestingly, it only seems to happen when I have an azurerm firewall that is also attached to the same vnet. The firewall gets deleted successfully, but the lock issue still occurs for the NIC and NSG association. If I remove the firewall from the config, all is well every time. |
The same can be said for the NAT gateway: if you try to create one whilst also creating several NICs, for instance, you get a deadlock condition. Definitely an issue. If I add depends_on to the azurerm_nat_gateway resource, specifying the NICs, it goes through just fine. |
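A rough sketch of what that dependency might look like, assuming Terraform 0.12+ syntax. The variable names, resource names, and NIC count are hypothetical and not taken from the commenter's configuration.

```hcl
# Hypothetical inputs; names are illustrative only.
variable "resource_group_name" {}
variable "location" {}
variable "subnet_id" {}

resource "azurerm_network_interface" "example" {
  count               = 3
  name                = "example-nic-${count.index}"
  location            = var.location
  resource_group_name = var.resource_group_name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = var.subnet_id
    private_ip_address_allocation = "Dynamic"
  }
}

resource "azurerm_nat_gateway" "example" {
  name                = "example-natgw"
  location            = var.location
  resource_group_name = var.resource_group_name

  # Wait for every NIC instance before creating the gateway,
  # so the two kinds of resources are never created concurrently.
  depends_on = [azurerm_network_interface.example]
}
```

Referencing the resource without an index in depends_on covers all instances created by count.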
Same deadlock issue in azurerm 2.7.0. I could only resolve it by serializing the creation of all azurerm_subnet_network_security_group_association and azurerm_subnet_nat_gateway_association items. |
same issue as @samartzidis , I have both azurerm_subnet_network_security_group_association and azurerm_subnet_nat_gateway_association, 3 of each for 3 subnets. Have to comment them out, create all stack, then uncomment and add the associations. Will try the "depends_on" now, thanks for suggestions :) |
I am still running into this issue with 2.39.0, even when using Edit: sorry, my bad, it turns out I have overlooked one resource. |
Nope, cheered too soon. Deadlock is still occurring even with |
Still an issue in 2.67.0, even when using @tombuildsstuff / @katbyte, could you take a look at @sanderginn's PR #12267, please, and share whether performing subnet locks after vnet locks, instead of before, solves the issue? If not, could you recommend who would be the best person to review it? |
This functionality has been released in v2.68.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you! |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Community Note
Terraform (and AzureRM Provider) Version
Terraform v0.11.10
Affected Resource(s)
Terraform Configuration Files
Debug Output
https://gist.github.com/ewierschke/075040ee240e8c51d0ddb63e1e0779ea
Panic Output
No Panic
Expected Behavior
All subnets should be created against existing VNET with appropriate association of a pre-created NSG and appropriate association of a newly created route table.
Actual Behavior
Only 3 subnets are created and Terraform gets stuck in a seemingly infinite 'Still creating...' loop. Terraform sees one of the four subnets as still creating, along with 3 route table associations and 2 NSG associations.
Steps to Reproduce
terraform apply
Important Factoids
Not sure how important, but in a larger deployment am trying to create 12+ subnets at once (what is provided is what I have been able to narrow it down to). Was able to move the subnet and associations segment into a module and more than 3 subnets get created (~8 +/-) but still gets stuck in similar loop.
If this code is executed with the subnet3 resource and association resources commented out, the run succeeds (limiting to 3 new subnets to create).
subnet_names and subnet_prefixes are lists in my variables file.
The VNET already exists with 0 subnets and two address spaces.
Two subnets in one address space and one in the other address space get successfully created.
The NSG to associate is already pre-created that is to be associated with the new subnets.
I don't appear to be hitting API rate limits per the debug output.
If the above code is executed with -parallelism=1 the apply succeeds.
Not sure what I might be missing here or if maybe there is a limitation on the Microsoft.Network virtualNetworks API?
References