Can create 3 subnets with NSG and Route Table associations, but no more #2489

Closed
ewierschke opened this issue Dec 11, 2018 · 39 comments · Fixed by #3673 or #12267

Comments

@ewierschke

ewierschke commented Dec 11, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.11.10

  • provider.azurerm v1.19.0

Affected Resource(s)

  • azurerm_subnet
  • azurerm_route_table
  • azurerm_subnet_network_security_group_association
  • azurerm_subnet_route_table_association

Terraform Configuration Files

provider "azurerm" {
  subscription_id = "${var.sub_id}"
  tenant_id       = "${var.tenant_id}"
  client_id       = "${var.tf_sp_appid}"
  client_secret   = "${var.tf_sp_secret}"
}

data "azurerm_resource_group" "vnet_rg" {
  name = "${var.vnet_rg_name}"
}

data "azurerm_virtual_network" "existing_vnet" {
  name                = "${var.existing_vnet_name}"
  resource_group_name = "${var.vnet_rg_name}"
}

data "azurerm_network_security_group" "required_nsg" {
  name                = "${var.required_nsg_name}"
  resource_group_name = "${var.vnet_rg_name}"
}
# ##
resource "azurerm_subnet" "subnet0" {
  name                      = "${var.subnet_names[0]}"
  resource_group_name       = "${var.vnet_rg_name}"
  virtual_network_name      = "${var.existing_vnet_name}"
  address_prefix            = "${var.subnet_prefixes[0]}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
  route_table_id            = "${azurerm_route_table.routetable0.id}"
}

resource "azurerm_route_table" "routetable0" {
  name                = "${var.existing_vnet_name}-${var.subnet_names[0]}-Routetable"
  location            = "${data.azurerm_resource_group.vnet_rg.location}"
  resource_group_name = "${var.vnet_rg_name}"

  tags {
    environment = "${var.tag_environment_name}"
  }
}

resource "azurerm_subnet_network_security_group_association" "nsgassociation0" {
  subnet_id                 = "${azurerm_subnet.subnet0.id}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
}

resource "azurerm_subnet_route_table_association" "routetableassociation0" {
  subnet_id      = "${azurerm_subnet.subnet0.id}"
  route_table_id = "${azurerm_route_table.routetable0.id}"
}
# ##
resource "azurerm_subnet" "subnet1" {
  name                      = "${var.subnet_names[1]}"
  resource_group_name       = "${var.vnet_rg_name}"
  virtual_network_name      = "${var.existing_vnet_name}"
  address_prefix            = "${var.subnet_prefixes[1]}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
  route_table_id            = "${azurerm_route_table.routetable1.id}"
}

resource "azurerm_route_table" "routetable1" {
  name                = "${var.existing_vnet_name}-${var.subnet_names[1]}-Routetable"
  location            = "${data.azurerm_resource_group.vnet_rg.location}"
  resource_group_name = "${var.vnet_rg_name}"

  tags {
    environment = "${var.tag_environment_name}"
  }
}

resource "azurerm_subnet_network_security_group_association" "nsgassociation1" {
  subnet_id                 = "${azurerm_subnet.subnet1.id}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
}

resource "azurerm_subnet_route_table_association" "routetableassociation1" {
  subnet_id      = "${azurerm_subnet.subnet1.id}"
  route_table_id = "${azurerm_route_table.routetable1.id}"
}
# ##
resource "azurerm_subnet" "subnet2" {
  name                      = "${var.subnet_names[2]}"
  resource_group_name       = "${var.vnet_rg_name}"
  virtual_network_name      = "${var.existing_vnet_name}"
  address_prefix            = "${var.subnet_prefixes[2]}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
  route_table_id            = "${azurerm_route_table.routetable2.id}"
}

resource "azurerm_route_table" "routetable2" {
  name                = "${var.existing_vnet_name}-${var.subnet_names[2]}-Routetable"
  location            = "${data.azurerm_resource_group.vnet_rg.location}"
  resource_group_name = "${var.vnet_rg_name}"

  tags {
    environment = "${var.tag_environment_name}"
  }
}

resource "azurerm_subnet_network_security_group_association" "nsgassociation2" {
  subnet_id                 = "${azurerm_subnet.subnet2.id}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
}

resource "azurerm_subnet_route_table_association" "routetableassociation2" {
  subnet_id      = "${azurerm_subnet.subnet2.id}"
  route_table_id = "${azurerm_route_table.routetable2.id}"
}
# ##
resource "azurerm_subnet" "subnet3" {
  name                      = "${var.subnet_names[3]}"
  resource_group_name       = "${var.vnet_rg_name}"
  virtual_network_name      = "${var.existing_vnet_name}"
  address_prefix            = "${var.subnet_prefixes[3]}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
  route_table_id            = "${azurerm_route_table.routetable3.id}"
}

resource "azurerm_route_table" "routetable3" {
  name                = "${var.existing_vnet_name}-${var.subnet_names[3]}-Routetable"
  location            = "${data.azurerm_resource_group.vnet_rg.location}"
  resource_group_name = "${var.vnet_rg_name}"

  tags {
    environment = "${var.tag_environment_name}"
  }
}

resource "azurerm_subnet_network_security_group_association" "nsgassociation3" {
  subnet_id                 = "${azurerm_subnet.subnet3.id}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
}

resource "azurerm_subnet_route_table_association" "routetableassociation3" {
  subnet_id      = "${azurerm_subnet.subnet3.id}"
  route_table_id = "${azurerm_route_table.routetable3.id}"
}

Debug Output

https://gist.github.com/ewierschke/075040ee240e8c51d0ddb63e1e0779ea

Panic Output

No Panic

Expected Behavior

All subnets should be created against existing VNET with appropriate association of a pre-created NSG and appropriate association of a newly created route table.

Actual Behavior

Only 3 subnets are created and Terraform gets stuck in a seemingly infinite 'Still creating...' loop. Terraform sees one of the four subnets as still creating, along with 3 route table associations and 2 NSG associations.

azurerm_subnet.subnet2: Still creating... (6m50s elapsed)
azurerm_subnet_route_table_association.routetableassociation0: Still creating... (6m40s elapsed)
azurerm_subnet_network_security_group_association.nsgassociation3: Still creating... (6m30s elapsed)
azurerm_subnet_route_table_association.routetableassociation3: Still creating... (6m30s elapsed)
azurerm_subnet_network_security_group_association.nsgassociation1: Still creating... (6m20s elapsed)
azurerm_subnet_route_table_association.routetableassociation1: Still creating... (6m20s elapsed)
azurerm_subnet.subnet2: Still creating... (7m0s elapsed)
azurerm_subnet_route_table_association.routetableassociation0: Still creating... (6m50s elapsed)
azurerm_subnet_route_table_association.routetableassociation3: Still creating... (6m40s elapsed)
azurerm_subnet_network_security_group_association.nsgassociation3: Still creating... (6m40s elapsed)
azurerm_subnet_route_table_association.routetableassociation1: Still creating... (6m30s elapsed)
azurerm_subnet_network_security_group_association.nsgassociation1: Still creating... (6m30s elapsed)
azurerm_subnet.subnet2: Still creating... (7m10s elapsed)

Steps to Reproduce

  1. terraform apply

Important Factoids

Not sure how important this is, but in a larger deployment I am trying to create 12+ subnets at once (the configuration above is what I have been able to narrow it down to). I was able to move the subnet and association segment into a module, after which more than 3 subnets get created (roughly 8), but the apply still gets stuck in a similar loop.

If this code is executed with the subnet3 resource and association resources commented out, the run succeeds (limiting to 3 new subnets to create).

subnet_names and subnet_prefixes are lists in my variables file.

variable "subnet_names" {
  default = [
    "subnet0", 
    "subnet1", 
    "subnet2", 
    "subnet3", 
    "subnet4", 
    "subnet5", 
    "subnet6", 
    "subnet7", 
    "subnet8", 
    "subnet9", 
    "subnet10", 
    "subnet11", 
    "subnet12", 
    "subnet13"
  ]
}

variable "subnet_prefixes" {
  default = [
    "10.7.0.0/24", 
    "10.7.1.0/24", 
    "192.168.0.0/24", 
    "192.168.1.0/24", 
    "192.168.2.0/24", 
    "192.168.3.0/24", 
    "192.168.4.0/24", 
    "192.168.5.0/24", 
    "192.168.6.0/24", 
    "192.168.7.0/24", 
    "192.168.8.0/24", 
    "192.168.9.0/24", 
    "192.168.10.0/24", 
    "192.168.11.0/24"
  ]
}

The VNET already exists with 0 subnets and two address spaces.
Two subnets in one address space and one in the other address space get successfully created.
The NSG to be associated with the new subnets is already pre-created.

I don't appear to be hitting API rate limits per the debug output.

If the above code is executed with -parallelism=1 the apply succeeds.

Not sure what I might be missing here or if maybe there is a limitation on the Microsoft.Network virtualNetworks API?

@hramos05

hramos05 commented Jan 28, 2019

I was experiencing the same issue. I was able to work around it by adding depends_on to both the NSG association and the route table association.

For the example above, I would make it:
resource "azurerm_subnet_network_security_group_association" "nsgassociation3" {
  subnet_id                 = "${azurerm_subnet.subnet3.id}"
  network_security_group_id = "${data.azurerm_network_security_group.required_nsg.id}"
  depends_on                = ["azurerm_subnet.subnet3"]
}

resource "azurerm_subnet_route_table_association" "routetableassociation3" {
  subnet_id      = "${azurerm_subnet.subnet3.id}"
  route_table_id = "${azurerm_route_table.routetable3.id}"
  depends_on     = ["azurerm_subnet.subnet3"]
}

I did it for all my associations, and it was no longer stuck

@Moeser
Contributor

Moeser commented May 7, 2019

I'm currently running into this behavior with tf v0.11.13 and provider.azurerm v1.23.0. Adding depends_on to the associations did not help in my case. The trace log never shows tf actually sending a PUT request to create the subnet. Adding --parallelism=1 does fix it for me though.

Looking closer at the logs, it looks like a deadlock is happening with azureRMLockByName() (there's a Locking message for the vnet, but no following Locked message, and something else had successfully Locked it earlier, with no following Unlocked message. This would be a lot easier to decipher if we had locking resource names in the trace logs...)

@Moeser
Contributor

Moeser commented May 8, 2019

I was able to get this to work by removing route_table_id and network_security_group_id from the azurerm_subnet definitions. However, that triggers another issue referenced in #2358, so I had to add a lifecycle/ignore_changes block to the subnets to prevent terraform from continually trying to reset the route_table_id and network_security_group_id.
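
A minimal sketch of the workaround described above, reusing subnet3 from the original report and Terraform 0.11 syntax. The exact contents of the lifecycle block are an assumption; they are not copied from the configuration that was actually used.

```hcl
# Subnet defined without the inline NSG / route table IDs, so that only the
# separate *_association resources manage those links.
resource "azurerm_subnet" "subnet3" {
  name                 = "${var.subnet_names[3]}"
  resource_group_name  = "${var.vnet_rg_name}"
  virtual_network_name = "${var.existing_vnet_name}"
  address_prefix       = "${var.subnet_prefixes[3]}"

  # Assumed form of the lifecycle block: keep Terraform from trying to reset
  # the IDs that the association resources set on the subnet.
  lifecycle {
    ignore_changes = ["route_table_id", "network_security_group_id"]
  }
}
```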

@tombuildsstuff are those vnet locks necessary in azurerm_subnet_network_security_group_association and azurerm_subnet_route_table_association? Maybe subnet locks would be sufficient and avoid deadlocks here?

@Moeser
Contributor

Moeser commented May 8, 2019

Since the change that made this work for me was removing route_table_id and network_security_group_id from the azurerm_subnet resource, I took a look at the azurerm_subnet code. It looks like removing those causes terraform to skip the nsg and route table locks for the subnet: https://github.com/terraform-providers/terraform-provider-azurerm/blob/54ce52395ce71bcde2e0e983e06061159faea106/azurerm/resource_arm_subnet.go#L165

Maybe it's not even the vnet lock that causes the problem, but the nsg/route table locks (or a combination of both the vnet lock and the other locks?) I would expect the dag to figure this all out and prevent the deadlocks entirely, but that doesn't seem to be happening here. @tombuildsstuff I noticed you're the original committer for those azurerm_subnet_*_association resources, so I thought maybe you'd have some ideas on how to better track down and/or avoid the deadlock here.

@xtreme-jon-ji

I'm facing a similar issue where creating 3 subnets and their network security group associations works, but as soon as I add a 4th set, it gets stuck in an infinite waiting loop.

Both the subnet creation and NSG association creation lock on the vnet + NSG, but they each request the lock in reverse order. This appears to result in a deadlock where a subnet creation step has locked on the vnet, but is waiting for the NSG lock to become unlocked. In the meantime, an NSG association creation step has locked on the NSG, but is waiting for the vnet lock to be unlocked.

Corresponding code in the provider:
https://github.com/terraform-providers/terraform-provider-azurerm/blob/v1.30.1/azurerm/resource_arm_subnet_network_security_group_association.go#L60-L68
https://github.com/terraform-providers/terraform-provider-azurerm/blob/v1.30.1/azurerm/resource_arm_subnet.go#L148-L167

I wonder if this deadlock could be similar to what you're seeing @Moeser - hypothetically, setting parallelism=1 and/or removing the locks removes the deadlock condition.

I'm seeing this on terraform 0.11.11, with terraform-provider-azurerm_v1.30.1_x4.

Interestingly, this seems to happen frequently (approx. 80% of the time) when running on linux. I've tried to reproduce with same versions and scripts on darwin (Mac), but haven't been able to.

Here are the corresponding lock logs (with an index added at the end of each line to identify the requester):

Lock requests for the vnet:

2019-06-10T16:42:30.653Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network" 0
2019-06-10T16:42:30.653Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Locked "azurerm_virtual_network.the-example-virtual-network" 0
2019-06-10T16:42:30.655Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network" 1
2019-06-10T16:42:30.655Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network" 2
2019-06-10T16:42:30.838Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network" 3
2019-06-10T16:42:41.137Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Unlocking "azurerm_virtual_network.the-example-virtual-network" 0
2019-06-10T16:42:41.137Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Unlocked "azurerm_virtual_network.the-example-virtual-network" 0
2019-06-10T16:42:41.137Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Locked "azurerm_virtual_network.the-example-virtual-network" 1
2019-06-10T16:42:41.171Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network" 4
2019-06-10T16:42:51.465Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:51 [DEBUG] Unlocking "azurerm_virtual_network.the-example-virtual-network" 1
2019-06-10T16:42:51.465Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:51 [DEBUG] Unlocked "azurerm_virtual_network.the-example-virtual-network" 1
2019-06-10T16:42:51.465Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:51 [DEBUG] Locked "azurerm_virtual_network.the-example-virtual-network" 2
2019-06-10T16:42:51.500Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:51 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network" 5
2019-06-10T16:42:51.503Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:51 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network" 6
2019-06-10T16:43:01.775Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:43:01 [DEBUG] Unlocking "azurerm_virtual_network.the-example-virtual-network" 2
2019-06-10T16:43:01.775Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:43:01 [DEBUG] Unlocked "azurerm_virtual_network.the-example-virtual-network" 2
2019-06-10T16:43:01.775Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:43:01 [DEBUG] Locked "azurerm_virtual_network.the-example-virtual-network" 3

Logs for requester #3:
2019-06-10T16:42:30.838Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [INFO] preparing arguments for Azure ARM Subnet creation.
2019-06-10T16:42:30.838Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network"

...

2019-06-10T16:43:01.775Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:43:01 [DEBUG] Locked "azurerm_virtual_network.luckylake9847-virtual-network"
2019-06-10T16:43:01.775Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:43:01 [DEBUG] Locking "azurerm_network_security_group.the-example-security-group"

Lock requests for the NSG:

2019-06-10T16:42:20.120Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:20 [DEBUG] Locking "azurerm_network_security_group.the-example-security-group" 0
2019-06-10T16:42:20.120Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:20 [DEBUG] Locked "azurerm_network_security_group.the-example-security-group" 0
2019-06-10T16:42:30.818Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Unlocking "azurerm_network_security_group.the-example-security-group" 0
2019-06-10T16:42:30.818Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:30 [DEBUG] Unlocked "azurerm_network_security_group.the-example-security-group" 0
2019-06-10T16:42:41.171Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Locking "azurerm_network_security_group.the-example-security-group" 1
2019-06-10T16:42:41.171Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Locked "azurerm_network_security_group.the-example-security-group" 1
2019-06-10T16:43:01.775Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:43:01 [DEBUG] Locking "azurerm_network_security_group.the-example-security-group" 2
2019-06-10T16:43:01.826Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:43:01 [DEBUG] Locking "azurerm_network_security_group.the-example-security-group" 3

Logs for requester #1:
2019-06-10T16:42:41.171Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [INFO] preparing arguments for Subnet <-> Network Security Group Association creation.
2019-06-10T16:42:41.171Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Locking "azurerm_network_security_group.the-example-security-group"
2019-06-10T16:42:41.171Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Locked "azurerm_network_security_group.the-example-security-group"
2019-06-10T16:42:41.171Z [DEBUG] plugin.terraform-provider-azurerm_v1.30.1_x4: 2019/06/10 16:42:41 [DEBUG] Locking "azurerm_virtual_network.the-example-virtual-network"

@Moeser
Contributor

Moeser commented Jun 14, 2019

> I wonder if this deadlock could be similar to what you're seeing @Moeser - hypothetically, setting parallelism=1 and/or removing the locks removes the deadlock condition.

Yes, you appear to be running into the same deadlock as me. If you'd like, you could try building the azurerm module with the changes from my pull request #3673 to verify that it fixes the bug for you.

> Interestingly, this seems to happen frequently (approx. 80% of the time) when running on linux. I've tried to reproduce with same versions and scripts on darwin (Mac), but haven't been able to.

This definitely happens to me on darwin too. You might want to try forcing parallelism back to default (10) on your mac. There are combinations of parallelism that will avoid this bug, such as 1, or larger than subnets * 3 (which is why the default of 10 triggers on 4 subnets but not 3). Parallelism values that are exact multiples of 3 (such as 9, 12, etc.) may be less likely to trigger the bug as well, but I haven't verified that.

@ewierschke
Author

Thank you very much for the fix in provider version 1.33.1... I'm not sure if there is still a lingering issue.
The network_security_group_id and route_table_id parameters on the azurerm_subnet resource are marked as deprecated. However, when I comment out those two parameters in my configuration above, I get into a state where each terraform apply alternates between attaching and detaching the NSG and route table (i.e. run terraform apply, result = NSG/route table attached to subnet; re-run terraform apply, result = NSG/route table detached). If I leave the deprecated parameters uncommented, the apply cycle completes as expected, keeping those resources attached to the subnet. Are the network_security_group_id and route_table_id parameters still intended to be deprecated?

@Moeser
Contributor

Moeser commented Aug 30, 2019

Hi @ewierschke, the deprecation notices and cycling behavior you are seeing are part of a separate issue, which is documented more clearly in issue #3054. The summary is that the next major (2.0) release of the azurerm provider will have a behavioral change to how subnets are associated with NSGs and route tables. The warnings could probably use a link or some better info on the expected way to define those resources until 2.0 comes out, but again, that's a separate issue from the deadlocks discussed here.

@Moeser
Contributor

Moeser commented Sep 18, 2019

@tombuildsstuff we should re-open this since the locking change was rolled back.

@pedrohdz
Contributor

I tried removing the network_security_group_id from my azurerm_subnet resource and the wheels are still spinning anyways (still stuck waiting).

Terraform v0.11.14
+ provider.azurerm v1.34.0

@ak58588

ak58588 commented Oct 1, 2019

I am facing something similar, as I described in issue #4471. I have 4 existing azurerm_subnet resources and want to attach each subnet to a single network security group by creating an azurerm_subnet_network_security_group_association and adding network_security_group_id as a property to each subnet. Several subnets are stuck in status "Still modifying..." forever, and some associations are stuck forever in status "Still creating...". My current workaround is to add network_security_group_id as a property to the subnet ONLY, without creating the associations.
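
For illustration, the subnet-only workaround might look roughly like this under provider 1.x and Terraform 0.11 syntax. All resource names, variables and the address prefix are hypothetical; only the shape of the configuration is taken from the comment.

```hcl
# Hypothetical names throughout; only the shape of the workaround matters.
resource "azurerm_subnet" "example" {
  name                 = "example-subnet"
  resource_group_name  = "${var.resource_group_name}"
  virtual_network_name = "${var.virtual_network_name}"
  address_prefix       = "10.0.1.0/24"

  # Deprecated inline property, used here instead of a separate association.
  network_security_group_id = "${azurerm_network_security_group.example.id}"
}

# Note: no azurerm_subnet_network_security_group_association is declared
# for this subnet in this variant.
```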

@pedrohdz
Contributor

pedrohdz commented Oct 1, 2019

I have found that pinning to v1.33.0 resolves the issue, as a workaround.
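
A sketch of what such a pin could look like with the Terraform 0.11-style provider block used earlier in this thread. Credentials are omitted, and whether the pin helps appears to depend on the configuration.

```hcl
provider "azurerm" {
  # Exact version constraint for the release mentioned above.
  version = "=1.33.0"
}
```

After changing the constraint, terraform init needs to be re-run so that the pinned provider build is downloaded.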

@ak58588

ak58588 commented Oct 2, 2019

@pedrohdz have just tested on 1.33.0 with 4 subnets and it is still the same - the resources are still stuck being modified/created.

@sorenhansendk

sorenhansendk commented Oct 2, 2019

I have exactly the same issue when I create new subnets with a security group association. Will it help to use the (deprecated) network_security_group_id property on the azurerm_subnet resource?

UPDATE: I have now tested with the deprecated property and it works much better 🎉 When will we be unable to use the property network_security_group_id on azurerm_subnet?

@ClaudiaBaur

Issue still open. As we experience this issue in all our environments, can you please work on a fix?
Thanks!

@sorenhansendk

@ClaudiaBaur I have temporarily "fixed" it with the deprecated property network_security_group_id. You can do that too until a fix is released 😄

@ClaudiaBaur

@sorenhansendk, yes, we already use the workaround in all environments. However, this functionality is marked as deprecated and our landscapes are live, so please make sure that it remains available for some more time. Might we run into a problem when the Azure SDKs are updated in the near future, or will you make sure that nothing breaks? Do you have some kind of roadmap for the deprecation and/or the fix?
Thanks,

@whites11

I am not sure if we're facing this issue as well.
We're trying to create 4 subnets and the process gets stuck at creating network interfaces using version > 1.33.1
Interesting fact, it fails 100% of the time at first execution, but it succeeds most of the time with a second try.
It is 100% replicable in our case so I'm able to test any patches.

@dubuc

dubuc commented Oct 31, 2019

We seem to also be hit by this when we try to use any version >1.33.0.

Currently using terraform 0.11.14

@Moeser
Contributor

Moeser commented Nov 14, 2019

@dubuc have you tried v1.36.0? There was a change in pull request #4501 that should fix some of the deadlocks.

@Moeser
Contributor

Moeser commented Nov 14, 2019

Anyone still running into this issue, try 1.36.0 or newer. The change in #4501 should have reduced the deadlocks

@dubuc

dubuc commented Dec 6, 2019

I believe we hit the same issue with 1.36.0. I will use the latest release and try again tomorrow when I get to the office. @Moeser sorry about the delay, and thank you for the reply.

@jeremybeavon

I've found a workaround inspired by the comment by @Moeser that it works with parallelism turned off. If you use depends_on to avoid parallelism, it doesn't hang: make each subnet dependent on the previous subnet, and each network security group association dependent on the previous one. For example, the following snippet shows dependencies that avoid parallelism. There are 4 subnets: web, management, netscaler and default. Based on the dependencies, they will be created in that order.

resource "azurerm_subnet" "web" {
  name                      = "web"
  ...
  network_security_group_id = azurerm_network_security_group.web_subnet_nsg.id
}

resource "azurerm_subnet" "management" {
  name                      = "management"
  ...
  network_security_group_id = azurerm_network_security_group.management_subnet_nsg.id
  depends_on                = [azurerm_subnet.web]
}

resource "azurerm_subnet" "netscaler" {
  name                      = "netscaler"
  ...
  network_security_group_id = azurerm_network_security_group.netscaler_subnet_nsg.id
  depends_on                = [azurerm_subnet.management]
}

resource "azurerm_subnet" "default" {
  name                      = "default"
  ...
  network_security_group_id = azurerm_network_security_group.default_subnet_nsg.id
  depends_on                = [azurerm_subnet.netscaler]
}

resource "azurerm_subnet_network_security_group_association" "default" {
  subnet_id                 = azurerm_subnet.default.id
  network_security_group_id = azurerm_network_security_group.default_subnet_nsg.id
  depends_on                = [azurerm_subnet.default]
}

resource "azurerm_subnet_network_security_group_association" "web" {
  subnet_id                 = azurerm_subnet.web.id
  network_security_group_id = azurerm_network_security_group.web_subnet_nsg.id
  depends_on                = [azurerm_subnet_network_security_group_association.default]
}

resource "azurerm_subnet_network_security_group_association" "management" {
  subnet_id                 = azurerm_subnet.management.id
  network_security_group_id = azurerm_network_security_group.management_subnet_nsg.id
  depends_on                = [azurerm_subnet_network_security_group_association.web]
}

resource "azurerm_subnet_network_security_group_association" "netscaler" {
  subnet_id                 = azurerm_subnet.netscaler.id
  network_security_group_id = azurerm_network_security_group.netscaler_subnet_nsg.id
  depends_on                = [azurerm_subnet_network_security_group_association.management]
}

@dubuc

dubuc commented Dec 16, 2019

@embik we should try this approach

@Moeser
Contributor

Moeser commented Dec 17, 2019

Forcing the resources to serialize via depends_on helps, but people who are using count to make multiple copies will find that harder to do. Serializing via depends_on is a good temporary fix if that works for you.

The locks were introduced to work around a bug in the azure API, where it will return a 409 error if multiple subnet/nsg/route changes are made at the same time. The locks work around the issue by forcing the vnet related resources to serialize. I still think the long term solution is to reduce the locks to one or remove them entirely, but that can't happen until the 409 errors are handled in a retryable way.

@slawchod

slawchod commented Jan 7, 2020

The workaround is also difficult to apply when you use multiple modules in your code to create subnets with an NSG and UDR.

If these modules run in parallel, you have a problem, as depends_on does not work with modules.
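
For readers on newer Terraform releases: module blocks accept depends_on starting with Terraform 0.13, which was not yet available when this comment was written. A minimal sketch, assuming a hypothetical local module at ./modules/subnet with hypothetical inputs subnet_name and address_prefix:

```hcl
# Requires Terraform 0.13 or newer. The module path and its inputs are
# hypothetical and only illustrate the ordering.
module "subnet_a" {
  source         = "./modules/subnet"
  subnet_name    = "subnet-a"
  address_prefix = "10.0.1.0/24"
}

module "subnet_b" {
  source         = "./modules/subnet"
  subnet_name    = "subnet-b"
  address_prefix = "10.0.2.0/24"

  # Everything in subnet_b waits for everything in subnet_a, so the two
  # modules no longer create subnet resources in parallel.
  depends_on = [module.subnet_a]
}
```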

@mough

mough commented Jan 9, 2020

I was experiencing this hang as well, upgrading to 1.40.0 seems to have resolved my deadlocking issue. https://github.com/terraform-providers/terraform-provider-azurerm/releases

@Tbohunek
Contributor

Hi there,

I'm with 1.40.0 as well and I have similar issue. I figured it always happens when I terraform apply subnet creation and nsg association at once. More specifically, if it's two or more of each.
I was creating 10 subnets and then 10 associations, and it managed to create two subnets before it started creating NSG associations for them. That was exactly the point when it started to hang.

I had to add this crazy

depends_on = ["azurerm_subnet.subnet1","azurerm_subnet.subnet2","azurerm_subnet.subnet3","azurerm_subnet.subnet4","azurerm_subnet.subnet5","azurerm_subnet.subnet6","azurerm_subnet.subnet7","azurerm_subnet.subnet8","azurerm_subnet.subnet9","azurerm_subnet.subnet10"]

into

resource "azurerm_subnet_network_security_group_association" "nsgAssociation1" {
  subnet_id                 = azurerm_subnet.subnet1.id
  network_security_group_id = azurerm_network_security_group.nsg1.id
}
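
Put together, the association would look roughly like the sketch below. It uses the names from this comment, assumes the unquoted reference form of Terraform 0.12 or newer, and shortens the list to three subnets for readability.

```hcl
resource "azurerm_subnet_network_security_group_association" "nsgAssociation1" {
  subnet_id                 = azurerm_subnet.subnet1.id
  network_security_group_id = azurerm_network_security_group.nsg1.id

  # Wait for every subnet to exist before creating any association.
  depends_on = [
    azurerm_subnet.subnet1,
    azurerm_subnet.subnet2,
    azurerm_subnet.subnet3,
    # ... continue through azurerm_subnet.subnet10
  ]
}
```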

Is it known yet what is causing such deadlocks? Is it too much parallelism?

@ak58588

ak58588 commented Jan 17, 2020

The version 1.40.0 doesn't solve our deadlock issue. :(

@elsesiy

elsesiy commented Feb 12, 2020

Same issue exists in 1.43.0, only works with the proposed workaround of forcing the sequential creation of the respective resources.

@sdcscripts

Sadly, I have experienced this in AzureRM 2.0.0. It loops forever during a destroy of 1 NIC and 1 NSG association. Interestingly, it only seems to happen when I have an azurerm firewall that is also attached to the same vnet. The firewall gets deleted successfully, but the lock issue still occurs for the NIC and NSG association. If I remove the firewall from the config, all is well every time.

@sdcscripts

The same can be said for the NAT gateway: if you try to create one whilst also creating several NICs, for instance, you get a deadlock condition. Definitely an issue. If I add depends_on to the azurerm_nat_gateway resource specifying the NICs, it goes through just fine.
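
A minimal sketch of that ordering, assuming two NICs and a resource group defined elsewhere in the configuration. All names are hypothetical.

```hcl
resource "azurerm_nat_gateway" "example" {
  name                = "example-natgw"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  # Create the NAT gateway only after the NICs exist, so the two kinds of
  # resources are not created at the same time.
  depends_on = [
    azurerm_network_interface.nic1,
    azurerm_network_interface.nic2,
  ]
}
```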

@samartzidis

samartzidis commented Apr 28, 2020

Same deadlock issue in azurerm 2.7.0. I could only resolve it by serializing the creation of all azurerm_subnet_network_security_group_association and azurerm_subnet_nat_gateway_association items.
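
One way to picture that serialization is a single depends_on chain running through both kinds of association. This is only a sketch: the subnet, NSG and NAT gateway names are hypothetical, and the actual order used is not stated in the comment.

```hcl
resource "azurerm_subnet_network_security_group_association" "a" {
  subnet_id                 = azurerm_subnet.a.id
  network_security_group_id = azurerm_network_security_group.example.id
}

resource "azurerm_subnet_nat_gateway_association" "a" {
  subnet_id      = azurerm_subnet.a.id
  nat_gateway_id = azurerm_nat_gateway.example.id

  # Runs only after the NSG association for subnet "a" has finished.
  depends_on = [azurerm_subnet_network_security_group_association.a]
}

resource "azurerm_subnet_network_security_group_association" "b" {
  subnet_id                 = azurerm_subnet.b.id
  network_security_group_id = azurerm_network_security_group.example.id

  # Continues the chain: wait for the previous NAT gateway association.
  depends_on = [azurerm_subnet_nat_gateway_association.a]
}
```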

@Dmitry1987

Same issue as @samartzidis: I have both azurerm_subnet_network_security_group_association and azurerm_subnet_nat_gateway_association, 3 of each for 3 subnets. I have to comment them out, create the whole stack, then uncomment them and add the associations.

Will try the "depends_on" now, thanks for suggestions :)

@sanderginn

sanderginn commented Dec 9, 2020

I am still running into this issue with 2.39.0, even when using depends_on for all subnet/xxx associations. I'm out of ideas and this is preventing my CI job from terminating successfully, so I would really appreciate some input.

Edit: sorry, my bad, it turns out I have overlooked one resource. depends_on does seem to do the trick for now but it would be nice to have a long term solution eventually.

@sanderginn

Nope, cheered too soon. Deadlock is still occurring even with depends_on on all subnet related resources.

@kellystuard

Still an issue in 2.67.0, even when using depends_on. Goes away if I reduce the count of azurerm_subnet_network_security_group_association from two (2) to one (1).

@tombuildsstuff / @katbyte, could you please take a look at @sanderginn's PR #12267 and share your knowledge on whether performing subnet locks after vnet locks, instead of before, solves the issue? If not, could you recommend who would be the best person to review it?

@jackofallops jackofallops added this to the v2.68.0 milestone Jul 15, 2021
@github-actions

This functionality has been released in v2.68.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
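
To pick up the fix, the provider constraint needs to allow v2.68.0 or newer. A minimal sketch, assuming Terraform 0.13 or later (for the source syntax) and the 2.x provider, which requires a features block:

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 2.68.0" # first release containing the fix, per the note above
    }
  }
}

provider "azurerm" {
  # Required by azurerm 2.x, even when left empty.
  features {}
}
```

Running terraform init -upgrade afterwards fetches a provider build that satisfies the new constraint.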

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2021