Azurerm VM refresh with delete_os_disk_on_termination=true is failing with a cannot find storage account error. #102

hashibot · 2017-06-13T21:14:34Z

This issue was originally opened by @djsly as hashicorp/terraform#15228. It was migrated here as part of the provider split. The original body of the issue is below.

Terraform Version

0.9.8

Affected Resource(s)


azurerm_virtual_machine

Terraform Configuration Files

provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

# Create a resource group
resource "azurerm_resource_group" "test" {
  name     = "${var.resource_group}"
  location = "${var.azure_location}"
}

resource "azurerm_virtual_network" "test" {
    name = "slyvnet"
    address_space = "${var.vnet_address_space}"
    location = "${var.azure_location}"
    resource_group_name = "${azurerm_resource_group.test.name}"
}

resource "azurerm_subnet" "test" {
    name = "slyvnetsub"
    resource_group_name = "${azurerm_resource_group.test.name}"
    virtual_network_name = "${azurerm_virtual_network.test.name}"
    address_prefix = "10.1.0.0/24"
}

resource "azurerm_network_interface" "test" {
    count = "${var.counts}"
    name = "slyni${count.index}"
    location = "${var.azure_location}"
    resource_group_name = "${azurerm_resource_group.test.name}"

    ip_configuration {
        name = "testconfiguration1"
        subnet_id = "${azurerm_subnet.test.id}"
        private_ip_address_allocation = "dynamic"
    }
}

resource "azurerm_storage_account" "test" {
    count = "${var.counts}"
    name = "slysa${count.index}"
    resource_group_name = "${azurerm_resource_group.test.name}"
    location = "${var.azure_location}"
    account_type = "Standard_LRS"

}

resource "azurerm_storage_container" "test" {
    count = "${var.counts}"
    name = "vhds"
    resource_group_name = "${azurerm_resource_group.test.name}"
    storage_account_name = "${azurerm_storage_account.test.*.name[count.index]}"
    container_access_type = "private"
}

resource "azurerm_virtual_machine" "test" {
    count = "${var.counts}"
    name = "slyvm${count.index}"
    location = "${var.azure_location}"
    resource_group_name = "${azurerm_resource_group.test.name}"
    network_interface_ids = ["${azurerm_network_interface.test.*.id[count.index]}"]
    vm_size = "Standard_A0"
    delete_os_disk_on_termination = "true"

    storage_image_reference {
        publisher = "Canonical"
        offer = "UbuntuServer"
        sku = "14.04.2-LTS"
        version = "latest"
    }

    storage_os_disk {
        name = "myosdisk1"
        vhd_uri = "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd"
        caching = "ReadWrite"
        create_option = "FromImage"
    }

    os_profile {
        computer_name = "hostnamee${count.index}"
        admin_username = "testadmin"
        admin_password = "Password1234!"
    }

    os_profile_linux_config {
        disable_password_authentication = false
    }
}

Debug Output

https://gist.github.com/djsly/11300a541a92432002a843509b1fb1ed

Expected Behavior

The VM refresh should delete the OS disk and proceed with the deletion.

Actual Behavior

Terraform errors out while trying to delete the OS disk blob.

Steps to Reproduce


terraform apply
change the hostname of the VM
terraform apply


tombuildsstuff · 2017-07-07T22:47:54Z

Hey @djsly

Thanks for opening this issue :)

I've spent some time looking into this, but I'm struggling to reproduce it. The error message being returned states that the Storage Account doesn't exist (or that there's an eventual consistency bug in the API); however, I'd expect to be able to reproduce that, and so far I've been unsuccessful.

So that we can investigate this further - would you be able to answer the following:

It appears you're using a script to invoke Terraform; out of interest, is it possible that this was run twice?
Is it possible that the Storage Account was deleted by other means (e.g. in the portal)?

Thanks!

bpoland · 2017-07-10T14:27:51Z

I am also seeing this issue pretty frequently. We are using an outside script to invoke Terraform but definitely not running it twice. And I have confirmed that the storage account was not deleted outside Terraform before running the destroy (the OS disk for the VM being destroyed is in that storage account so I wouldn't have been able to delete it before the VM was destroyed).

Interestingly enough, a second attempt to destroy seems to succeed every time, so this does seem like some sort of consistency/timing issue.

I am going to see if I can reproduce when running terraform manually to destroy.

djsly · 2017-07-10T16:05:56Z

I can still reproduce this using the exact same configuration file:

Error applying plan:

1 error(s) occurred:

* azurerm_virtual_machine.test (destroy): 1 error(s) occurred:

* azurerm_virtual_machine.test: Error deleting OS Disk VHD: Error finding resource group for storage account slysa0: Wrong number of results making resource request for query name eq 'slysa0' and resourceType eq 'Microsoft.Storage/storageAccounts': 0

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
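The error message suggests that, before deleting the VHD, the provider looks up the storage account's resource group with a filtered resource query and treats anything other than exactly one result as a failure. The sketch below is a hypothetical Python illustration of that check, not the provider's actual code: the function name `find_resource_group`, the constant `STORAGE_TYPE`, and the shape of `page_items` (a list of dicts with an `"id"` key) are all assumptions. It shows how an empty result list turns into the "Wrong number of results ... : 0" error.

```python
STORAGE_TYPE = "Microsoft.Storage/storageAccounts"


def find_resource_group(account_name, page_items):
    """Hypothetical sketch of a resource-group lookup for a storage account.

    `page_items` stands in for the "value" list of one response page from a
    filtered resource query; each item is a dict with an "id" key.
    """
    query = f"name eq '{account_name}' and resourceType eq '{STORAGE_TYPE}'"
    # Anything other than exactly one match is treated as a failure,
    # so an empty list produces the "...: 0" error.
    if len(page_items) != 1:
        raise LookupError(
            "Wrong number of results making resource request "
            f"for query {query}: {len(page_items)}"
        )
    # ARM resource IDs look like
    # /subscriptions/<sub>/resourceGroups/<rg>/providers/<type>/<name>
    segments = page_items[0]["id"].split("/")
    return segments[segments.index("resourceGroups") + 1]


if __name__ == "__main__":
    found = [{"id": "/subscriptions/0000/resourceGroups/test"
                    "/providers/Microsoft.Storage/storageAccounts/slysa0"}]
    print(find_resource_group("slysa0", found))  # test
    try:
        find_resource_group("slysa0", [])
    except LookupError as err:
        print(err)
```

With an empty list the printed message matches the tail of the Terraform error: `Wrong number of results making resource request for query name eq 'slysa0' and resourceType eq 'Microsoft.Storage/storageAccounts': 0`.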

djsly · 2017-07-10T16:06:59Z

The configuration files provided in the original post are missing the following:

variable "subscription_id" {}
variable "client_id" {}
variable "client_secret" {}
variable "tenant_id" {}

variable "counts" {
 default = "1"
}
variable "resource_group" {
 default = "test"
}
variable "azure_location" {
 default = "eastus2"
}
variable "vnet_address_space" {
 default = ["10.1.0.0/24"]
}

provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

djsly · 2017-07-10T16:16:09Z

I ran terraform apply and, once it completed, edited azure.tf to update the name of the VM:

resource "azurerm_virtual_machine" "test" {
    count = "${var.counts}"
    name = "<NEWNAME>${count.index}"

and reran terraform apply

bpoland · 2017-07-10T16:42:47Z

In my case, there are no changes to the VM name or anything. I've just deployed a VM and then later want to destroy it, and that's when I see the error. Then when I try destroying again, it succeeds.

bpoland · 2017-07-10T17:18:43Z

Actually I realized we are detaching a secondary disk right before we delete the VM (this is done using the azure CLI). I wonder if Azure is still propagating that change when the delete comes in?

I am going to try a delete without the secondary disk involved, and one with the secondary disk, and see if that seems to be related.

@djsly did you make any changes to your VM or its configuration outside of terraform?

bpoland · 2017-07-10T17:37:47Z

I was able to reproduce even without a secondary disk, so that doesn't seem to be it. I created a VM with terraform, waited for a few minutes, and then ran "terraform destroy" and saw the issue. I am using a custom VHD file for my VMs; could that be it? It looks like @djsly is also using a custom VHD file.

One other interesting thing -- I noticed that when this error happens, even after I run terraform again to destroy, my VM's OS disk still remains in the storage account (I have delete_os_disk_on_termination set to true). Is this error happening when terraform tries to delete the OS disk after terminating the VM? It seems like the second time through, during the refresh it doesn't find the VM in Azure and so it doesn't try again to destroy the OS disk?

djsly · 2017-07-10T17:46:28Z

@bpoland

> @djsly did you make any changes to your VM or its configuration outside of terraform?

No, I only use Terraform CLI and never log on to the portal.

> It looks like @djsly is also using a custom VHD file.

I'm using the official Ubuntu image as my base image for the sake of this example, so no custom VHD.

bpoland · 2017-07-10T17:54:07Z

Ah, sorry, I see the Ubuntu image in your Terraform config above.

@tombuildsstuff did you have delete_os_disk_on_termination = "true" when you were trying it out?

I am struggling to find any common "weird stuff" between @djsly and my configs that could explain why we are the only ones seeing this.

I started to see this issue maybe 3 weeks ago or so, and it didn't seem to be triggered by any changes to my configs (or a new version of terraform). So I was thinking maybe something changed on the Azure side. I just added a retry since that seemed to work (and hoped that Azure would fix things). It would still be nice to know for sure.

djsly · 2017-07-10T18:10:18Z

FYI: I simply used the official Azure example from Terraform's website and added delete_os_disk_on_termination = "true".

bpoland · 2017-07-27T13:32:42Z

I've pasted some debug output that I'm getting here: https://gist.github.com/bpoland/dd300ccc387a1671b060d01adb4734e6

A colleague noticed that the response from Azure includes no results but does include a "nextLink" -- is it possible the results are paginated and terraform needs to get the next "page" of results to find the storage account? The Azure subscription I'm working in has a lot of resources so maybe others don't see this if they have fewer resources. @djsly are there a lot of resources in the Azure subscription you're using?
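If the colleague's reading is right, the lookup reads only the first page of results, finds it empty, and never follows `nextLink`. Azure Resource Manager list responses carry their items in a `value` array and signal further pages with a `nextLink` URL. The Python sketch below illustrates the pagination pattern under stated assumptions: `fetch_page` is a hypothetical stand-in for an authenticated HTTP GET returning the decoded JSON body, and the in-memory `PAGES` dict with made-up URLs simulates a first page that is empty but has a `nextLink`. It is not the provider's code.

```python
def iter_items(first_url, fetch_page):
    """Yield items from every page, following "nextLink" until it is absent.

    `fetch_page` is a hypothetical stand-in for an HTTP GET that returns the
    decoded JSON body: a dict with a "value" list and an optional "nextLink".
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("value", [])
        url = page.get("nextLink")


def first_page_only(first_url, fetch_page):
    """The suspected bug: read only the first page and ignore nextLink."""
    return fetch_page(first_url).get("value", [])


# Simulated responses with hypothetical URLs: the first page is empty but
# carries a nextLink, as in the debug output described above.
PAGES = {
    "page-1": {"value": [], "nextLink": "page-2"},
    "page-2": {"value": [{"name": "slysa0"}]},
}

if __name__ == "__main__":
    print(len(first_page_only("page-1", PAGES.__getitem__)))  # 0
    names = [item["name"] for item in iter_items("page-1", PAGES.__getitem__)]
    print(names)  # ['slysa0']
```

Counting matches only after draining every page would give the lookup the single result it expects, regardless of how many resources the subscription holds or which page the storage account lands on.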

djsly · 2017-07-27T14:30:02Z

I'm not sure what "a lot" refers to :) but we have a total of 975 items and 156 storage accounts.

I guess that could count as a lot, hehe.

bpoland · 2017-07-27T14:31:13Z

Haha yeah hard to say what "a lot" is :)

@tombuildsstuff when you were trying to reproduce, how many resources did you have in your azure subscription? Any thoughts about the pagination? Thanks!

JunyiYi · 2018-03-26T23:58:45Z

Hi @djsly, I used your tf files and the following steps:

run terraform apply
Change VM name
run terraform apply

But the issue did not reproduce: "Apply complete! Resources: 1 added, 0 changed, 1 destroyed." Can you confirm whether the issue still exists?

djsly · 2018-03-27T00:05:14Z

Hi @JunyiYi, we moved to managed disks, so we haven't exercised this code path for a while. I don't mind closing it, as it was probably fixed by now :)

bpoland · 2018-03-27T13:06:42Z

Has anyone made any changes that they think should fix this problem? I think you need to be using a subscription with a lot of storage accounts in order to see the problem, because some results coming back from Azure are paginated and that causes terraform to not be able to find the storage account.

JunyiYi · 2018-03-27T18:32:51Z

Thanks @djsly, let me close this issue now. @bpoland, my subscription contains 46 storage accounts. Could you please create a new issue with your Terraform HCL and reproduction steps? Let's track only one issue per thread. Thanks.

bpoland · 2018-03-27T19:25:56Z

@JunyiYi the issue I experienced is the exact one that @djsly reported in this issue. It seems that in order to reproduce you need to have a large number of storage accounts in the subscription. Could you try creating 50 or 100 more storage accounts temporarily in your subscription and then see if you can reproduce it?

djsly · 2018-03-27T19:45:13Z

@bpoland is correct; we used to have over 200 storage accounts (one per VM).

ljfranklin · 2018-05-01T15:59:56Z

We're still seeing this as well under Terraform v0.11.3 (not sure what the provider version was). Same boat as everyone else: destroy fails when delete_os_disk_on_termination=true and we have a large number of storage accounts (>100).

bpoland · 2018-05-01T17:06:58Z

@JunyiYi @tombuildsstuff would you be able to reopen this issue since it was never actually fixed?

markround · 2018-09-13T17:59:48Z

I know it's bad form to "bump" or add a "me too", but I just ran into this bug. Please could it be re-opened as it's not fixed?

I have some 60-odd Azure storage accounts holding disk images and delete_os_disk_on_termination=true in the VM config. I see this every time I try to delete a VM whose disk image is in a storage account with a name starting with an x, but others (e.g. starting with d or whatever) work fine.

It therefore looks to be exactly the same issue with Terraform not paginating results returned from the Azure API so it assumes the storage account does not exist.

bpoland · 2018-09-13T19:55:20Z

I ended up moving to managed disks but we still had problems with it before we switched. My workaround was to add a separate azurerm_storage_blob resource for the OS disk:

resource "azurerm_storage_blob" "vm_os" {
    name = "${var.vm_name}-os.vhd"
    resource_group_name = "${var.azure_resource_group}"
    storage_account_name = "${var.storage_account_name}"
    storage_container_name = "vhds"
}

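Then, in the VM resource itself, set delete_os_disk_on_termination to false and add depends_on = [ "azurerm_storage_blob.vm_os" ]. Because the VM depends on the blob, Terraform destroys the VM first and then deletes the blob as a separate resource.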

But this is absolutely still an issue with the provider.

markround · 2018-09-20T12:45:00Z

@JunyiYi Could this be re-opened, please? Or would you prefer me to create a new (duplicate) issue? As mentioned above, we're seeing this exact same issue, and for various reasons we cannot move to managed disks or switch to a separate blob resource to work around the problem.

ghost · 2019-03-06T14:34:19Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

hashibot added the bug label Jun 13, 2017

tombuildsstuff added the waiting-response label Jul 7, 2017

grubernaut removed the waiting-response label Jul 28, 2017

rcarun added service/virtual-machine M1 labels Oct 11, 2017

rcarun added this to the M1 milestone Oct 11, 2017

VaijanathB assigned metacpp Feb 13, 2018

achandmsft added technical-debt and removed M1 labels Mar 8, 2018

achandmsft modified the milestones: M1, 1.4.0 Mar 8, 2018

achandmsft added microsoft/1 msft and removed msft technical-debt labels Mar 8, 2018

achandmsft added M1-priority and removed M1 labels Mar 8, 2018

achandmsft unassigned metacpp Mar 10, 2018

metacpp assigned JunyiYi Mar 21, 2018

JunyiYi added waiting-response not-reproduced labels Mar 26, 2018

JunyiYi removed the waiting-response label Mar 27, 2018

JunyiYi closed this as completed Mar 27, 2018

tombuildsstuff modified the milestones: 1.4.0, 1.3.3 Apr 17, 2018

ghost locked and limited conversation to collaborators Mar 6, 2019
