Azurerm VM refresh with delete_os_disk_on_termination=true is failing with a cannot find storage account error. #102

Closed
hashibot opened this issue Jun 13, 2017 · 26 comments · Fixed by #3838

Comments

@hashibot

This issue was originally opened by @djsly as hashicorp/terraform#15228. It was migrated here as part of the provider split. The original body of the issue is below.


Terraform Version

0.9.8

Affected Resource(s)

Please list the resources as a list, for example:

  • azurerm_virtual_machine

Terraform Configuration Files

provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

# Create a resource group
resource "azurerm_resource_group" "test" {
  name     = "${var.resource_group}"
  location = "${var.azure_location}"
}

resource "azurerm_virtual_network" "test" {
    name = "slyvnet"
    address_space = "${var.vnet_address_space}"
    location = "${var.azure_location}"
    resource_group_name = "${azurerm_resource_group.test.name}"
}

resource "azurerm_subnet" "test" {
    name = "slyvnetsub"
    resource_group_name = "${azurerm_resource_group.test.name}"
    virtual_network_name = "${azurerm_virtual_network.test.name}"
    address_prefix = "10.1.0.0/24"
}

resource "azurerm_network_interface" "test" {
    count = "${var.counts}"
    name = "slyni${count.index}"
    location = "${var.azure_location}"
    resource_group_name = "${azurerm_resource_group.test.name}"

    ip_configuration {
        name = "testconfiguration1"
        subnet_id = "${azurerm_subnet.test.id}"
        private_ip_address_allocation = "dynamic"
    }
}

resource "azurerm_storage_account" "test" {
    count = "${var.counts}"
    name = "slysa${count.index}"
    resource_group_name = "${azurerm_resource_group.test.name}"
    location = "${var.azure_location}"
    account_type = "Standard_LRS"

}

resource "azurerm_storage_container" "test" {
    count = "${var.counts}"
    name = "vhds"
    resource_group_name = "${azurerm_resource_group.test.name}"
    storage_account_name = "${azurerm_storage_account.test.*.name[count.index]}"
    container_access_type = "private"
}

resource "azurerm_virtual_machine" "test" {
    count = "${var.counts}"
    name = "slyvm${count.index}"
    location = "${var.azure_location}"
    resource_group_name = "${azurerm_resource_group.test.name}"
    network_interface_ids = ["${azurerm_network_interface.test.*.id[count.index]}"]
    vm_size = "Standard_A0"
    delete_os_disk_on_termination = "true"

    storage_image_reference {
        publisher = "Canonical"
        offer = "UbuntuServer"
        sku = "14.04.2-LTS"
        version = "latest"
    }

    storage_os_disk {
        name = "myosdisk1"
        vhd_uri = "${azurerm_storage_account.test.*.primary_blob_endpoint[count.index]}${azurerm_storage_container.test.*.name[count.index]}/myosdisk1.vhd"
        caching = "ReadWrite"
        create_option = "FromImage"
    }

    os_profile {
        computer_name = "hostnamee${count.index}"
        admin_username = "testadmin"
        admin_password = "Password1234!"
    }

    os_profile_linux_config {
        disable_password_authentication = false
    }
}

Debug Output

https://gist.github.com/djsly/11300a541a92432002a843509b1fb1ed

Expected Behavior

The VM refresh should delete the os_disk and proceed with the deletion.

Actual Behavior

Errors out trying to delete the blob

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply
  2. change the hostname of the VM
  3. terraform apply
@hashibot hashibot added the bug label Jun 13, 2017
@tombuildsstuff
Contributor

Hey @djsly

Thanks for opening this issue :)

I've spent some time looking into this, but I'm struggling to reproduce it. The error message being returned states that the Storage Account doesn't exist (or there's an eventual consistency bug in the API), so I'd expect to be able to reproduce this, but I've been unsuccessful so far.

So that we can investigate this further - would you be able to answer the following:

  • it appears you're using a script to invoke Terraform - out of interest, is it possible that this was run twice?
  • is it possible that the Storage Account was deleted via another means (e.g. in the portal)?

Thanks!

@bpoland

bpoland commented Jul 10, 2017

I am also seeing this issue pretty frequently. We are using an outside script to invoke Terraform but definitely not running it twice. And I have confirmed that the storage account was not deleted outside Terraform before running the destroy (the OS disk for the VM being destroyed is in that storage account so I wouldn't have been able to delete it before the VM was destroyed).

Interestingly enough, a second attempt to destroy seems to succeed every time, so this does seem like some sort of consistency/timing issue.

I am going to see if I can reproduce when running terraform manually to destroy.

@djsly
Contributor

djsly commented Jul 10, 2017

I can still reproduce this using the exact same config file:

Error applying plan:

1 error(s) occurred:

* azurerm_virtual_machine.test (destroy): 1 error(s) occurred:

* azurerm_virtual_machine.test: Error deleting OS Disk VHD: Error finding resource group for storage account slysa0: Wrong number of results making resource request for query name eq 'slysa0' and resourceType eq 'Microsoft.Storage/storageAccounts': 0

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
✘-1 ~/github/sylvain_boily/terraform-playground/issue-102 [master|…2]

@djsly
Contributor

djsly commented Jul 10, 2017

The configuration files provided in the original post are missing this:

variable "subscription_id" {}
variable "client_id" {}
variable "client_secret" {}
variable "tenant_id" {}

variable "counts" {
 default = "1"
}
variable "resource_group" {
 default = "test"
}
variable "azure_location" {
 default = "eastus2"
}
variable "vnet_address_space" {
 default = ["10.1.0.0/24"]
}

provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

@djsly
Contributor

djsly commented Jul 10, 2017

What I did was run terraform apply and, once it completed, edit azure.tf to update the name of the VM:

resource "azurerm_virtual_machine" "test" {
    count = "${var.counts}"
    name = "<NEWNAME>${count.index}"

and reran terraform apply

@bpoland

bpoland commented Jul 10, 2017

In my case, there are no changes to the VM name or anything. I've just deployed a VM and then later want to destroy it, and that's when I see the error. Then when I try destroying again, it succeeds.

@bpoland

bpoland commented Jul 10, 2017

Actually I realized we are detaching a secondary disk right before we delete the VM (this is done using the azure CLI). I wonder if Azure is still propagating that change when the delete comes in?

I am going to try a delete without the secondary disk involved, and one with the secondary disk, and see if that seems to be related.

@djsly did you make any changes to your VM or its configuration outside of terraform?

@bpoland

bpoland commented Jul 10, 2017

I was able to reproduce even without a secondary disk, so that doesn't seem to be it. I created a VM with terraform, waited for a few minutes and then ran "terraform destroy" and saw the issue. I am using a custom VHD file for my VMs, could that be it? It looks like @djsly is also using a custom VHD file.

One other interesting thing -- I noticed that when this error happens, even after I run terraform again to destroy, my VM's OS disk still remains in the storage account (I have delete_os_disk_on_termination set to true). Is this error happening when terraform tries to delete the OS disk after terminating the VM? It seems like the second time through, during the refresh it doesn't find the VM in Azure and so it doesn't try again to destroy the OS disk?

@djsly
Contributor

djsly commented Jul 10, 2017

@bpoland

@djsly did you make any changes to your VM or its configuration outside of terraform?

No, I only use Terraform CLI and never log on to the portal.

It looks like @djsly is also using a custom VHD file.

I'm using the official Ubuntu Image as my Base Image for the sake of this example. So no custom VHD

@bpoland

bpoland commented Jul 10, 2017

Ah sorry I see the Ubuntu image in your terraform config above.

@tombuildsstuff did you have delete_os_disk_on_termination = "true" when you were trying it out?

I am struggling to find any common "weird stuff" between @djsly and my configs that could explain why we are the only ones seeing this.

I started to see this issue maybe 3 weeks ago or so, and it didn't seem to be triggered by any changes to my configs (or a new version of terraform). So I was thinking maybe something changed on the Azure side. I just added a retry since that seemed to work (and hoped that Azure would fix things). It would still be nice to know for sure.
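The retry mentioned above could look roughly like the following. This is only a minimal sketch: `run_with_retry` and `terraform_destroy` are hypothetical names, and the `-force` flag assumes a Terraform version from the 0.9 to 0.11 era. Note that, as observed earlier in this thread, a second destroy may succeed while still leaving the OS disk VHD behind, so a retry hides the error rather than fixing it.

```python
import subprocess
import time


def run_with_retry(run, attempts=2, delay_seconds=0.0):
    """Call run() until it returns exit code 0 or the attempts run out.

    `run` is any zero-argument callable returning a process exit code.
    Returns the exit code of the last attempt.
    """
    code = 1
    for attempt in range(1, attempts + 1):
        code = run()
        if code == 0:
            return code
        if attempt < attempts:
            # Give the remote API a moment before trying again.
            time.sleep(delay_seconds)
    return code


def terraform_destroy():
    """Hypothetical wrapper: run `terraform destroy` without a prompt.

    Assumes the `terraform` binary is on PATH and that the installed
    version accepts `-force` for destroy.
    """
    return subprocess.call(["terraform", "destroy", "-force"])
```

A wrapper script would then call `run_with_retry(terraform_destroy, attempts=2, delay_seconds=30)` and check the returned exit code.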

@djsly
Contributor

djsly commented Jul 10, 2017

FYI: I simply used the official Azure example from terraform's website and I added delete_os_disk_on_termination = "true"

@bpoland

bpoland commented Jul 27, 2017

I've pasted some debug output that I'm getting here: https://gist.github.com/bpoland/dd300ccc387a1671b060d01adb4734e6

A colleague noticed that the response from Azure includes no results but does include a "nextLink" -- is it possible the results are paginated and terraform needs to get the next "page" of results to find the storage account? The Azure subscription I'm working in has a lot of resources so maybe others don't see this if they have fewer resources. @djsly are there a lot of resources in the Azure subscription you're using?
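To make the pagination hypothesis concrete, here is a small self-contained model of it. Everything in it is an assumption for illustration: `make_fake_list_api` is a hypothetical stand-in for a paginated list endpoint that applies the name filter to each page separately, so a page can come back empty and still carry a "nextLink". It is not the real Azure API or the provider's actual code.

```python
def make_fake_list_api(resources, page_size):
    """Build a hypothetical paginated, filtered list endpoint.

    The filter is applied per page, after the page has been cut, so a
    page may contain no matches and still include a "nextLink".
    """
    def fetch(name, link=None):
        start = int(link) if link is not None else 0
        end = start + page_size
        chunk = resources[start:end]
        page = {"value": [r for r in chunk if r["name"] == name]}
        if end < len(resources):
            # The continuation token here is just the next start offset.
            page["nextLink"] = str(end)
        return page
    return fetch


def find_first_page_only(fetch, name):
    """Look only at the first page (the suspected buggy behaviour)."""
    return fetch(name)["value"]


def find_following_next_link(fetch, name):
    """Keep requesting pages until no "nextLink" is returned."""
    results = []
    link = None
    while True:
        page = fetch(name, link)
        results.extend(page["value"])
        link = page.get("nextLink")
        if link is None:
            return results
```

With 156 fake accounts and a page size of 50, `find_first_page_only` returns zero results for an account that sits on a later page, which would match the "Wrong number of results ... : 0" error, while `find_following_next_link` finds it.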

@djsly
Contributor

djsly commented Jul 27, 2017

I'm not sure what "a lot" refers to :) but we have a total of 975 items and 156 storage accounts.

I guess it could be identified as a lot hehe

@bpoland

bpoland commented Jul 27, 2017

Haha yeah hard to say what "a lot" is :)

@tombuildsstuff when you were trying to reproduce, how many resources did you have in your azure subscription? Any thoughts about the pagination? Thanks!

@JunyiYi

JunyiYi commented Mar 26, 2018

Hi @djsly , I used your tf files and the following steps:

  • run terraform apply
  • Change VM name
  • run terraform apply

But the issue did not reproduce: Apply complete! Resources: 1 added, 0 changed, 1 destroyed. Can you confirm whether the issue still exists?

@djsly
Contributor

djsly commented Mar 27, 2018

Hi @JunyiYi, we moved to managed disks, so we haven't exercised this logic path for a while. I do not mind closing it, as it was probably fixed by now :)

@bpoland

bpoland commented Mar 27, 2018

Has anyone made any changes that they think should fix this problem? I think you need to be using a subscription with a lot of storage accounts in order to see the problem, because some results coming back from Azure are paginated and that causes terraform to not be able to find the storage account.

@JunyiYi

JunyiYi commented Mar 27, 2018

Thanks @djsly, let me close this issue now. @bpoland, my subscription contains 46 storage accounts. Could you please create a new issue with your Terraform HCL and reproduction steps? Let's track only one issue in this thread. Thanks.

@JunyiYi JunyiYi closed this as completed Mar 27, 2018
@bpoland

bpoland commented Mar 27, 2018

@JunyiYi the issue I experienced is the exact one that @djsly reported in this issue. It seems that in order to reproduce you need to have a large number of storage accounts in the subscription. Could you try creating 50 or 100 more storage accounts temporarily in your subscription and then see if you can reproduce it?

@djsly
Contributor

djsly commented Mar 27, 2018

@bpoland is correct, we used to have over 200 storage accounts (one per VM)

@tombuildsstuff tombuildsstuff modified the milestones: 1.4.0, 1.3.3 Apr 17, 2018
@ljfranklin

We're still seeing this as well under Terraform v0.11.3, not sure what the provider version was. Same boat as everyone else, destroy fails when delete_os_disk_on_termination=true and we have a large number of storage accounts (>100).

@bpoland

bpoland commented May 1, 2018

@JunyiYi @tombuildsstuff would you be able to reopen this issue since it was never actually fixed?

@markround

I know it's bad form to "bump" or add a "me too", but I just ran into this bug. Please could it be re-opened as it's not fixed?

I have some 60-odd Azure storage accounts holding disk images and delete_os_disk_on_termination=true in the VM config, and I am seeing this every time I try to delete a VM which has its disk image in a storage account whose name starts with an x, but others (e.g. starting with d or whatever) work fine.

It therefore looks to be exactly the same issue with Terraform not paginating results returned from the Azure API so it assumes the storage account does not exist.

@bpoland

bpoland commented Sep 13, 2018

I ended up moving to managed disks but we still had problems with it before we switched. My workaround was to add a separate azurerm_storage_blob resource for the OS disk:

resource "azurerm_storage_blob" "vm_os" {
    name = "${var.vm_name}-os.vhd"
    resource_group_name = "${var.azure_resource_group}"
    storage_account_name = "${var.storage_account_name}"
    storage_container_name = "vhds"
}

Then in the VM itself turn delete_os_disk_on_termination off and add depends_on = [ "azurerm_storage_blob.vm_os" ]

But this is absolutely still an issue with the provider.

@markround

@JunyiYi Could this be re-opened, please? Or would you prefer me to create a new (duplicate) issue? As mentioned above, we're seeing this exact same issue, and for various reasons cannot move to managed disks to work around the problem, or switch to a separate blob store resource.

@ghost

ghost commented Mar 6, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 6, 2019