RoleDefinitionDoesNotExist with azurerm_role_definition #10442

Closed
mkprizzle opened this issue Feb 2, 2021 · 14 comments

Comments

@mkprizzle

mkprizzle commented Feb 2, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.14.5
provider registry.terraform.io/hashicorp/azurerm v2.45.1

Affected Resource(s)

  • azurerm_role_definition

Terraform Configuration Files

resource "azurerm_role_definition" "maps_custom" {
  name        = var.maps_custom_role_name
  scope       = data.azurerm_subscription.primary.id
  description = "Custom role for Maps at the control plane operations level"

  permissions {
    actions = [
      "Microsoft.Maps/register/action",
      "Microsoft.Maps/accounts/write",
      "Microsoft.Maps/accounts/read",
      "Microsoft.Maps/accounts/delete",
      "Microsoft.Maps/accounts/listKeys/action",
      "Microsoft.Maps/accounts/regenerateKey/action",
      "Microsoft.Maps/accounts/providers/Microsoft.Insights/diagnosticSettings/read",
      "Microsoft.Maps/accounts/providers/Microsoft.Insights/diagnosticSettings/write",
      "Microsoft.Maps/accounts/providers/Microsoft.Insights/metricDefinitions/read",
      "Microsoft.Maps/operations/read"
    ]
  }

  assignable_scopes = [
    data.azurerm_subscription.primary.id
  ]
}

Debug Output

https://gist.github.com/mkprizzle/1d1787a944eff128135603e3582f70a6

Expected Behaviour

Successful creation of custom role after apply/destroy/apply

Actual Behaviour

Custom role is created successfully after first apply. Custom role is removed successfully after first destroy. Custom role is created successfully after second apply, but not manageable or assignable by Terraform, and cannot be subsequently destroyed.

Steps to Reproduce

  1. terraform apply
  2. terraform destroy
  3. terraform apply

Important Factoids

Problem does not occur on initial creation of custom role. After destroying, applying to rebuild the custom role with the same name as the first destroyed role will cause this issue. Changing the name of the custom role between builds does not cause the issue.

References

@alastairtree
Contributor

Getting the same error. It appears that, for some reason, a pipe is getting added to the role definition ID:

Error: A resource with the ID "/subscriptions/66b4[REDACTED]/providers/Microsoft.Authorization/roleDefinitions/16f62d81-9a2d-21bc-8f5d-0aafec905441|/subscriptions/66b4[REDACTED]" already exists

@publysher

publysher commented Feb 4, 2021

We are consistently getting the same error on the first apply -- up until two weeks ago, this error did not occur. Updating to the latest release (2.45.1) of the azurerm provider did not help.

Error: authorization.RoleDefinitionsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="RoleDefinitionDoesNotExist" Message="The specified role definition with ID '2db[REDACTED]' does not exist."

  on [REDACTED].tf line 116, in resource "azurerm_role_definition" "[REDACTED]":
 116: resource "azurerm_role_definition" "[REDACTED]" {

All of this seems to indicate that it has the same root cause as the referenced #9379 (eventual consistency timing), which suggests it might have the same solution as #10134 by @jackofallops.

@publysher

Workaround:

After getting the error, run:

export SUBSCRIPTION_ID="..."   # taken from provider settings
export ROLE_ID="..."           # taken from error message
export RESOURCE_NAME="..."     # taken from the line above the error message
terraform import "$RESOURCE_NAME" "/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.Authorization/roleDefinitions/$ROLE_ID|/subscriptions/$SUBSCRIPTION_ID"

This assumes subscription scope. Update the part after the pipe if you have a different scope.
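
For pipelines that need to repeat this import, the ID construction can be wrapped in a small function. This is only a sketch of the workaround above: the function name `role_definition_import_id` is hypothetical, and it assumes (as the workaround does) that only the part after the pipe changes when the scope differs.

```shell
# Hypothetical helper: prints the composite import ID
# "<role definition resource ID>|<scope>".
# Arguments: subscription ID, role definition ID, optional scope.
# The scope defaults to the subscription itself (assumption taken
# from the workaround above).
role_definition_import_id() {
  local subscription_id="$1"
  local role_id="$2"
  local scope="${3:-/subscriptions/$subscription_id}"
  printf '%s|%s\n' \
    "/subscriptions/${subscription_id}/providers/Microsoft.Authorization/roleDefinitions/${role_id}" \
    "$scope"
}

# Example usage (not executed here):
#   terraform import "$RESOURCE_NAME" \
#     "$(role_definition_import_id "$SUBSCRIPTION_ID" "$ROLE_ID")"
```

Passing a third argument overrides the scope half of the ID; whether the first half also needs to change for non-subscription scopes is not covered in this thread.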

@krowlandson
Contributor

Possibly related as we also get the above error periodically, but have also noticed the following whilst running Terraform version 0.14.6 with AzureRM Provider version 2.46.1:

2021-02-08T16:40:35.6831945Z Error: Provider produced inconsistent result after apply
2021-02-08T16:40:35.6833274Z 
2021-02-08T16:40:35.6834433Z When applying changes to
2021-02-08T16:40:35.6835580Z module.test_root_id_2.azurerm_role_definition.enterprise_scale["/providers/Microsoft.Authorization/roleDefinitions/7c8e66c3-9a33-578d-be50-b3a88dc69473"],
2021-02-08T16:40:35.6836341Z provider "registry.terraform.io/hashicorp/azurerm" produced an unexpected new
2021-02-08T16:40:35.6836844Z value: Root resource was present, but now absent.
2021-02-08T16:40:35.6837092Z 
2021-02-08T16:40:35.6838186Z This is a bug in the provider, which should be reported in the provider's own
2021-02-08T16:40:35.6838723Z issue tracker.

As suggested above, this may indeed be a similar issue / fix as #10134

@andydkelly-ig

andydkelly-ig commented Feb 10, 2021

I am now also getting this error, on Terraform 0.13.6 and Azure Provider 2.46.1. I am unable to add a new Role Definition, receiving the message:

Error: Provider produced inconsistent result after apply

When applying changes to azurerm_role_definition.aktest1234, provider
"registry.terraform.io/hashicorp/azurerm" produced an unexpected new value:
Root resource was present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

I came across this whilst trying to prove what I thought was a bug whereby terraform destroy isn't destroying Role Definitions, which is now causing me issues when reapplying. When I use the AZ CLI to list Role Definitions, I can see every Role Definition I have ever created, despite the fact that all the infrastructure has been deleted, including the Resource Group that contained everything.

@andydkelly-ig

andydkelly-ig commented Feb 10, 2021

In addition to the above, I have noticed that despite returning the error, the resource IS created, albeit not fully. In my case the Assignable Scope was simply the Subscription and not the Azure KeyVault in my config.

Running apply again tells me it's going to change the Assignable Scope. Running it a second time produces the error:

Error: authorization.RoleDefinitionsClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="RoleDefinitionWithSameNameExists" Message="A role definition cannot be updated with a name that already exists."

However, again the change is actually made. If I then go and import the resource manually into Terraform (it's not in the state, given both applies failed), it wants to make no changes, which shows the assignable scope has been added correctly despite the error.

Using the AZ CLI also confirmed this by running az role definition list -n terraformbugtest each time and seeing the configuration change.

I have also gone back through each Azurerm version to 2.44.0 and the issue still persists.
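
The `az role definition list` check mentioned above could be scripted rather than run by hand, for example to wait between a destroy and the next apply. The sketch below is an illustration only: `retry_until` and `role_definition_absent` are hypothetical names, the attempt count and delay are arbitrary, and nothing in this thread confirms that an empty listing means the role definition is fully gone on the Azure side.

```shell
# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# sleeping DELAY_SECONDS between attempts.
# Returns 0 on success, 1 if every attempt failed.
retry_until() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# role_definition_absent NAME
# Succeeds when the Azure CLI lists no role definition with that name.
# Assumes the az CLI is installed and logged in.
role_definition_absent() {
  [ "$(az role definition list --name "$1" --query 'length(@)' --output tsv)" = "0" ]
}

# Example usage (not executed here):
#   retry_until 20 15 role_definition_absent terraformbugtest
```

Keeping the retry loop separate from the Azure check means the same loop can wrap any other command, such as a check that the role definition has become visible after a create.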

@jackofallops
Member

Getting the same error. It appears that, for some reason, a pipe is getting added to the role definition ID:

Error: A resource with the ID "/subscriptions/66b4[REDACTED]/providers/Microsoft.Authorization/roleDefinitions/16f62d81-9a2d-21bc-8f5d-0aafec905441|/subscriptions/66b4[REDACTED]" already exists

That's an unfortunate necessity: without it, deletion is otherwise unresolvable in some cases where we don't have enough information (where the scope is a management group, for example). You can use the resource_id property for the Azure ID reference without the pipe.
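
For scripts that only have the composite ID at hand (for example, copied from an error message), the two halves can be separated on the pipe. A minimal sketch in shell; the function names `azure_id_part` and `scope_part` are hypothetical and not part of Terraform or the provider.

```shell
# Hypothetical helpers for IDs of the form "<Azure resource ID>|<scope>".

# Prints everything before the first pipe (the Azure resource ID).
azure_id_part() {
  printf '%s\n' "${1%%"|"*}"
}

# Prints everything after the first pipe (the scope).
scope_part() {
  printf '%s\n' "${1#*"|"}"
}
```

If the input contains no pipe, both functions print it unchanged, so callers may want to check for the separator first.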

The core of the reported issue appears to be an eventual consistency issue on the delete operation. IIRC there's no way to determine for certain that a role has actually been deleted, as that information is only exposed when one attempts to create an object that already exists. I'll take another look into this as soon as I can to see if that situation has changed, but I don't recall any change notes in this area from the Azure SDK.

@publysher

The core of this issue reported appears to be an eventual consistency issue on the delete operation.

That might also be involved, but in our case it must be related to the create operation, because we start from scratch -- including the entire resource group.

@andydkelly-ig

@publysher I experienced the same when trying to prove it out. With a brand new resource group and a brand new role definition, I am unable to get a successful creation.

@darren-johnson

darren-johnson commented Feb 11, 2021

I have exactly the same issue here. I created a bunch of roles using previously working code and I can't find a combination of TF or azure rm versions to get this working. I have a new subscription which I need to configure RBAC on.

UPDATE: after having issues for over 12 hours, I left the partially created roles for an hour, did a destroy, and then created the roles one by one, and they ran in fine. I suspect something is going on behind the scenes.

@alastairtree
Contributor

@katbyte @tombuildsstuff flagging this in case you had not seen it. This issue means most people using many versions of azurerm provider and the azurerm_role_definition resource are getting frequent errors from the provider that are really awkward to resolve and require manually importing to state to work around the bug. It almost makes any deploy that references azurerm_role_definition unusable - for example my team has had many broken deploys this week and last. Is there any way someone could take a look? Thank you!

@emily599

We are a large enterprise customer and currently running into this same issue. I understand there is a workaround by importing the state, but that defeats the purpose of running Terraform using pipelines since we would have to have someone manually importing the state every time a role definition is created.

@tombuildsstuff
Contributor

👋

Whilst there's more 👍 on this issue, #10602 contains more detail, so on reflection I'm going to close this in favour of #10602. Would you mind subscribing to that issue for updates? From our side this appears to be a case of extra eventual consistency coming from the Azure API (where Creates aren't accounted for, and Updates temporarily create a new ID and then reconcile it a few seconds later). There's a fix for that in the form of #9850, but we've not had time to finish reconciling that yet, so for now let's track this in #10602.

Thanks!

@ghost

ghost commented Mar 20, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2021