Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azuread_application resource: appRoles are created multiple times #308

Closed
daniel-chambers opened this issue Aug 20, 2020 · 5 comments · Fixed by #461
Closed

azuread_application resource: appRoles are created multiple times #308

daniel-chambers opened this issue Aug 20, 2020 · 5 comments · Fixed by #461

Comments

@daniel-chambers
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureAD Provider) Version

Terraform v0.12.28 (Yes, this is older than 0.13, but I believe the issue is in this provider, not in Terraform itself)
provider.azuread v0.11.0
provider.azurerm v2.23.0

Affected Resource(s)

  • azuread_application

Terraform Configuration Files

variable "SUBSCRIPTION_ID" {
  type = string
}

provider "azuread" {
  version         = "0.11.0"
  subscription_id = var.SUBSCRIPTION_ID
}

provider "azurerm" {
  version                    = "2.23.0"
  subscription_id            = var.SUBSCRIPTION_ID
  features {}
  skip_provider_registration = true
}

data "azurerm_client_config" "current" {
}

resource "azuread_application" "api" {
  name                       = "Test API"
  homepage                   = "https://my-repro.com"
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
  identifier_uris            = ["api://my-bug-repro/test"]
  owners                     = [data.azurerm_client_config.current.object_id]

  app_role {
    allowed_member_types = ["User"]
    description          = "Review Administrators have the ability to view and modify reviews"
    display_name         = "Review Administrator"
    value                = "ReviewAdministrator"
    is_enabled           = true
  }
  app_role {
    allowed_member_types = ["User"]
    description          = "Review Readers have read-only access to reviews"
    display_name         = "Review Reader"
    value                = "ReviewReader"
    is_enabled           = true
  }

  oauth2_permissions {
    admin_consent_description  = "Allows the app to read and modify reviews on behalf of the signed-in user."
    admin_consent_display_name = "Read and modify reviews"
    is_enabled                 = true
    type                       = "User"
    user_consent_description   = "Allows the app to read and modify reviews on your behalf."
    user_consent_display_name  = "Read and modify reviews"
    value                      = "Review.ReadWrite"
  }
}

Debug Output

If you don't mind I've encrypted the debug logs (which contain stuff like our subscription ID etc and a lot of stuff I don't understand and don't want to try to sanitise) to the Hashicorp keybase user.

tflog.log.gz.encrypted.saltpack.zip

The error at the end is:

Error: patching Azure AD Application with ID "c261c681-ec7d-4854-b577-5c1e4a0570a9": graphrbac.ApplicationsClient#Patch: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="Unknown" Message="Unknown service error" Details=[{"odata.error":{"code":"Request_BadRequest","date":"2020-08-20T06:05:48","message":{"lang":"en","value":"Property  value cannot be deleted or updated unless it is disabled first."},"requestId":"913e3d7e-ef08-43df-989d-9b0208147973","values":[{"item":"PropertyName","value":"None"},{"item":"PropertyErrorCode","value":"CannotDeleteEnabledEntitlement"}]}}]

Panic Output

None

Expected Behavior

The appRoles should be created once, and I should never experience the above error.

Actual Behavior

During resource creation, the appRoles are created (PATCH-ed onto the app), then they are disabled (PATCH-ed into isEnabled: false). Then they are deleted and replaced with new copies (new appRoles with new IDs are PATCH-ed onto the app).

Then, sometimes (not always) I experience a Property value cannot be deleted or updated unless it is disabled first. error. This looks like a dodgy error and behaviour from Azure AD, however I think it may be being exacerbated by the fact that we delete and recreate app roles close together while creating an app (they fact that this occurs only sometimes makes this feel like a race condition and Azure AD is full of frustrating eventual consistency).

I've had a poke through the provider code, and I believe we:

Seems to me like we could just delete the code that is creating the app roles inside resourceApplicationCreate and leave it to resourceApplicationUpdate to create them?

I'm not certain that this is the thing that's causing the above error, but I believe only the oauth2Permissions and the appRoles have this sort "disable before deleting" error, and it seems reasonable that we should only create appRoles once during application creation.

Steps to Reproduce

  1. terraform apply

If the bug does not happen, you may need to repeat the following steps as quickly as you can:
2. terraform destroy
3. terraform apply

Eventually, it happens.

@dimbleby
Copy link

dimbleby commented Mar 2, 2021

I see this issue has the upstream-terraform label. Is that right? Analysis above suggests that this is a provider issue.

If this is an upstream problem: do we have an issue at the terraform repository that I can go and vote on? Thanks!

@manicminer
Copy link
Contributor

manicminer commented Mar 2, 2021

Hi @daniel-chambers, @dimbleby, sorry for the delay in responding. Whilst the upstream-label is technically appropriate, it isn't going to aid us in the short term and since this issue was opened we have narrowed down a bunch of related issues with plans to remediate. That said...

I believe the root cause here is a lack of consistency with the AAD Graph API. As you correctly noted, we perform a blanket disabling of all roles before patching any role updates, and this behavior is baked into the create and update funcs. The reason we do this is because it's infeasible to identity roles reliably in order to selectively disable them before updating. The error you are getting is due to API consistency (or lack of), where roles are disabled but the API doesn't actually reflect that (either in the update or read direction) by the time the provider code tries to subsequently update them.

The long version is that roles are identified by their id - any of the other fields can change and the role is still considered the same role by the API. Compounding this is that id is not a user-configurable field, we generate a UUID and insert that into each role object. On top of that, roles are represented as a set in the schema because the API doesn't guarantee ordering, and using a list causes configuration drift at plan time. These facts together, means that we cannot make the id field user-configurable inside an app_role block due to Terraform limitations, and the net result is that because id is computed, the only way we have to reliably apply role updates is to disable all of them, then patch the application with the desired array of roles.

In good news, I believe your issue will be eliminated by using the azuread_application_app_role resource instead of app_role blocks in the azuread_application resource. In this 'child' resource, not only is id optional and computed (so you can specify an id if you want), but it's a top level attribute and hence not inside a set, which frees us of the above Terraform limitation. The downside is that the azuread_application_app_role resource manages a single role, and so you cannot exclude roles by leaving them out of your config, i.e. if a role gets added out of band, it will survive a terraform apply.

Furthermore, these sorts of consistency issues are greatly reduced in the Microsoft Graph API, which we are in the process of moving to, and we have plans to try to improve exactly this situation in the event that consistency errors occur. These constitute a breaking changes, so will only land in the next major provider version.

Hope this clarifies the predicament we have with this, and note that all of the above issues and consequences apply to both app roles and oauth2 permission scopes. Unfortunately there isn't much more we can do to resolve this bug until we've migrated to MS Graph or the next major version - and both of those are scheduled to happen at the same time. I'll happily keep this issue open so that we can track this particular error, and ensure we have mitigated it when that happens.

@dimbleby
Copy link

dimbleby commented Mar 2, 2021

Thanks for the update.

My terraform definitions actually currently do use the azuread_application_app_role resource, structured something like this:

resource "azuread_application" "foo" {
 ...
}
resource "azuread_application_app_role" "foo_role_one" {
  application_object_id = azuread_application.foo.0.id
  ...
}
resource "azuread_application_app_role" "foo_role_two" {
  application_object_id = azuread_application.foo.0.id
  ...
}

... and I am hitting the same error.

Actually, before you made the above update, I decided to try experimenting with inlining these as app_role - and haven't yet hit the error this way, though perhaps I have just been lucky. (I had wondered whether #367 might help). Which would be the other way round from what you wrote!

@manicminer
Copy link
Contributor

@dimbleby Sorry for the delay in replying! Actually on balance, for the suspected root cause of API inconsistency, due to our current implementation it's possible you may encounter this error with both the app_role block in the azuread_application resource, and with the azuread_application_app_role resource, depending on your specific configuration (e.g. how many roles you have defined) and how bad the API replication delay is at any given time.

Apologies for the inconvenience this causes, and that we can't get a fix out right away, but as promised we are actively working on an improved solution in v2.0.

@github-actions
Copy link

github-actions bot commented Aug 2, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants