intermittent failure to create service principal #611

Closed
dimbleby opened this issue Oct 2, 2021 · 5 comments · Fixed by #659

Comments

@dimbleby commented Oct 2, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureAD Provider) Version

terraform 1.0.7, azuread 2.5.0

Affected Resource(s)

  • azuread_service_principal

Terraform Configuration Files

See #535 and #581.
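The actual configuration is in the issues linked above; for orientation, here is a minimal sketch of the resource pairing involved. Resource names and attributes are illustrative assumptions, not the reporter's configuration.

```hcl
# Minimal sketch only - see the linked issues for the real configuration.
resource "azuread_application" "example" {
  display_name = "example"
}

# The service principal is backed by the application created above; the 403
# reportedly occurs when Azure AD has not yet replicated that application.
resource "azuread_service_principal" "example" {
  application_id = azuread_application.example.application_id
}
```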

Debug Output

https://gist.github.com/dimbleby/fc95d44a243ff8c192980f8323e7374c

Panic Output

Expected Behavior

Service principal is successfully created (possibly after a retry)

Actual Behavior

Could not create service principal

  with module.aad.azuread_service_principal.ccm[0],
  on aad/main.tf line 611, in resource "azuread_service_principal" "ccm":
 611: resource "azuread_service_principal" "ccm"

ServicePrincipalsClient.BaseClient.Post(): unexpected status 403 with OData
error: Authorization_RequestDenied: When using this permission, the backing
application of the service principal being created must in the local tenant

Steps to Reproduce

  1. terraform apply

Important Factoids

References

Alas, this is a reopening of #581 / #535.

re debug.log

  • 403 is at line 5433
  • So far as I can see in the debug.log there is no retry, though it's always possible that I'm mis-reading.
@manicminer (Contributor)

Hi @dimbleby, thanks for your patience with this issue. I'm sorry the latest release didn't eliminate this problem for you and I'll take a look at your supplied log to try and work out what's happening.

@manicminer (Contributor) commented Oct 12, 2021

@dimbleby Thanks for your patience. I've done some more testing using an intercepting proxy to simulate the 403 response you're seeing. Although I wasn't able to use the exact same provider version due to a critical bugfix in 2.6.0, I believe the behavior is the same and I used the exact bytes of the error response as in your debug trace.

Here's what I'm seeing in my request/response inspector:

[screenshot: 2021-10-12 at 23:07, request/response inspector]

And here's the output I'm getting:

[screenshot: 2021-10-12 at 23:11, provider output]

Alas, from this I believe you're experiencing extreme replication delay in Azure AD. The provider only logs the 403 response after it has made 9 attempts to create the service principal, each time backing off exponentially to a max retry delay of 30 seconds, for a total time of around 2 minutes, as can be seen in the log screenshot above.
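For illustration only (this is not the provider's actual retry code), a capped exponential backoff along the following lines gives roughly the two-minute window described above; the one-second base delay and the exact attempt count are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const attempts = 9           // assumed attempt count, per the comment above
	base := 1 * time.Second      // assumed base delay
	maxDelay := 30 * time.Second // per-retry delay cap described above

	var total time.Duration
	// 9 attempts means 8 back-off waits between them.
	for i := 1; i < attempts; i++ {
		delay := base << (i - 1) // 1s, 2s, 4s, 8s, 16s, ...
		if delay > maxDelay {
			delay = maxDelay
		}
		total += delay
		fmt.Printf("wait before attempt %d: %v (cumulative %v)\n", i+1, delay, total)
	}
	// With these assumed values the waits sum to roughly two minutes (~121s),
	// consistent with the "around 2 minutes" described above.
}
```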

I will look at the feasibility of increasing the retry count to better handle these scenarios of degraded API performance.

@dimbleby (Author)

Thanks - what you say sounds plausible, except that, I suppose, we are both surprised the inconsistency would persist for so long!

It's hard to be sure whether something that was already intermittent has become very intermittent, or we've just had a run of better luck - but I do think that we are seeing this failure less often than we used to. (Actually, I don't think I've seen it again since opening this instance of the issue.) So I'm willing to believe that the existing retries are helping.

@manicminer (Contributor)

One thing I've noticed is that replication delays can seemingly be isolated to a particular tenant - perhaps due to some underlying rate limiting. If I generate lots of activity in a single tenant, I've observed increasing delays that do not occur with a neighboring same-region tenant.

@github-actions (bot)

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Dec 12, 2021