azuread_service_principal delay before being usable #4
I had the same issue today. In my case, I fixed it by using the
resource "azurerm_azuread_application" "test" {
  name = "exampleTFapplication"
  available_to_other_tenants = false
  oauth2_allow_implicit_flow = false
}
resource "azurerm_azuread_service_principal" "test" {
  application_id = "${azurerm_azuread_application.test.application_id}"
}
resource "azurerm_azuread_service_principal_password" "test" {
  service_principal_id = "${azurerm_azuread_service_principal.test.id}"
  value = "BVcKK237/&&)hyz@%nsadasdsa(*&^CC#Nd3"
  end_date = "2020-01-01T01:02:03Z"
}
resource "azurerm_resource_group" "test" {
  name = "testResourceGroup1"
  location = "West US"
}
resource "azurerm_role_assignment" "test" {
  scope = "${azurerm_resource_group.test.id}"
  role_definition_name = "Reader"
  principal_id = "${azurerm_azuread_application.test.application_id}"
}
It's a weird behavior, but I got that from the Hope it helps! |
@schoren thanks for replying. I just tested this, and when I tried the update I got the response:
Error: Error applying plan: 1 error(s) occurred:
I then confirmed the outputs and they are both different values:
Outputs: azurerm_azuread_application_id = 408b56eeXXXXXXXXXXX
I looked at the azurerm_role_assignment documentation and it does specifically call out the principal ID is required. Am I missing something obvious? |
Yes, yesterday I had a similar issue. I'm checking now to see if it is still happening. In another env, I had successfully deployed and assigned roles to service principals using that method. |
Ok, now it's working with the original solution, using |
@mb290 @schoren this is also the behavior of Azure CLI when using the command References: To be clear, the Terraform configuration below works most of the time because it waits 30s for server replication using a hack (but sometimes it takes longer than 30s, and then it fails with the same error you describe above):
provider "azurerm" {
  version = "~> 1.10.0"
}
data "azurerm_subscription" "current" {}
resource "random_string" "password" {
  length = 32
}
resource "random_id" "name" {
  byte_length = 16
}
variable "role" {
  default = "Contributor"
}
variable "end_date" {
  default = "2020-01-01T01:02:03Z"
}
resource "azurerm_azuread_application" "service_principal" {
  name = "${random_id.name.hex}"
}
resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.service_principal.application_id}"
}
resource "azurerm_azuread_service_principal_password" "service_principal" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value = "${random_string.password.result}"
  end_date = "${var.end_date}"
  # wait 30s for server replication before attempting role assignment creation
  provisioner "local-exec" {
    command = "sleep 30"
  }
}
resource "azurerm_role_assignment" "service_principal" {
  scope = "${data.azurerm_subscription.current.id}"
  role_definition_name = "${var.role}"
  principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  depends_on = ["azurerm_azuread_service_principal_password.service_principal"]
}
output "display_name" {
  description = "The Display Name of the Azure Active Directory Application associated with this Service Principal."
  value = "${azurerm_azuread_service_principal.service_principal.display_name}"
}
output "application_id" {
  description = "The Application ID."
  value = "${azurerm_azuread_application.service_principal.application_id}"
}
output "object_id" {
  description = "The Object ID for the Service Principal."
  value = "${azurerm_azuread_service_principal.service_principal.id}"
}
output "password" {
  description = "The Password for this Service Principal."
  value = "${azurerm_azuread_service_principal_password.service_principal.value}"
}
While this Terraform configuration doesn't wait for server replication using the above hack, and always fails:
provider "azurerm" {
  version = "~> 1.10.0"
}
data "azurerm_subscription" "current" {}
resource "random_string" "password" {
  length = 32
}
resource "random_id" "name" {
  byte_length = 16
}
variable "role" {
  default = "Contributor"
}
variable "end_date" {
  default = "2020-01-01T01:02:03Z"
}
resource "azurerm_azuread_application" "service_principal" {
  name = "${random_id.name.hex}"
}
resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.service_principal.application_id}"
}
resource "azurerm_azuread_service_principal_password" "service_principal" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value = "${random_string.password.result}"
  end_date = "${var.end_date}"
}
resource "azurerm_role_assignment" "service_principal" {
  scope = "${data.azurerm_subscription.current.id}"
  role_definition_name = "${var.role}"
  principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  depends_on = ["azurerm_azuread_service_principal_password.service_principal"]
}
output "display_name" {
  description = "The Display Name of the Azure Active Directory Application associated with this Service Principal."
  value = "${azurerm_azuread_service_principal.service_principal.display_name}"
}
output "application_id" {
  description = "The Application ID."
  value = "${azurerm_azuread_application.service_principal.application_id}"
}
output "object_id" {
  description = "The Object ID for the Service Principal."
  value = "${azurerm_azuread_service_principal.service_principal.id}"
}
output "password" {
  description = "The Password for this Service Principal."
  value = "${azurerm_azuread_service_principal_password.service_principal.value}"
}
with the error:
Does anyone have a suggestion for a workaround in Terraform? I don't yet understand how a fix for this would be implemented in any of these resources. I really don't want to use this very ugly hack:
...
resource "azurerm_azuread_service_principal" "service_principal" {
  application_id = "${azurerm_azuread_application.service_principal.application_id}"
}
resource "azurerm_azuread_service_principal_password" "service_principal" {
  service_principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  value = "${random_string.password.result}"
  end_date = "${var.end_date}"
  # wait 30s for server replication before attempting role assignment creation
  provisioner "local-exec" {
    command = "sleep 30"
  }
}
resource "azurerm_role_assignment" "service_principal" {
  scope = "${data.azurerm_subscription.current.id}"
  role_definition_name = "${var.role}"
  principal_id = "${azurerm_azuread_service_principal.service_principal.id}"
  depends_on = ["azurerm_azuread_service_principal_password.service_principal"]
}
...
Many thanks, |
@joakimhellum-in Thanks for that clarification. It is an ugly workaround, but maybe that's the best we can get. I don't have a very deep understanding of terraform and this provider's inner workings, so I cannot tell if there's a cleaner solution. For the time being, I think I'll implement what you suggested |
We really want to avoid using the Any suggestions on how to implement a fix for this in Terraform are highly appreciated.
Update 1: yes, I really have no idea how to approach fixing this in Terraform other than retrying multiple times on failure like az cli does, as the error returned from the API is very generic. Maybe @tombuildsstuff could help with what direction to take here.
Update 2:
Update 3:
Thanks again, |
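The retry-on-failure idea from Update 1 can be sketched in shell. This is only a minimal sketch, not how az cli or the provider actually implements retries: `retry_with_backoff` is a hypothetical helper name, and the attempt count and delays are assumptions to tune for your tenant.

```shell
# Hypothetical helper (not part of az cli or Terraform).
# Retries COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay after each failed attempt.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while true; do
    # Run the wrapped command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempt(s)." >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, the role assignment could be wrapped as `retry_with_backoff 6 5 az role assignment create --role "$ROLE" --assignee-object-id "$OBJECT_ID" --scope "$SCOPE"`, where the three variables are placeholders. Because the API error is generic, this sketch retries on any failure rather than trying to recognise PrincipalNotFound specifically.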
@tombuildsstuff and/or anyone - would you clarify something for me? It appears (to me at least) that the solution to the various Does this mean that you couldn't do something similar to the Thanks for any insight! |
I can confirm I have the same behaviour. This is related to the time to replicate the SP through the Azure AD servers. My scenario is:
I get the error: AADSTS70001: Application with identifier 'app guid here' was not found in the directory. Retrying with another terraform apply 1 min later, everything goes through. |
Have the same issue. |
I have tried with 30s, 60s, 180s and 200s and I am still getting the same issue... Using az-cli directly is what worked for me, as @joakimhellum-in mentioned previously:
resource "azurerm_azuread_service_principal_password" "app_spn_password" {
  service_principal_id = "${azurerm_azuread_service_principal.app_spn_id.id}"
  value = "${random_string.password.result}"
  end_date = "${var.spn_end_date}" #2020-01-01T01:02:03Z
  provisioner "local-exec" {
    command = "az role assignment create --role ${var.spn_role_definition_name} --assignee-object-id ${azurerm_azuread_service_principal.app_spn_id.id} --scope ${var.spn_scope}"
  }
} |
Did anybody think to query the AD servers by PowerShell to see if the SPN has been replicated through and then carry on? I am not sure if you can do this on Azure AD though... |
I'm getting |
@clstokes that sounds like the same underlying issue as this, so we can track that here. Thanks! |
I'm getting the same issue but I'm not using depends_on. I created the cluster first then added the configuration to create the role assignment. No matter how many times I try to apply it fails. |
Hi @mb290, as we are deprecating all Azure AD resources and data sources in the Azure RM provider in 2.0 in favour of this new provider, I have moved the issue here. |
I can confirm that this issue still exists with the new AzureAD provider. |
I also cannot do role assignments with Terraform for Service Principals. It works fine for AAD groups but I get the Status=400 Code="PrincipalNotFound" too. The service principal has been created days ago so I don't think it is a race condition that others seem to be experiencing. If this is being tracked in another issue @tombuildsstuff can you please post the link here as I cannot find it. |
I am also encountering
In my scenario the service principal is pre-existing, so it cannot be a timing thing. I am attempting to give an AKS SP permission to act as "Managed Identity Operator" over a User Managed Identity. When using the respective AZ CLI command as the same user running Terraform, I have no issues.
In this example it looks like (as @liamfoneill said above) the issue may lie with the azurerm_role_assignment resource. Resolved for now by running the az cli command via a local-exec. It works for now, but I would much prefer to use the native resource. |
I've barely tested this, so it's probably flawed, but it worked the first time I tried it:
resource "azuread_service_principal_password" "main" {
  service_principal_id = "${azuread_service_principal.main.id}"
  value = "${var.password}"
  end_date = "${var.end_date}"
  provisioner "local-exec" {
    command = <<EOF
until az ad sp show --id ${azuread_service_principal.main.application_id}
do
  echo "Waiting for service principal..."
  sleep 3
done
EOF
  }
}
At least it's an idea, and someone can probably identify the flaws and improve on it. |
If you happen to be running on Windows (where
param(
    [string]$ApplicationId
)
$elapsed = 0;
$delay = 3;
$limit = 5 * 60;
$checkMsg = "Checking for service principal with Application ID $ApplicationId"
Write-Host $checkMsg
$cmd = "az ad sp show --id $ApplicationId";
Invoke-Expression $cmd
while($lastExitCode -ne 0 -and $elapsed -le $limit) {
    # Format the elapsed time for the log message.
    $elapsedSeconds = "${elapsed}s";
    Write-Host "Service principal is not yet available. Retrying in $delay seconds... ($elapsedSeconds elapsed)"
    Start-Sleep -Seconds $delay;
    $elapsed += $delay;
    Write-Host $checkMsg
    Invoke-Expression $cmd;
}
if($lastExitCode -eq 0) {
    Write-Host "Service principal is ready."
    exit 0
}
Write-Host "Service principal did not become ready within the allotted time."
exit 1
resource "azuread_service_principal_password" "ad_principal_pw" {
  service_principal_id = "${azuread_service_principal.ad_principal.id}"
  value = "${var.password}"
  end_date = "${var.end_date}"
  provisioner "local-exec" {
    command = ".\\wait-for-service-principal.ps1 -ApplicationId \"${azuread_application.ad_app.application_id}\""
    interpreter = ["PowerShell"]
  }
} |
I am having the same issue. Is there a permanent solution on the roadmap? I see this issue was removed from the 0.3.0 milestone. The workaround with the local-exec to wait for "az ad sp show --id ${azuread_service_principal.main.application_id}" does not work either. The exec returns ok, displaying the service principal, but it is not yet ready to be consumed by AKS. I guess it's a timing/eventual consistency issue between several Azure APIs. Sleep 30 was the only way forward for me. |
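Since a single successful `az ad sp show` can come back before other APIs see the principal, one idea is to require several consecutive successful checks before moving on. This is an untested assumption, not something proposed in the thread, and it cannot guarantee that a different service such as AKS is ready; `wait_until_stable` and its parameters are hypothetical.

```shell
# Hypothetical helper: succeed only after COMMAND has succeeded REQUIRED
# times in a row, running at most MAX_CHECKS checks with INTERVAL seconds
# between them. Any failure resets the streak.
# Usage: wait_until_stable REQUIRED MAX_CHECKS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_stable() {
  local required="$1"
  local max_checks="$2"
  local interval="$3"
  shift 3
  local streak=0
  local checks=0
  while [ "$checks" -lt "$max_checks" ]; do
    checks=$((checks + 1))
    if "$@" >/dev/null 2>&1; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$required" ]; then
        return 0
      fi
    else
      # A failed check means the result is not stable yet; start over.
      streak=0
    fi
    sleep "$interval"
  done
  return 1
}
```

A call such as `wait_until_stable 3 60 5 az ad sp show --id "$APP_ID"` (with `APP_ID` as a placeholder) would wait for three successes in a row. A fixed sleep may still be needed on top of this if the consuming API lags behind.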
Hi! This also affects AKS clusters, as the SP (or the password) is not ready. |
Maybe something like this can replace resource timeout block.
|
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks! |
Community Note
Terraform Version
terraform -v
Terraform v0.11.7
Affected Resource(s)
Terraform Configuration Files
Panic Output
Error: Error applying plan:
1 error(s) occurred:
azurerm_role_assignment.test: 1 error(s) occurred:
azurerm_role_assignment.test: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="PrincipalNotFound" Message="Principal ######################## does not exist in the directory #######-#####-######-#########."
(sensitive details have been hashed out).
Expected Behavior
Here is a config that first creates the AzureAD application and the Service Principal. It then creates an RG followed by a role assignment. The logic here is that we could have a single TF module that would allow us to onboard new groups into an Azure subscription and generate each of them their own SP.
Actual Behavior
When the azurerm_azuread_service_principal.test resource is created, there appears to be a delay between creation and the ability to assign it to a role, and even the depends_on that I've included in the sample code above doesn't help. When I re-run a second time, it always applies without issue, as all other resources already exist.
Steps to Reproduce
terraform apply