Self-referential error enabling the Service Usage API #9489
Comments
Confirmed, seeing the same behaviour with the latest provider.
@PatrickDale I just tried again on a fresh project and am no longer getting the same issue - I wonder if GCP APIs have fixed this issue quietly?
Hey @leighmhart, it looks like this is still happening; we've just run into the issue again today:
Unfortunately we've noticed that this doesn't happen consistently, so it's hard to reproduce.
Ah - that makes some sense: of course the first request to fetch the operation would hit the Service Usage API endpoint... which was just enabled... and which therefore might not be ready to receive requests yet. To fix this, I wonder if we might try to, say, just retry a few times if we get 403s on operation waits on this resource.
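A minimal sketch of that retry idea in Go, assuming a generic helper wrapped around whatever call fetches the operation; the helper name, attempt count, and fixed sleep interval are illustrative choices, not the provider's actual implementation:

package serviceusageretry

import (
	"errors"
	"time"

	"google.golang.org/api/googleapi"
)

// pollWithRetry wraps a single "fetch the operation" call (getOp) and retries
// it when it fails with HTTP 403, on the theory that the Service Usage API
// serving the poll was itself only just enabled and is not ready yet.
func pollWithRetry[T any](getOp func() (T, error), attempts int, interval time.Duration) (T, error) {
	var zero T
	var lastErr error
	for i := 0; i < attempts; i++ {
		op, err := getOp()
		if err == nil {
			return op, nil
		}
		lastErr = err
		var gerr *googleapi.Error
		if errors.As(err, &gerr) && gerr.Code == 403 {
			// Likely eventual consistency rather than a real permission
			// problem; wait and try again.
			time.Sleep(interval)
			continue
		}
		// Any other error is surfaced immediately.
		return zero, err
	}
	return zero, lastErr
}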
resource "google_project_service" "prerequisite0" {
  # Enable the Service Usage API itself first; everything else depends on it.
  project                    = google_project.my_project.project_id
  service                    = "serviceusage.googleapis.com"
  disable_on_destroy         = false
  disable_dependent_services = false
}

resource "google_project_service" "prerequisite1" {
  for_each = toset([
    "cloudresourcemanager.googleapis.com",
  ])

  # Only enable the remaining services once Service Usage is on.
  project                    = google_project.my_project.project_id
  service                    = each.value
  depends_on                 = [google_project_service.prerequisite0]
  disable_on_destroy         = false
  disable_dependent_services = false
}
@ScottSuarez: Won't the GET request against the operation still fail if there are eventual consistency issues, regardless of the set of services being enabled/disabled in a single request?
After speaking with @rileykarson, I am revising my opinion: the dependent relationship chain I mentioned above should not be applicable, as we batch these to send a single request.
@PatrickDale, could you provide me more information on your setup? Which service account are you using to configure this? This is working locally/consistently for me, but I am using a service account which is located within a project where this API is already enabled. I suspect this is a weird timing issue with the API, since we are polling the operation, which is to say the API accepted our original request. Retrying on 403 is a good option, but I want to see if I can reproduce this.
@ScottSuarez we actually used to have the configuration set up like in your example above, but if I recall correctly we ran into an issue where we got a Google API error. I can speak on behalf of @PatrickDale and say that our setup uses a service account in a separate project.
@ericnorris I have an identical setup but wasn't able to replicate. Hmm, could you provide a more detailed DEBUG log of this request, potentially scrubbing any sensitive bits? You can set TF_LOG=DEBUG. I would like to reproduce if possible.
Also, in scrubbing the logs, please retain the request and response details.
@ScottSuarez unfortunately this issue is intermittent for us, and not consistent. The same operation (on different projects) has different results, so we don't have debug logs from previous occurrences, and I wouldn't know when to turn them on for future occurrences. It's worth noting that we have [...]. That said, I'm fairly convinced that what @ndmckinley said is correct. Since the code enables the API and then immediately calls the same API to check if it's enabled, it seems likely that there would be a race condition.
I think the quota project will be [...]. @ScottSuarez: I may have pointed you on a bit of a wild goose chase given that, sorry about that! We're almost certainly dealing with operations needing the service enabled, but enabling the service not needing it, and we'll likely want to retry 403s on the operation here.
Okay, I'll make that change!
We pushed a change just now that should release next Monday. Please let us know if you continue to experience this issue after this change hits. We were unable to reproduce the issue listed due to its transient nature, but are confident in our fix.
Thanks @ScottSuarez! It's been happening to us once every couple of days, so we'll be able to confirm whether or not this was fixed for us about a week after the release. We're also interested in seeing increased support for [...].
Hello there! We are facing the same issue. I'll be happy to confirm the fix next week too. Thanks!
The new version should now be released. Please let us know how it behaves once you are consuming it.
Hey @ScottSuarez, thanks for the help with this issue! We ran into this issue again today after upgrading to the new provider version. Here are some logs:
Do you perhaps have more information here? We added retries on the polling of the operation; I want to confirm that we did poll that operation several times. This is tricky to fix since I can't repro it, so I can't ensure it's fixed before I check it in.
The only other fix I can think of would be to poll with a larger interval.
We can enable debug output to get more information for the next run that fails. Unfortunately, that is all the output we received from this run that failed. And I definitely understand how this is tricky to test since it is really inconsistent! I did notice in the info message that enabling the service failed after 7 seconds:
Does that window line up with the amount of time you'd expect to be polling?
Possibly, I'd have to check. Is this latency longer than the previous attempts? If you have those old logs it would be easy to see.
It looks to be the same -- here is a gist of the first run we noticed failing, and it also failed after 7 seconds: https://gist.githubusercontent.com/PatrickDale/321a68360b28632d19a89144199058f3/raw/7cab66872dfe5c03a149e406f9bc141a7ce6bb01/gistfile1.txt
From what I can see, everything should be working as intended. This is tricky. If you can get those debug logs it would be a major help. In the meantime I'll see what I can do. I might be able to spoof these operation responses with a proxy setup, but that would be expensive time-wise and I'm not sure if that is the best approach. I'm going to talk with my colleagues. If you can get those detailed logs I might gain a bit more insight into whatever is happening.
To parallelize work on this, I've filed a bug with the internal API team that owns this. Hopefully they can also provide some sort of workaround on their end to clear up this inconsistency. Googler ref: b/202310048
Unfortunately this seems infeasible, based on the API team's response.
After speaking with their team I am gaining some reliable means of reproduction and better ways to resolve this issue. The billing change could resolve this inconsistency. Yay ... more details to come, sorry for the noise!
Nice! Thanks for the info @ScottSuarez! We will post debug logs here as well if we get another failure with that information.
Hi @ScottSuarez, I was able to catch this exception in a test Terraform workspace.
Hi @c4po, I've been pulled in many different directions as of late, so this particular issue snuck away from me. I've set aside some time to play with it and hopefully address this during the week. The permission fix I've checked in might already resolve this issue in your scenario, but I still want to see if I can get the retry logic going.
Note that releases are paused until the 4.0 version of the provider, which should release some time next week.
Hiya ~ was wondering if the 4.0 release ended up fixing this for you?
So far we have not seen any issues after upgrading to 4.0. Thanks!
Nice ~ I'll go ahead and close this for now. If it happens again, feel free to comment.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Community Note
If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.
Terraform Version
Affected Resource(s)
Terraform Configuration Files
Debug Output
https://gist.github.com/PatrickDale/321a68360b28632d19a89144199058f3
Expected Behavior
Running terraform apply should have enabled the Service Usage API (serviceusage.googleapis.com).
Actual Behavior
terraform apply failed with the error shown in the debug output.
Steps to Reproduce
terraform apply
Important Factoids
This operation was run in Terraform Cloud using a service account that has permissions to enable services on this GCP project.
It seems like this error is self-referential: it states that it cannot enable serviceusage.googleapis.com because the Service Usage API is disabled. From the Terraform logs, it looks like the error comes from:
terraform-provider-google/google/resource_google_project.go, lines 631 to 634 in 512259e
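To make the self-referential failure concrete, the underlying REST sequence looks roughly like the sketch below. This is an illustration only, not the provider code at the lines above; the project ID, operation name, and unauthenticated client are placeholder assumptions.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// enableThenPoll sketches the two calls behind the failure. The first call
// asks Service Usage to enable serviceusage.googleapis.com and is accepted;
// the second call polls the returned operation, but that poll is itself
// served by the Service Usage API, which may not be active yet, so it can
// come back 403 even though the enable request succeeded.
func enableThenPoll(client *http.Client, project, opName string) error {
	enableURL := fmt.Sprintf(
		"https://serviceusage.googleapis.com/v1/projects/%s/services/serviceusage.googleapis.com:enable",
		project)
	resp, err := client.Post(enableURL, "application/json", bytes.NewReader([]byte("{}")))
	if err != nil {
		return err
	}
	resp.Body.Close()

	// opName would really be parsed from the enable response body.
	pollResp, err := client.Get("https://serviceusage.googleapis.com/v1/" + opName)
	if err != nil {
		return err
	}
	defer pollResp.Body.Close()
	if pollResp.StatusCode == http.StatusForbidden {
		// This is the self-referential error reported in this issue.
		return fmt.Errorf("polling the operation returned 403 before the API became active")
	}
	return nil
}

func main() {
	// A client with OAuth credentials would be required in practice;
	// http.DefaultClient is only a placeholder.
	_ = enableThenPoll(http.DefaultClient, "my-project-id", "operations/example")
}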
After receiving the error, I checked the GCP console and the Service Usage API was enabled. I then ran terraform apply again, which passed.