Error reading service account after creation - 403 Halts Execution #10227

Closed
aaron-brown opened this issue Oct 1, 2021 · 8 comments

aaron-brown commented Oct 1, 2021

Issue

My team and I have been creating Service Accounts for some time, but recently (perhaps in the past two weeks) we have started encountering the following error when creating them:

│ Error: Error reading service account after creation: googleapi: Error 403: Permission iam.serviceAccounts.get is required to perform this operation on service account projects/{project}/serviceAccounts/{service account that was just created}., forbidden
│
│   with [Terraform path],
│   on [Path]/main.tf line 27, in resource "google_service_account" "our_service_account":
│   27: resource "google_service_account" "our_service_account" {

We tried a number of things to mitigate this error (updating provider versions, making sure our gcloud components were up to date, updating Terraform, etc.). At first we believed some of them were effective, but the error kept coming back intermittently.

So I dug into the source code a little and found this bit of code. If I'm understanding it correctly, after a Service Account is created there is a wait period that retries the read whenever it receives a 404. This makes sense, since it takes some time for the system to update and for the account to appear.

However, I believe there is also a state in which the account has been created, so the 404 is not encountered, but its IAM policies haven't been fully established yet. That would explain why we are getting a 403. If we run terraform apply again a little later, it works without issue, presumably because the IAM policies are established by then.

Expected behavior

Handle a 403 the same way as the 404, so that the apply does not halt when the account has been created but its IAM policies have not been fully established. The 403 should eventually resolve just as the 404 does; if it does not, the retry would still end at the timeout.

Terraform Version

1.0.7 / 1.0.0

Affected Resource(s)

  • google_service_account

Terraform Configuration Files

# Nothing atypical, just a standard google_service_account resource
resource "google_service_account" "service_account" {
  account_id   = local.service_account_name
  display_name = local.service_account_display_name
}

Steps to Reproduce

  1. terraform apply

@aaron-brown aaron-brown added the bug label Oct 1, 2021
@edwardmedia edwardmedia self-assigned this Oct 1, 2021
@edwardmedia
Contributor

@aaron-brown I noticed you said it happens intermittently. Were they on the exact same code? Can you post the config (the values for the variables are needed) along with the debug log?

@aaron-brown
Author

Working on getting the requested info. I'm trying to reproduce this in a simpler way if possible. I'm wondering whether it happens when we build a lot of things at the same time as the service account(s). For us it's intermittent, so it's difficult to say.

At the very least I want to get the debug logs; we haven't yet run an apply where this occurs with debug enabled.

@edwardmedia
Contributor

@aaron-brown it is possible that this happens when you build a lot of things at the same time. The service account may not be ready yet (it can take longer to become usable than the point at which creation is reported as done) when other processes try to use it. In that case, you could add sleep logic to work around it. Yes, let's review the logs to see.
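
A minimal sketch of what such sleep logic could look like, assuming the hashicorp/time provider and its time_sleep resource are available in the configuration. The 30s duration, the var.project_id variable, the chosen role, and the resource names wait_for_service_account and publisher are illustrative assumptions, not values from this thread; the locals are assumed to be defined elsewhere, as in the original config. Note that a delay like this only affects resources that depend on it. It would not change the read performed inside google_service_account creation itself, which is where the 403 in this issue is reported.

```hcl
# Service account as in the original report.
resource "google_service_account" "service_account" {
  account_id   = local.service_account_name
  display_name = local.service_account_display_name
}

# Assumption: pause after creation so the new account has time to propagate.
# The 30s value is a guess, not a documented propagation time.
resource "time_sleep" "wait_for_service_account" {
  depends_on      = [google_service_account.service_account]
  create_duration = "30s"
}

# Hypothetical downstream resource that uses the account only after the delay.
resource "google_project_iam_member" "publisher" {
  project = var.project_id
  role    = "roles/pubsub.publisher"
  member  = "serviceAccount:${google_service_account.service_account.email}"

  depends_on = [time_sleep.wait_for_service_account]
}
```

With this layout, anything that consumes the service account would depend on time_sleep.wait_for_service_account rather than on the account directly.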

@aaron-brown
Author

@edwardmedia:

Were they on the exact same code?

Yes. Not always the same Service Account, but it was on the same terraform script.

Can you post the config (the values for the variables are needed) ...

I can provide the config if it is necessary, but there are at least two things that may complicate that:

  1. I would have to scrub / modify some of the values
  2. We defer some of the work we do to modules, which don't do anything special, but it does mean that the config I would provide wouldn't be "complete."

For now, in lieu of posting the config, I'll try to describe it better. The apply builds about 25 objects, including a node pool, several service accounts, Pub/Sub topics and subscriptions, and IAM setups.

When running the apply, the service account creation is mixed in with the creation of all of the other things. Keep this in mind as I answer the next part of this question...

...along with the debug log?

I'm attaching 3 (consolidated) log files that demonstrate the HTTP Conversation when creating the Service Account that failed.

File apply-01-a.log is the first run of an apply where a service account failed to create because of the 403 error. Interleaved with these requests were other requests creating the Pub/Sub and node pool entities.

File apply-02-a.log is a second run of the apply immediately after the failed run. The failed service account was identified as "tainted", so it was deleted and recreated. Almost no other activity occurred during this time, as all of the other resources had already been created. Although events happened more quickly than in the previous run, the service account was created successfully.

File apply-03.log is a successful run after destroying and recreating the environment again. No errors occurred, and the same service account in question was created successfully. It was still created in the mix of other resources, and this was run very shortly after the run that produced the apply-01 log files.

I did some minor scrubbing of the log files and focused on the HTTP conversations. If you would like the full log files, I will need to scrub them more carefully.

Let me know if this is sufficient, or if you need anything further.

apply-01-a.log
apply-01-b.log
apply-02.log

@edwardmedia
Contributor

edwardmedia commented Oct 12, 2021

@aaron-brown I see apply-01-b.log is normal. apply-01-a.log is almost identical in terms of the POST / GET requests. It is not clear to me what went wrong based on what you provided. From the steps you described, did the issue only happen at the very beginning? Are you still able to repro it now? It is possible the API was experiencing an issue.

I can't repro the issue based on the config here. I need a config that reproduces the issue in order to investigate further.

@edwardmedia
Contributor

@aaron-brown is this still an issue?

@edwardmedia
Contributor

@aaron-brown closing this, assuming it is no longer an issue.

@github-actions

github-actions bot commented Dec 9, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 9, 2021