Add workaround for IAM delays when creating and trying to use s2s auth policies #4478

Open
ocofaigh opened this issue Apr 5, 2023 · 3 comments
Labels
service/IAM Issues related to IAM service/Object Storage Issues related to Cloud Object Storage

Comments

@ocofaigh
Contributor

ocofaigh commented Apr 5, 2023

  • We are using ibm_iam_authorization_policy to create a service-to-service (s2s) auth policy between KMS <-> COS.
  • Intermittently, creating an encrypted COS bucket with ibm_cos_bucket fails with the following error:
 2023/04/01 00:11:58 Terraform apply | Error: ServiceNotAuthorized: The specified COS Service Instance does not have sufficient permissions to access the resource associated with the KMS key CRN.
 2023/04/01 00:11:58 Terraform apply | 	status code: 401, request id: 71bc96a8-4cc5-445f-84a3-79d127630af1, host id: 
 2023/04/01 00:11:58 Terraform apply | 
 2023/04/01 00:11:58 Terraform apply |   with module.landing_zone.ibm_cos_bucket.buckets["workload-bucket"],
 2023/04/01 00:11:58 Terraform apply |   on ../../cos.tf line 74, in resource "ibm_cos_bucket" "buckets":
 2023/04/01 00:11:58 Terraform apply |   74: resource "ibm_cos_bucket" "buckets" {
 2023/04/01 00:11:58 Terraform apply | 
 2023/04/01 00:11:58 Terraform apply | 
 2023/04/01 00:11:58 Terraform apply | Error: ServiceNotAuthorized: The specified COS Service Instance does not have sufficient permissions to access the resource associated with the KMS key CRN.
 2023/04/01 00:11:58 Terraform apply | 	status code: 401, request id: 8d1bb913-06ed-40f3-a5bd-8067b76e5854, host id: 
 2023/04/01 00:11:58 Terraform apply | 
 2023/04/01 00:11:58 Terraform apply |   with module.landing_zone.ibm_cos_bucket.buckets["management-bucket"],
 2023/04/01 00:11:58 Terraform apply |   on ../../cos.tf line 74, in resource "ibm_cos_bucket" "buckets":
 2023/04/01 00:11:58 Terraform apply |   74: resource "ibm_cos_bucket" "buckets" {
 2023/04/01 00:11:58 Terraform apply | 
 2023/04/01 00:11:58 Terraform apply | 
 2023/04/01 00:11:58 Terraform apply | Error: ServiceNotAuthorized: The specified COS Service Instance does not have sufficient permissions to access the resource associated with the KMS key CRN.
 2023/04/01 00:11:58 Terraform apply | 	status code: 401, request id: 0ba8e3d4-787c-4a41-b89e-ecd9dcca4e2a, host id: 
 2023/04/01 00:11:58 Terraform apply | 
 2023/04/01 00:11:58 Terraform apply |   with module.landing_zone.ibm_cos_bucket.buckets["atracker-bucket"],
 2023/04/01 00:11:58 Terraform apply |   on ../../cos.tf line 74, in resource "ibm_cos_bucket" "buckets":
 2023/04/01 00:11:58 Terraform apply |   74: resource "ibm_cos_bucket" "buckets" {
  • The failure appears to be caused by a delay in IAM after the s2s auth policy is created: if I wait a little and retry, the bucket is created successfully.

Is it possible to add a workaround for this delay in the provider code? Perhaps a retry, or some extra validation when creating an auth policy to confirm that the policy is actually ready for use?

My suspicion is that this is an IAM database replication issue, where the auth policy exists on one database node but has not yet been fully replicated to the others. We have seen something similar occur in other use cases too.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform IBM Provider Version

Affected Resource(s)

  • ibm_iam_authorization_policy

Steps to Reproduce

  1. terraform apply

@ocofaigh
Contributor Author

ocofaigh commented Jun 2, 2023

@vivekj3 I am tagging you since you created a fix for a very similar issue (same root cause actually) in #4556

For that issue, the flow was:

  1. The auth policy is created using POST and ends up on a particular backend Cloudant instance.
  2. Terraform does a GET to check the auth policy, but the GET ends up querying a different Cloudant instance to which the data has not yet been replicated, and so it fails.

The fix you added in #4556 now retries the GET, which should pass once the data has been replicated. I am very thankful for that fix, but here is a second use case that needs to be solved:

  1. The auth policy is created using POST and ends up on a particular backend Cloudant instance.
  2. The GET to check that the auth policy exists also passes (perhaps the GET hits the same Cloudant instance).
  3. Terraform proceeds with some process that requires the auth policy to be in place, but it fails with a 401 error. Some examples of the error:

Error: ServiceNotAuthorized: The specified COS Service Instance does not have sufficient permissions to access the resource associated with the KMS key CRN.

Error creating database instance: Please contact the Service Provider for this error. [400, Bad Request] We were unable to complete your request: Key not found. Databases for MongoDB may not be authorized to access the KMS instance selected for disk encryption.

My guess is that, again, the auth policy has not been replicated to all backend Cloudant instances, and so we see this misleading error. If I wait a few seconds and retry the terraform apply, it always works.

As a workaround in our terraform code, we have added a 30 second sleep after every auth policy creation, before attempting to use the policy. However, we have a large number of modules, and instead of adding the workaround to each of them, I think something could be done in the provider code. Perhaps the provider could also sleep any time an auth policy is created? Thoughts?
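
For anyone hitting the same error, the sleep workaround described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from our modules: the variable names, bucket name, region, and role are placeholder assumptions, and it relies on the time_sleep resource from the hashicorp/time provider. The key_protect argument is the one I believe ibm_cos_bucket accepted at the time of this issue; newer provider versions may prefer a different argument for the key CRN.

```hcl
terraform {
  required_providers {
    ibm  = { source = "IBM-Cloud/ibm" }
    time = { source = "hashicorp/time" }
  }
}

# Hypothetical inputs, named for illustration only.
variable "cos_instance_guid" { type = string }
variable "cos_instance_id" { type = string }
variable "kms_instance_guid" { type = string }
variable "kms_key_crn" { type = string }

# s2s auth policy that lets the COS instance read keys from the KMS instance.
resource "ibm_iam_authorization_policy" "cos_to_kms" {
  source_service_name         = "cloud-object-storage"
  source_resource_instance_id = var.cos_instance_guid
  target_service_name         = "kms"
  target_resource_instance_id = var.kms_instance_guid
  roles                       = ["Reader"]
}

# Pause after the policy is created to give IAM time to replicate it.
# 30s is the value mentioned in this thread, not a guaranteed bound.
resource "time_sleep" "wait_for_authorization_policy" {
  depends_on      = [ibm_iam_authorization_policy.cos_to_kms]
  create_duration = "30s"
}

# The encrypted bucket depends on the sleep, not directly on the policy,
# so it is only created after the delay has elapsed.
resource "ibm_cos_bucket" "encrypted" {
  depends_on           = [time_sleep.wait_for_authorization_policy]
  bucket_name          = "example-encrypted-bucket"
  resource_instance_id = var.cos_instance_id
  region_location      = "us-south"
  storage_class        = "standard"
  key_protect          = var.kms_key_crn
}
```

The important part is that the consumer of the policy depends on the time_sleep resource rather than on the policy itself. The delay only applies on create, so it adds nothing to later applies when the policy already exists.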

@IBM-diksha
Collaborator

@ocofaigh Does this issue still exist?

@ocofaigh
Contributor Author

@IBM-diksha yes we still intermittently see this issue as per #4478 (comment)

ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-landing-zone-vpc that referenced this issue Aug 9, 2023
ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-landing-zone-vsi that referenced this issue Aug 9, 2023
ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-data-engine that referenced this issue Aug 9, 2023
ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-client-to-site-vpn that referenced this issue Aug 9, 2023
ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-icd-edb that referenced this issue Aug 9, 2023
ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-event-streams that referenced this issue Aug 10, 2023
ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-icd-redis that referenced this issue Aug 11, 2023
ocofaigh pushed a commit to terraform-ibm-modules/terraform-ibm-icd-rabbitmq that referenced this issue Aug 15, 2023
ocofaigh pushed a commit to terraform-ibm-modules/terraform-ibm-icd-postgresql that referenced this issue Aug 18, 2023
ocofaigh added a commit to terraform-ibm-modules/terraform-ibm-icd-etcd that referenced this issue Dec 8, 2023
ocofaigh pushed a commit to terraform-ibm-modules/terraform-ibm-cos that referenced this issue Jan 26, 2024
… for this provider [issue](IBM-Cloud/terraform-provider-ibm#4478). NOTE: Upgrades from an earlier version to this version may show a time_sleep.wait_for_authorization_policy being deleted if they are skipping authorisation policy creation. This is expected, since there is no need to delay if the authorisation policy already exists. (#518)