Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HCPF-1961: Implement retry mechanism for Get and Set Policy #870

Merged
merged 13 commits into from
Jul 2, 2024

Conversation

delores-hashicorp
Copy link
Contributor

@delores-hashicorp delores-hashicorp commented Jun 24, 2024

🛠️ Description

🏗️ Acceptance tests

  • Are there any feature flags that are required to use this functionality?
  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch?

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccVaultSecretsAppIamPolicyResource'

==> Checking that code complies with gofmt requirements...
golangci-lint run --config ./golangci-config.yml 
TF_ACC=1 go test ./internal/... -v -run=TestAccVaultSecretsAppIamPolicyResource -timeout 360m -parallel=10
PASS
ok      github.com/hashicorp/terraform-provider-hcp/internal/provider/resourcemanager     1.572s [no tests to run]
=== RUN   TestAccVaultSecretsAppIamPolicyResource
--- PASS: TestAccVaultSecretsAppIamPolicyResource (10.26s)
$ make testacc TESTARGS='-run=TestAccVaultSecretsAppIamBindingResource'

ok      github.com/hashicorp/terraform-provider-hcp/internal/provider/resourcemanager     1.390s [no tests to run]
=== RUN   TestAccVaultSecretsAppIamBindingResource
--- PASS: TestAccVaultSecretsAppIamBindingResource (8.67s)
$ make testacc TESTARGS='-run=TestAccProjectIamBindingResource'

=== RUN   TestAccProjectIamBindingResource
--- PASS: TestAccProjectIamBindingResource (15.90s)
PASS
ok      github.com/hashicorp/terraform-provider-hcp/internal/provider/resourcemanager   16.784s
$ make testacc TESTARGS='-run=TestAccProjectIamPolicyResource'

=== RUN   TestAccProjectIamPolicyResource
--- PASS: TestAccProjectIamPolicyResource (10.59s)
$ make testacc TESTARGS='-run=TestAccOrganizationIamBindingResource'

=== RUN   TestAccOrganizationIamBindingResource
--- PASS: TestAccOrganizationIamBindingResource (15.02s)

@@ -64,7 +67,7 @@ func (u *orgIAMPolicyUpdater) GetResourceIamPolicy(ctx context.Context) (*models
params.ID = u.client.Config.OrganizationID
res, err := u.client.Organization.OrganizationServiceGetIamPolicy(params, nil)
if err != nil {
diags.AddError("failed to retrieve organization IAM policy", err.Error())
diags.Append(customdiags.NewErrorDiagnosticWithErrorCode("failed to retrieve organization IAM policy", err.Error(), status.Code(err)))
Copy link
Contributor

@dadgar dadgar Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So status.Code(err) won't work (99% sure). The issue is that the status.Code() method is expecting the error type to be a grpc/status type: https://github.com/grpc/grpc-go/blob/v1.64.0/status/status.go#L96-L145

Since this error is coming from an HTTP request it is not being typed as a Status error. Instead it will be typed as: https://github.com/hashicorp/hcp-sdk-go/blob/main/clients/cloud-resource-manager/stable/2019-12-10/client/organization_service/organization_service_get_iam_policy_responses.go#L114-L118

The NewErrorDiagnosticWithErrorCode() could maybe be replaced to expect a type that has a method of GetPayload() *cloud.GoogleRPCStatus or Code() int and use that instead?

Alternatively the error handling here could retrieve and pass the code by converting the error to the underlying type that has the Code() method.

@delores-hashicorp delores-hashicorp marked this pull request as ready for review June 25, 2024 13:17
@delores-hashicorp delores-hashicorp requested review from a team as code owners June 25, 2024 13:17
// Apply the policy
ep, diags = u.SetResourceIamPolicy(ctx, FromMap(ep.Etag, bindings))
if diags.HasError() {
if customdiags.HasConflictError(diags) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here explaining that the policy object has changed and the etag is no longer valid and how the retry works.

Comment on lines 234 to 235
time.Sleep(backoff)
backoff *= 2
if backoff > maxBackoff {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm if backoff is at maxBackoff, we will still sleep maxBackoff and then fail. Should we check we are over the max before sleeping.

ep, diags = u.SetResourceIamPolicy(ctx, FromMap(ep.Etag, bindings))
if diags.HasError() {
if customdiags.HasConflictError(diags) {
log.Printf("[DEBUG]: Operation failed due to conflicts. Operation will be restarted after %s\n", backoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With log package you shouldn't need the newline character: https://go.dev/play/p/PCT9cNfOYGu

)

// ErrorDiagnosticWithErrorCode is an error diagnostic that stored the error code.
type ErrorDiagnosticWithErrorCode struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  1. Can we be explicit that this is the HTTPStatusCode. Since there is both the gRPC Status Code and the HTTP Status Code, it is slightly ambiguous what code to check.
  2. Since we are in a package called customdiags, what do you think about shortening the struct name to something like ErrorHTTPStatusCode since it will be imported as customdiags.ErrorHTTPStatusCode

type ErrorDiagnosticWithErrorCode struct {
detail string
summary string
errorCode int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be public so the code can be inspected?

Copy link
Contributor

@dadgar dadgar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add documentation to:

// Replaces the existing IAM Policy attached to a resource.
SetResourceIamPolicy(ctx context.Context, policy *models.HashicorpCloudResourcemanagerPolicy) (*models.HashicorpCloudResourcemanagerPolicy, diag.Diagnostics)
to show how implementors should return the new error you added on conflict.

if customdiags.HasConflictError(diags) {
// Policy object has changed since it was last gotten and the etag is now different.
// Continuously retry getting and setting the policy with an increasing backoff period until the maximum backoff period is reached.
log.Printf("[DEBUG]: Operation failed due to conflicts. Operation will be restarted after %s", backoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this log line be below the maxBackoff if statement? That way it either say there operation will be retried or says it has aborted, but not both.

@delores-hashicorp delores-hashicorp force-pushed the bugfix/retry-get-policy branch from c3f345c to 6b79745 Compare June 28, 2024 11:01
Copy link
Contributor

@itsjaspermilan itsjaspermilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just chiming in here from the identity team. Looks like our tests still pass with these changes. Was there any immediate action that we needed to take on our end to accommodate these changes?

@delores-hashicorp Did the same diags.Append(customdiags.NewErrorHTTPStatusCode(... bit need to be applied to the Group Policy resources as well?

Tests passing:

=== RUN   TestAccGroupIamPolicyResource
--- PASS: TestAccGroupIamPolicyResource (11.43s)
=== RUN   TestAccGroupIamBindingResource
--- PASS: TestAccGroupIamBindingResource (14.36s)
PASS
coverage: 70.2% of statements
ok      github.com/hashicorp/terraform-provider-hcp/internal/provider/iam       156.022s

@@ -20,12 +20,14 @@ import (
// Implementations should be created per resource and should keep track of the
// resource identifier.
type ResourceIamUpdater interface {
// Fetch the existing IAM policy attached to a resource.
// GetResourceIamPolicy fetches the existing IAM policy attached to a resource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning up the comments ✨

@delores-hashicorp
Copy link
Contributor Author

Just chiming in here from the identity team. Looks like our tests still pass with these changes. Was there any immediate action that we needed to take on our end to accommodate these changes?

@delores-hashicorp Did the same diags.Append(customdiags.NewErrorHTTPStatusCode(... bit need to be applied to the Group Policy resources as well?

Tests passing:

=== RUN   TestAccGroupIamPolicyResource
--- PASS: TestAccGroupIamPolicyResource (11.43s)
=== RUN   TestAccGroupIamBindingResource
--- PASS: TestAccGroupIamBindingResource (14.36s)
PASS
coverage: 70.2% of statements
ok      github.com/hashicorp/terraform-provider-hcp/internal/provider/iam       156.022s

Thanks for the catch. I didn't realize it had been implemented

@delores-hashicorp delores-hashicorp force-pushed the bugfix/retry-get-policy branch from 44aeceb to c7d9a4c Compare July 1, 2024 09:46
@delores-hashicorp delores-hashicorp merged commit a99d1a0 into main Jul 2, 2024
7 checks passed
@delores-hashicorp delores-hashicorp deleted the bugfix/retry-get-policy branch July 2, 2024 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants