Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue #13538 #14020

Merged

Conversation

jhujasonw
Copy link
Contributor

@jhujasonw jhujasonw commented Jul 1, 2020

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

Closes #13538

Release note for CHANGELOG:


Output from acceptance testing:


$ make testacc TESTARGS='-run=TestAccAWSEksFargateProfile_Multi_Profile'
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./... -v -count 1 -parallel 20 -run=TestAccAWSEksFargateProfile_Multi_Profile -timeout 120m
?       github.com/terraform-providers/terraform-provider-aws   [no test files]
=== RUN   TestAccAWSEksFargateProfile_Multi_Profile
=== PAUSE TestAccAWSEksFargateProfile_Multi_Profile
=== CONT  TestAccAWSEksFargateProfile_Multi_Profile
--- PASS: TestAccAWSEksFargateProfile_Multi_Profile (1252.89s)
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws       1252.966s
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws/internal/flatmap      0.002s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws/internal/keyvaluetags 0.013s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws/internal/naming       0.012s [no tests to run]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/apigatewayv2/waiter  [no test files]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws/internal/service/batch/equivalency    0.006s [no tests to run]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/ec2/waiter   [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/ecs/waiter   [no test files]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws/internal/service/eks/token    0.006s [no tests to run]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/guardduty/waiter     [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/iam/waiter   [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/kinesisanalytics/waiter      [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/kms/waiter   [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/neptune/waiter       [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/rds/waiter   [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/secretsmanager/waiter        [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/servicediscovery/waiter      [no test files]
?       github.com/terraform-providers/terraform-provider-aws/aws/internal/service/workspaces/waiter    [no test files]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws/internal/tfawsresource        0.013s [no tests to run]
?       github.com/terraform-providers/terraform-provider-aws/awsproviderlint   [no test files]
?       github.com/terraform-providers/terraform-provider-aws/awsproviderlint/helper/awsprovidertype/keyvaluetags       [no test files]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/awsproviderlint/passes    0.012s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/awsproviderlint/passes/AWSAT001   0.013s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/awsproviderlint/passes/AWSR001    0.003s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/awsproviderlint/passes/AWSR002    0.011s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/terraform-providers/terraform-provider-aws/awsproviderlint/passes/fmtsprintfcallexpr 0.003s [no tests to run]


@jhujasonw jhujasonw requested a review from a team July 1, 2020 18:15
@ghost ghost added size/XS Managed by automation to categorize the size of a PR. service/eks Issues and PRs that pertain to the eks service. needs-triage Waiting for first response or review from a maintainer. labels Jul 1, 2020
@bflad bflad added bug Addresses a defect in current functionality. proposal Proposes new design or functionality. and removed needs-triage Waiting for first response or review from a maintainer. proposal Proposes new design or functionality. labels Jul 1, 2020
Copy link
Contributor

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jhujasonw 👋 Thank you for submitting this. Just a drive-by comment about this -- our typical approach for handling resource serialization needs is to utilize the mutexkv library (awsMutexKV in this codebase) keyed against something each would have in common, rather than introducing long retry loops which can be error prone and inconsistent in user experience. In this case, my suggestion would be to use a mutex such as the below in the Create and Delete functions before the API calls:

mutexKey := fmt.Sprintf("%s-fargate-profiles", d.Get("cluster_name").(string))
awsMutexKV.Lock(mutexKey)
defer awsMutexKV.Unlock(mutexKey)

It would be great to also introduce a covering acceptance test for this situation to prevent any regressions. Copy-paste-modify of an existing test and test configuration should get us close. Please see the Contributing Guide section on Acceptance Testing for more information.

@bflad bflad linked an issue Jul 1, 2020 that may be closed by this pull request
@jhujasonw
Copy link
Contributor Author

Hi @bflad,
I'm relatively new to the provider stuff, so I'm definitely happy to refactor how that is done. Mutex is definitely the Right Way. That should be a quick change. The acceptance test may take me a bit more to sort through the docs and figure out how you all would like that done, so I'll get on it. Thanks for the pointers. Let me know if there is anything else you would like me to adjust while I work though it. This bug is a particular headache for a terraform setup I am working with, so it's in my best interest to work through it quickly.

@jhujasonw
Copy link
Contributor Author

Looking over the acceptance testing doc, would it be a good idea to just add a second profile into the test that already exists? The destroy checks would pick up on it and if it errors out on creation of the second resource because of serialization issues, the acceptance test configuration apply would fail. Those are really the only two instances where we care to test serialization and the rest of the test would operate only on the first resource. I'm not trying to be lazy, but I am trying to simplify the tests for the eks fargate profiles and keep them readable. If this isn't the right approach, I can come up with a different plan.

@bflad
Copy link
Contributor

bflad commented Jul 2, 2020

@jhujasonw in general, we prefer new tests and test configurations to ensure existing configurations work as expected, even if it is more verbose. It also helps future maintainers narrow down issues easier, since the different tests can isolate different behaviors. 👍

@ghost ghost added tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. size/M Managed by automation to categorize the size of a PR. and removed size/XS Managed by automation to categorize the size of a PR. labels Jul 6, 2020
@jhujasonw
Copy link
Contributor Author

Changed the serialization method to use mutex locking and gave writing an acceptance test a go. Output from the acceptance test posted to the original comment.

@jhujasonw
Copy link
Contributor Author

So I have run into another issue with this. If you are removing multiple profiles that are actually running deployments, they take significantly longer (5 to 15 minutes, each) to destroy and you reach a timeout error on the ones waiting in line to be removed.

This PR fixes the issue of serialization, but perhaps bumping the timeout may still be something that needs to be considered? Or is there perhaps another way around that that I am not seeing.

@ivankatliarchuk
Copy link

Can we merge this in? Edge cases can be solved with follow up PR in my opinion. What I can see, EKS Fargate management and lifecycle itself is far from ideal, e.g. idea is great, when implementation is horrible, an I hope the AWS team is going to make it more automation-friendly in the future releases.

@jhujasonw jhujasonw requested a review from bflad September 3, 2020 18:45
Copy link
Contributor Author

@jhujasonw jhujasonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the changes in a later commit.

@jhujasonw
Copy link
Contributor Author

Please let me know if you would like me to work on this any further before considering for merge.

@bflad bflad self-assigned this Nov 18, 2020
@bflad bflad added this to the v3.16.0 milestone Nov 18, 2020
Copy link
Contributor

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, @jhujasonw, this looks good 🚀

Output from acceptance testing:

--- PASS: TestAccAWSEksFargateProfile_basic (1143.84s)
--- PASS: TestAccAWSEksFargateProfile_disappears (1163.66s)
--- PASS: TestAccAWSEksFargateProfile_Multi_Profile (1456.46s)
--- PASS: TestAccAWSEksFargateProfile_Selector_Labels (1446.71s)
--- PASS: TestAccAWSEksFargateProfile_Tags (1185.36s)

defer awsMutexKV.Unlock(mutexKey)

// Creation of profiles must be serialized, and can take a while, increase the timeout to a long time.
err := resource.Retry(30*time.Minute, func() *resource.RetryError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the mutex handles the timing issue -- going to revert this back to the 2 minute IAM propagation timeout.

@bflad bflad merged commit b93d899 into hashicorp:master Nov 18, 2020
bflad added a commit that referenced this pull request Nov 18, 2020
@ghost
Copy link

ghost commented Nov 18, 2020

This has been released in version 3.16.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost
Copy link

ghost commented Dec 18, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Addresses a defect in current functionality. service/eks Issues and PRs that pertain to the eks service. size/M Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS Fargate profiles cannot delete in parallel aws_eks_fargate_profile inconsistent result
3 participants