🐛 Attempt to clean up CF IAM users #5242
/test ?
@nrb: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
Probably needs to be rebased onto #5240.
Periodic tests seemed to get into a failure loop because an IAM user with the same name already existed, which is not allowed. This then failed the entire CloudFormation stack. Despite the stack claiming to have been rolled back, the next iteration would run into the same problem.
This change includes IAM users in the list of resources we need to specifically delete in the case of a CloudFormation failure, just in case they've leaked.
Signed-off-by: Nolan Brubaker <[email protected]>
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-test
VPC limit was reached for this test.
@nrb: The following test failed, say
This looks good to me. It also potentially points to a failure in aws-janitor, since that should clean up an AWS account after a failed test.
@richardcase Yeah, I asked about the janitor on Slack. The IAM code only looks at roles and instance policies (https://github.com/kubernetes-sigs/boskos/tree/master/aws-janitor/resources). I suspect that multiple periodics are using CF at the same time and stepping on each other. With your account logging PR, we can double-check that in the future.
I'm running into the same issue regularly with PR E2E tests, thanks for fixing this!
/lgtm
Removing WIP on this as I think it's ready, except for problems with #5252.
Since this is tying up LF resources fairly consistently, I'm going to self-approve.
/approve
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dlipovetsky, nrb
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
Periodic tests seemed to get into a failure loop because an IAM user
with the same name already existed, which is not allowed. This then
failed the entire CloudFormation stack. Despite the stack claiming to
have been rolled back, the next iteration would run into the same
problem.
This change includes IAM users in the list of resources we need to
specifically delete in the case of a CloudFormation failure, just in
case they've leaked.
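To illustrate the cleanup idea, here is a minimal sketch in Go (the language of cluster-api-provider-aws). It is not the code in this PR: `StackResource`, `Deleter`, `cleanupLeaked`, `fakeAccount`, and all resource names are hypothetical, and a real implementation would call the AWS SDK's IAM client instead of an in-memory fake (IAM also requires a user's access keys and attached policies to be removed before the user itself can be deleted). The sketch shows the two properties that break the failure loop: `AWS::IAM::User` is on the explicit-delete list, and "not found" counts as success, so the cleanup is harmless when the rollback did remove everything. It assumes Go 1.20+ for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
)

// StackResource is a hypothetical, simplified view of one resource declared
// in a CloudFormation template: its CloudFormation type and physical name.
type StackResource struct {
	Type string // CloudFormation type, e.g. "AWS::IAM::User"
	Name string // physical name, e.g. the IAM user name
}

// errNotFound stands in for IAM's "no such entity" error (assumption).
var errNotFound = errors.New("not found")

// cleanupTypes lists the types deleted explicitly after a failed stack.
// Adding "AWS::IAM::User" mirrors this PR; the other entries are illustrative.
var cleanupTypes = map[string]bool{
	"AWS::IAM::Role":            true,
	"AWS::IAM::InstanceProfile": true,
	"AWS::IAM::ManagedPolicy":   true,
	"AWS::IAM::User":            true,
}

// Deleter is a hypothetical interface over the real IAM delete calls.
type Deleter interface {
	Delete(r StackResource) error
}

// cleanupLeaked (hypothetical helper) deletes every resource whose type is in
// cleanupTypes. "Not found" counts as success, so running it after a clean
// rollback is a no-op; other errors are collected and cleanup continues.
func cleanupLeaked(resources []StackResource, d Deleter) ([]string, error) {
	var deleted []string
	var errs []error
	for _, r := range resources {
		if !cleanupTypes[r.Type] {
			continue // types not on the list are left to CloudFormation
		}
		switch err := d.Delete(r); {
		case err == nil:
			deleted = append(deleted, r.Name)
		case errors.Is(err, errNotFound):
			// Already gone; nothing leaked.
		default:
			errs = append(errs, fmt.Errorf("deleting %s %q: %w", r.Type, r.Name, err))
		}
	}
	return deleted, errors.Join(errs...) // nil when errs is empty
}

// fakeAccount is a hypothetical in-memory stand-in for an AWS account.
type fakeAccount struct{ existing map[string]bool }

func (f *fakeAccount) Delete(r StackResource) error {
	if !f.existing[r.Name] {
		return errNotFound
	}
	delete(f.existing, r.Name)
	return nil
}

func main() {
	// The stack "rolled back", but the hypothetical IAM user survived.
	account := &fakeAccount{existing: map[string]bool{"example-e2e-user": true}}
	resources := []StackResource{
		{Type: "AWS::IAM::Role", Name: "example-e2e-role"}, // already removed
		{Type: "AWS::IAM::User", Name: "example-e2e-user"}, // leaked
		{Type: "AWS::SQS::Queue", Name: "example-queue"},   // not a cleanup type
	}
	deleted, err := cleanupLeaked(resources, account)
	fmt.Println(deleted, err) // prints "[example-e2e-user] <nil>"
}
```

Collecting errors instead of returning at the first one is a choice for best-effort teardown: one stubborn resource does not stop the rest of the account from being cleaned before the next periodic run.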
Special notes for your reviewer:
The periodic tests at https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-aws#periodic-e2e-release-2-7 were failing roughly every other day between Nov 23, 2024 and Dec 6, 2024.
We'd seen failures prior to that, but testgrid's history doesn't appear to go that far back.
Nearly all the failures within the `capa-e2e.[SynchronizedBeforeSuite]` function contained this log entry:
Checklist:
Release note: