Shuffle hardware inventory for tinkerbell before reservation #8264
Signed-off-by: Rahul Ganesh <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8264      +/-   ##
==========================================
+ Coverage   73.42%   73.48%   +0.06%
==========================================
  Files         578      578
  Lines       36054    36489     +435
==========================================
+ Hits        26471    26814     +343
- Misses       7905     7956      +51
- Partials     1678     1719      +41
overall lgtm, only nit comments
Have we already planned the work to clean the boot entries? I'm totally OK merging this, it's a good patch, but it doesn't guarantee the problem won't happen again. In fact, if I'm understanding this correctly, it will 100% happen; it will just take longer. And it doesn't seem like an easy issue to diagnose.
internal/test/e2e/run.go
Outdated
@@ -592,3 +596,10 @@ func logTinkerbellTestHardwareInfo(conf *instanceRunConf, action string) {
	}
	conf.Logger.V(1).Info(action+" hardware for TestRunner", "hardwarePool", strings.Join(hardwareInfo, ", "))
}

func shuffleHardwareInventory(invCatalogue *hardwareCatalogue) {
nit: why the use of inventory and catalogue? aren't they representing the same thing?
@@ -217,6 +218,9 @@ func RunTests(conf instanceRunConf, inventoryCatalogue map[string]*hardwareCatal
	} else {
		hardwareCatalogue = inventoryCatalogue[nonAirgappedHardware]
	}
	conf.Logger.Info("Shuffling hardware inventory for tinkerbell")
	// shuffle hardware to introduce randomness during hardware reservation.
nit: I would expand more on why randomness is desired. We don't do this to introduce randomness, we do this to avoid picking up the same machines on every run. Randomness is just the mechanism to achieve that goal.
internal/test/e2e/run.go
Outdated
@@ -592,3 +596,10 @@ func logTinkerbellTestHardwareInfo(conf *instanceRunConf, action string) {
	}
	conf.Logger.V(1).Info(action+" hardware for TestRunner", "hardwarePool", strings.Join(hardwareInfo, ", "))
}

func shuffleHardwareInventory(invCatalogue *hardwareCatalogue) {
Why not make this a method on hardwareCatalogue? It's manipulating the internal structure, so it seems like a good idea to abstract that in a method instead of exposing it like this.
In fact, don't you need to use the mutex? If I'm not mistaken, the hardwareCatalogue is shared between runner threads, and all of them are going to try to call this method concurrently.
Good catch! I should have seen this.
Signed-off-by: Rahul Ganesh <[email protected]>
I think long term, we were looking into a solution to automate the cleanup of hardware. Jacob has a runbook on how to do it manually, and we still have to figure out some details there before it can be fully automated. I can create an issue on the CI board to see if that has to be tracked.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: rahulbabu95
/cherry-pick release-0.20
@sp1999: new pull request created: #8898
Description of changes:
Shuffle the hardware inventory before reserving hardware for Tinkerbell E2E tests. As we run quick E2E tests more frequently, the boot lists fill up with boot entries quickly, which leads to an error once there is no space left to add another entry. Ideally we should have automation that periodically removes the boot entries on the BMCs, but until then we should reserve hardware in random order for quick tests so that the first few machines do not absorb all the boot entries. Randomizing the order also reduces the likelihood of picking up the same faulty hardware across repeated quick E2E runs.
Testing (if applicable):
Kicked off a run against my branch and verified that the hardware reserved for the test was different from the regular hardware (eksa-ci01 to eksa-ci12) that gets reserved at present.
Documentation added/planned (if applicable):
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.