test/metrics: simplify OOM test #7036
Skipping CI for Draft Pull Request. |
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7036      +/-   ##
==========================================
- Coverage   49.12%   49.03%   -0.10%
==========================================
  Files         136      133       -3
  Lines       15496    15471      -25
==========================================
- Hits         7613     7586      -27
+ Misses       6981     6978       -3
- Partials      902      907       +5
|
/test kata-containers |
@haircommander were you able to make a new test image, as noted in #6950 (comment)? I still see failures from bats in fedora-integration job, which is full of:
|
the images will automatically update once this merges. the update happens daily at some point. I think @sohankunkerkar knows how to manually trigger it |
Hmm... it looks like that PR was merged a couple of weeks ago. Ideally, that change should have been picked up by the periodic job on that day itself. |
/retest |
Built new images for CI. Let's see if this run passes the integration tests. |
A friendly reminder that this PR had no activity for 30 days. |
@kolyshkin given opencontainers/runc#3932 is in 1.8, instead of this should we update CRI-O to have 1.8 and drop the |
@haircommander I think that would be a second step. This PR still makes lots of sense because without it the test waits 15 minutes for a condition that is not going to happen. I will change |
while [ $CNT -le 100 ]; do
	CNT=$((CNT + 1))
	OUTPUT=$(crictl inspect --output yaml "$CTR_ID")
	if [[ "$OUTPUT" == *"OOMKilled"* ]]; then
This code (together with the sleep 10 below) waits for the OOMKilled condition, which may never happen. If there is no OOM, the test wastes 100 * 10 seconds.
The correct logic, as implemented in this PR, is to wait until the container has exited, and then check whether OOMKilled is set:
# Wait for container to OOM.
EXPECTED_EXIT_STATUS=137 wait_until_exit "$CTR_ID"
crictl inspect "$CTR_ID" | jq -e '.status.reason == "OOMKilled"'
https://github.com/cri-o/cri-o/actions/runs/5658658156/job/15330451849?pr=7036 |
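The difference between the two approaches can be sketched without a running container. The following is a minimal illustration only, not the code from this PR: get_state and get_reason are hypothetical stubs standing in for crictl queries, and wait_for_exit is a simplified stand-in for the wait_until_exit helper, whose real implementation is not shown here. The sketch polls for the exited state with a bounded number of attempts, then checks the reason exactly once.

```shell
# Hypothetical sketch: wait for container exit, then check the reason once.
# get_state and get_reason are stubs, NOT real CRI-O test helpers.

# Stub: reports "running" for the first two polls, then "exited".
POLLS=0
get_state() {
	POLLS=$((POLLS + 1))
	if [ "$POLLS" -ge 3 ]; then
		STATE=exited
	else
		STATE=running
	fi
}

# Stub: the reason recorded for the exit.
get_reason() {
	REASON=OOMKilled
}

# Poll at most $1 times, sleeping $2 seconds between polls.
# Returns 0 once the container has exited, 1 on timeout.
wait_for_exit() {
	attempts=$1
	delay=$2
	i=0
	while [ "$i" -lt "$attempts" ]; do
		get_state
		if [ "$STATE" = exited ]; then
			return 0
		fi
		sleep "$delay"
		i=$((i + 1))
	done
	return 1
}

# The reason is inspected only after the exit has been observed, so a
# container that exited without an OOM is reported immediately instead
# of being polled until the loop runs out.
check_oom() {
	if ! wait_for_exit 10 0; then
		RESULT=timeout
		return 1
	fi
	get_reason
	if [ "$REASON" = OOMKilled ]; then
		RESULT=oom
	else
		RESULT="not-oom:$REASON"
	fi
}

check_oom
echo "$RESULT"
```

With the stubs above, the loop stops on the third poll and the script prints oom. Changing the get_reason stub to return any other reason makes the check report it right away, which is the point of separating the wait from the check.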
Currently, this test case keeps waiting for the container that was already killed in case OOM detection has failed. This is a huge (15 minutes) waste of time.
To fix, change the logic to wait for container exit, and then check the OOM. This also makes the test code more readable.
Also, add some debug info in case the test has failed.
Also, add a sleep before the memory eater to give the container some time to live.
Signed-off-by: Kir Kolyshkin <[email protected]>
fixed |
LGTM
/retest |
Thanks!
/retest
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kolyshkin, saschagrunert, sohankunkerkar
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind ci
What this PR does / why we need it:
Currently, this test case keeps waiting for the container that was
already killed in case OOM detection has failed. This is a huge (15
minutes) waste of time.
To fix, change the logic to wait for container exit, and then check the
OOM. This also makes the test code more readable.
Also, add some debug info in case the test has failed.
Also, add a 1 second sleep to the container command, which apparently
improves the chances for the test to succeed.
Which issue(s) this PR fixes:
None
Related to: #7035
Special notes for your reviewer:
Does this PR introduce a user-facing change?