Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/metrics: simplify OOM test #7036

Merged
merged 1 commit into from
Jul 26, 2023
Merged

Conversation

kolyshkin
Copy link
Collaborator

@kolyshkin kolyshkin commented Jun 9, 2023

What type of PR is this?

/kind ci

What this PR does / why we need it:

Currently, this test case keeps waiting for the container that was
already killed in case OOM detection has failed. This is a huge (15
minutes) waste of time.

To fix, change the logic to wait for container exit, and then check the
OOM. This also makes the test code more readable.

Also, add some debug info in case the test has failed.

Also, add a 1 second sleep to the container command, which apparently
improves the chances for the test to succeed.

Which issue(s) this PR fixes:

None

Related to: #7035

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Jun 9, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 9, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added kind/ci Categorizes issue or PR as related to CI dco-signoff: yes Indicates the PR's author has DCO signed all their commits. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 9, 2023
@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Merging #7036 (538870a) into main (1c2ea67) will decrease coverage by 0.10%.
The diff coverage is n/a.

❗ Current head 538870a differs from pull request most recent head 927843e. Consider uploading reports for the commit 927843e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7036      +/-   ##
==========================================
- Coverage   49.12%   49.03%   -0.10%     
==========================================
  Files         136      133       -3     
  Lines       15496    15471      -25     
==========================================
- Hits         7613     7586      -27     
+ Misses       6981     6978       -3     
- Partials      902      907       +5     

test/metrics.bats Outdated Show resolved Hide resolved
@kolyshkin kolyshkin marked this pull request as ready for review June 9, 2023 19:24
@kolyshkin kolyshkin requested a review from mrunalp as a code owner June 9, 2023 19:24
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2023
@openshift-ci openshift-ci bot requested review from QiWang19 and sohankunkerkar June 9, 2023 19:25
@littlejawa
Copy link
Contributor

/test kata-containers

@kolyshkin
Copy link
Collaborator Author

@haircommander were you able to make a new test image, as noted in #6950 (comment)? I still see failures from bats in fedora-integration job, which is full of:

/usr/go/src/github.com/cri-o/cri-o/test/./helpers.bash: line 4: bats_require_minimum_version: command not found

@haircommander
Copy link
Member

the images will automatically update once this merges. the update happens daily at some point. I think @sohankunkerkar knows how to manually triggger it

test/metrics.bats Outdated Show resolved Hide resolved
@sohankunkerkar
Copy link
Member

The images will automatically update once this merges. the update happens daily at some point. I think @sohankunkerkar knows how to manually trigger it

Hmm... it looks like that PR was merged a couple of weeks ago. Ideally, that change should have been picked up by the periodic job on that day itself.

@sohankunkerkar
Copy link
Member

/retest

@sohankunkerkar
Copy link
Member

Built new images for CI. Let's see if this run passes the integration tests.

@sohankunkerkar
Copy link
Member

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 15, 2023
@sohankunkerkar sohankunkerkar removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2023
@haircommander
Copy link
Member

@kolyshkin given opencontainers/runc#3932 is in 1.8, instead of this should we update CRI-O to have 1.8 and drop the inactive-or-failed?

@kolyshkin
Copy link
Collaborator Author

@kolyshkin given opencontainers/runc#3932 is in 1.8, instead of this should we update CRI-O to have 1.8 and drop the inactive-or-failed?

@haircommander I think that would be a second step. This PR still makes lots of sense because without it the test waits 15 minutes for the condition that is not going to happen).

I will change sleep 1 to sleep 5 now.

Comment on lines -106 to -109
while [ $CNT -le 100 ]; do
CNT=$((CNT + 1))
OUTPUT=$(crictl inspect --output yaml "$CTR_ID")
if [[ "$OUTPUT" == *"OOMKilled"* ]]; then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code (together with the sleep 10 below) waits for the OOMKilled condition that may never happen. In case there's no OOM, the test wastes 100 * 10 seconds.

The correct logic, as implemented in this PR, is to wait until container has exited, and check if OOMKilled is set:

	# Wait for container to OOM.
	EXPECTED_EXIT_STATUS=137 wait_until_exit "$CTR_ID"
	crictl inspect "$CTR_ID" | jq -e '.status.reason == "OOMKilled"'

@sohankunkerkar
Copy link
Member

https://github.com/cri-o/cri-o/actions/runs/5658658156/job/15330451849?pr=7036
this needs to be fixed.

Currently, this test case keeps waiting for the container that was
already killed in case OOM detection has failed. This is a huge (15
minutes) waste of time.

To fix, change the logic to wait for container exit, and then check the
OOM. This also makes the test code more readable.

Also, add some debug info in case the test has failed.

Also, add a sleep before the memory eater to give the container some
time to live.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Copy link
Collaborator Author

Copy link
Member

@sohankunkerkar sohankunkerkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2023
@sohankunkerkar
Copy link
Member

/retest

1 similar comment
@sohankunkerkar
Copy link
Member

/retest

Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kolyshkin, saschagrunert, sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kolyshkin,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 846fbcb into cri-o:main Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/ci Categorizes issue or PR as related to CI lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants