test/metrics: simplify oom test, add debug
Currently, this test case keeps waiting for the container that was
already killed in case OOM detection has failed. This is a huge (15
minutes) waste of time.

To fix, change the logic to wait for container exit, and then check the
OOM. This also makes the test code more readable.

Also, add some debug info in case the test has failed.

Also, add a sleep before the memory eater to give the container some
time to live.

Signed-off-by: Kir Kolyshkin <[email protected]>
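
The wait-then-check flow described in the commit message can be sketched as a bounded polling loop. This is a minimal sketch, not the actual `wait_until_exit` helper used by the test (its implementation is not shown in this commit). The names `wait_for_exit_code` and `fake_probe`, and the probe contract (print the exit code once the container has exited, print nothing before that), are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper, NOT the real wait_until_exit from the test suite.
# Polls a caller-supplied probe until it reports an exit code, then
# succeeds only if that code matches the expected one.
# Usage: wait_for_exit_code EXPECTED ATTEMPTS DELAY PROBE [ARGS...]
wait_for_exit_code() {
    local expected=$1 attempts=$2 delay=$3
    shift 3
    local i code
    for ((i = 0; i < attempts; i++)); do
        # The probe prints the exit code once the container has exited
        # and prints nothing while it is still running (assumed contract).
        code=$("$@") || true
        if [[ -n "$code" ]]; then
            # The container is gone: report a match or mismatch right away
            # instead of polling for a state that can no longer change.
            [[ "$code" == "$expected" ]]
            return
        fi
        sleep "$delay"
    done
    return 1 # timed out: the container never exited
}

# Stand-in for a real probe (for example, a crictl query).
# It reports exit code 137 from the third call onwards.
state=$(mktemp)
echo 0 > "$state"
fake_probe() {
    local n
    n=$(($(cat "$state") + 1))
    echo "$n" > "$state"
    ((n >= 3)) && echo 137
    return 0
}

wait_for_exit_code 137 10 0 fake_probe && echo "exited with 137"
```

Returning as soon as the container has exited, whatever the exit code, is what avoids the long wait the commit message complains about: a container that died for another reason fails the check immediately.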
kolyshkin committed Jul 25, 2023
1 parent 1c2ea67 commit 927843e
Showing 1 changed file with 24 additions and 11 deletions.
35 changes: 24 additions & 11 deletions test/metrics.bats
@@ -97,21 +97,34 @@ function teardown() {
     CONTAINER_ENABLE_METRICS=true CONTAINER_METRICS_PORT=$PORT start_crio

     jq '.linux.resources.memory_limit_in_bytes = 15728640
-        | .command = ["sh", "-c", "dd if=/dev/zero of=/dev/null bs=20M"]' \
+        | .command = ["sh", "-c", "sleep 5; dd if=/dev/zero of=/dev/null bs=20M"]' \
         "$TESTDATA/container_config.json" > "$TESTDIR/config.json"
     CTR_ID=$(crictl run "$TESTDIR/config.json" "$TESTDATA/sandbox_config.json")

-    # Wait for container to OOM
-    CNT=0
-    while [ $CNT -le 100 ]; do
-        CNT=$((CNT + 1))
-        OUTPUT=$(crictl inspect --output yaml "$CTR_ID")
-        if [[ "$OUTPUT" == *"OOMKilled"* ]]; then
-            break
+    # Wait for container to OOM.
+    EXPECTED_EXIT_STATUS=137 wait_until_exit "$CTR_ID"
+    if ! crictl inspect "$CTR_ID" | jq -e '.status.reason == "OOMKilled"'; then
+        # The container has exited but it was not OOM-killed.
+        # Provide some details to debug the issue.
+        echo "--- crictl inspect :: ---"
+        crictl inspect --output yaml "$CTR_ID" | grep -A40 'status:'
+        echo "--- --- ---"
+        # Most probably it's a conmon bug.
+        if [ "$RUNTIME_TYPE" == "oci" ]; then
+            echo "--- conmon log :: ---"
+            journalctl -t conmon --grep "${CTR_ID::20}"
+            echo "--- --- ---"
         fi
-        sleep 10
-    done
-    [[ "$OUTPUT" == *"OOMKilled"* ]]
+        # Systemd should have caught the OOM event.
+        if [[ "$CONTAINER_CGROUP_MANAGER" == "systemd" ]]; then
+            echo "--- systemd log :: ---"
+            journalctl --unit "crio-${CTR_ID}.scope"
+            echo "--- --- ---"
+        fi
+
+        # Alas, we have utterly failed.
+        false
+    fi

     METRIC=$(curl -sf "http://localhost:$PORT/metrics" | grep '^container_runtime_crio_containers_oom_total')
     [[ "$METRIC" == 'container_runtime_crio_containers_oom_total 1' ]]
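
The final assertion relies on the Prometheus text format, in which an unlabelled metric appears on its own line as `name value`. Below is a hedged sketch of extracting that value. It swaps the test's `grep` prefix match for an exact-name `awk` match, and both `metric_value` and the sample payload are made up for illustration (a real `/metrics` response contains many more lines).

```shell
#!/usr/bin/env bash

# Illustrative payload only; not captured from a real CRI-O instance.
sample_metrics='# TYPE container_runtime_crio_containers_oom_total counter
container_runtime_crio_containers_oom_total 1'

# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the value of the unlabelled metric whose name is exactly $1.
# Exits non-zero when no such metric line is present.
metric_value() {
    local name=$1
    awk -v name="$name" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }
    '
}

value=$(printf '%s\n' "$sample_metrics" |
    metric_value container_runtime_crio_containers_oom_total)
echo "oom_total=$value"
```

Matching on the whole first field keeps comment lines (which start with `#`) and any metric that merely shares the prefix out of the result.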
