libct/int: retry Checkpoint for cgroup v1
The cgroup v1 freezer has trouble freezing a cgroup, and despite CRIU's own retries, checkpointing may fail like this:

	=== RUN TestCheckpoint
	time="2024-10-18T08:55:44Z" level=warning msg="--- Quoting "/tmp/TestCheckpoint214687474/003/criu-parent/dump.log""
	time="2024-10-18T08:55:44Z" level=warning msg="118:(09.517977) freezer.state=FREEZING"
	time="2024-10-18T08:55:44Z" level=warning msg="119:(09.618087) freezer.state=FREEZING"
	time="2024-10-18T08:55:44Z" level=warning msg="120:(09.718192) freezer.state=FREEZING"
	time="2024-10-18T08:55:44Z" level=warning msg="121:(09.818291) freezer.state=FREEZING"
	time="2024-10-18T08:55:44Z" level=warning msg="122:(09.918412) freezer.state=FREEZING"
	time="2024-10-18T08:55:44Z" level=warning msg="123:(10.001045) Error (criu/cr-dump.c:1779): Timeout reached. Try to interrupt: 0"
	time="2024-10-18T08:55:44Z" level=warning msg="124:(10.001084) freezer.state=FREEZING"
	time="2024-10-18T08:55:44Z" level=warning msg="125:(10.001125) Unfreezing tasks into 1"
	time="2024-10-18T08:55:44Z" level=warning msg="126:(10.001128) \tUnseizing 45035 into 1"
	time="2024-10-18T08:55:44Z" level=warning msg="127:(10.001140) Error (compel/src/lib/infect.c:418): Unable to detach from 45035: No such process"
	time="2024-10-18T08:55:44Z" level=warning msg="128:(10.001144) Writing image inventory (version 1)"
	time="2024-10-18T08:55:44Z" level=warning msg="129:(10.001223) Error (criu/cr-dump.c:1893): Pre-dumping FAILED."
	time="2024-10-18T08:55:44Z" level=warning msg=---
	checkpoint_test.go:93: criu failed: type PRE_DUMP errno 0

Since cgroup v1 is being deprecated, and the problem does not exist
on cgroup v2, let's retry the checkpoint up to two more times (on v1
only) to avoid flaky tests.

Related issues: #4457, #4273.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin committed Oct 28, 2024
1 parent 4ad9f7f commit a05e50d
Showing 1 changed file with 22 additions and 2 deletions.
libcontainer/integration/checkpoint_test.go: 24 changes (22 additions, 2 deletions)
@@ -8,8 +8,10 @@ import (
 	"regexp"
 	"strings"
 	"testing"
+	"time"

 	"github.com/opencontainers/runc/libcontainer"
+	"github.com/opencontainers/runc/libcontainer/cgroups"
 	"golang.org/x/sys/unix"
 )

@@ -79,6 +81,24 @@ func testCheckpoint(t *testing.T, userns bool) {
 	tmp := t.TempDir()
 	var parentImage string

+	retryCheckpoint := func(opts *libcontainer.CriuOpts) error {
+		err := container.Checkpoint(opts)
+		// Cgroup v1 freezer is flaky; v2 is fine.
+		if err == nil || cgroups.IsCgroup2UnifiedMode() {
+			return err
+		}
+
+		const retries = 2
+		for i := 1; i <= retries; i++ {
+			time.Sleep(time.Second << i)
+			t.Logf("cgroup v1 checkpointing is flaky, retry %d of %d", i, retries)
+			if err = container.Checkpoint(opts); err == nil {
+				return nil
+			}
+		}
+		return err
+	}
+
 	// Test pre-dump if mem_dirty_track is available.
 	if criuFeature("mem_dirty_track") {
 		parentImage = "../criu-parent"
@@ -89,7 +109,7 @@ func testCheckpoint(t *testing.T, userns bool) {
 			PreDump: true,
 		}

-		if err := container.Checkpoint(preDumpOpts); err != nil {
+		if err := retryCheckpoint(preDumpOpts); err != nil {
 			t.Fatal(err)
 		}

@@ -109,7 +129,7 @@ func testCheckpoint(t *testing.T, userns bool) {
 		ParentImage: parentImage,
 	}

-	if err := container.Checkpoint(checkpointOpts); err != nil {
+	if err := retryCheckpoint(checkpointOpts); err != nil {
 		t.Fatal(err)
 	}


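Whether the retries apply at all is decided by cgroups.IsCgroup2UnifiedMode(). One way to detect a unified (cgroup v2) hierarchy is to statfs(2) the cgroup mount point, usually /sys/fs/cgroup, and compare the filesystem type with CGROUP2_SUPER_MAGIC (0x63677270 in <linux/magic.h>). The sketch below does that with the standard syscall package. isCgroup2Mount is a hypothetical name, the sketch is Linux-only, and runc's own helper may differ in details (for example, by caching its result), so treat it as an illustration rather than a drop-in replacement.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic mirrors CGROUP2_SUPER_MAGIC from <linux/magic.h>.
const cgroup2SuperMagic = 0x63677270

// isCgroup2Mount reports whether path is a cgroup v2 (unified) mount by
// comparing the filesystem type returned by statfs(2). Linux-only sketch;
// hypothetical helper, not runc's implementation.
func isCgroup2Mount(path string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, fmt.Errorf("statfs %s: %w", path, err)
	}
	return st.Type == cgroup2SuperMagic, nil
}
```

On a cgroup v2 host, isCgroup2Mount("/sys/fs/cgroup") is expected to return true; on a v1 or hybrid host, /sys/fs/cgroup is usually a tmpfs holding per-controller mounts, so the check returns false and the retry path in the test is taken.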