-
-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: ryuk restart/create after first instance has timed out fails #1671
Comments
Hi @srenatus I think this is the expected behavior, as Ryuk will check the configured timeout and die if no connection is made during that time. The workaround is actually the way to increase the time Ryuk waits for connections, so I'd say with a lot of caution this is a feature not a bug 🪲 |
Is it? I mean, checking if there is still an instance and reusing it if possible makes sense, but if there is no instance running anymore, it could start a new one. Similar to how it starts one at the very start. (Extending the timeout would then be an optimisation.) |
Oh I'm sorry I probably misunderstood the issue, as it's not about Ryuk not receiving connections and dying, but about the need of Ryuk recreation if for some reason the test took more time than the reconnection timeout, so it would need to be recreated, right? If the latter, I think the work in #1904 could help in making the issue more visible. |
I'm out for the rest of the week but I can give it a try next week! |
Update -- I've run our CI bits with testcontainers-go from main, a9f0ac8, and it failed again with reaper-related errors
|
Looking into this, it appears that the reaper is supposed to be recreated but fails during the state check because the container is already gone by then. We could check if it's an // 1. if the reaper instance has been already created, return it
if reaperInstance != nil {
// Verify this instance is still running by checking state.
// Can't use Container.IsRunning because the bool is not updated when Reaper is terminated
state, err := reaperInstance.container.State(ctx)
if err != nil {
if !errdefs.IsNotFound(err) {
return nil, err
}
} else {
if state.Running {
return reaperInstance, nil
}
}
// else: the reaper instance has been terminated, so we need to create a new one
reaperOnce = sync.Once{}
} |
@Mathew-Estafanous would you mind submitting a PR with your fix? And just to confirm, the last line, where the reaperOnce is reset, is needed because a previous call was executed, and for some reason Ryuk died in between, therefore the once state is protecting that block. Is this assumption correct? |
Here you go! #2084
Yup! |
Testcontainers version
0.24.1
Using the latest Testcontainers version?
Yes
Host OS
macos, linux
Host arch
x86
Go version
1.21
Docker version
Client: Version: 24.0.6 API version: 1.43 Go version: go1.20.7 Git commit: ed223bc Built: Mon Sep 4 12:28:49 2023 OS/Arch: darwin/amd64 Context: orbstack Server: Docker Engine - Community Engine: Version: 24.0.6 API version: 1.43 (minimum version 1.12) Go version: go1.20.7 Git commit: 1a79695 Built: Mon Sep 4 12:32:17 2023 OS/Arch: linux/amd64 Experimental: false containerd: Version: v1.7.5 GitCommit: fe457eb99ac0e27b3ce638175ef8e68a7d2bc373 runc: Version: 1.1.9 GitCommit: 82f18fe0e44a59034f3e1f45e475fa5636e539aa docker-init: Version: 0.19.0 GitCommit: de40ad0
Docker info
What happened?
With this test definition,
I noticed that with no sleep (or a sleep below 10 seconds, the default ryuk reconnection timeout), the tests pass. With a sleep of 10s or more, it'll fail. Both runs are in the log output below 👇
Relevant log output
I'm using redpanda here, but that's for illustration only. I don't believe it has anything to do with the module that's used.
Additional information
Increasing the reconnection timeout (#1668) was a workaround for this.
The text was updated successfully, but these errors were encountered: