[Bug]: Conflict: The container name testcontainers-ryuk-... is already in use #1252

Comments
My proposal would be to add locking around …
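As a purely illustrative sketch of what such locking could look like (assuming a shared `SemaphoreSlim`; `ReaperGuard` and `RunExclusiveAsync` are made-up names, not part of Testcontainers), both the default-instance start and the disposal would pass through a single gate:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch of the proposed locking, not the actual ResourceReaper code.
public static class ReaperGuard
{
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    // Both the "get or start" path and the dispose path would run their critical
    // sections through this method, so a dispose that is still removing the
    // container cannot interleave with a new start.
    public static async Task<T> RunExclusiveAsync<T>(Func<Task<T>> operation, CancellationToken ct = default)
    {
        await Gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            return await operation().ConfigureAwait(false);
        }
        finally
        {
            Gate.Release();
        }
    }
}
```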
I am wondering how you are running into this issue. What causes the singleton instance to be disposed of 🤔? The default resource reaper instance should be instantiated once and never disposed of. Perhaps the instantiation of the singleton instance is failing? Is there any other exception?
Thanks for getting back to me! It seems like you are right. Prior to the above exception, there are multiple test cases that failed during setup with:
I assume that even though the commands timed out, one of them actually started the resource reaper container. So synchronizing the disposal will probably not help.
I doubt it. If TC successfully starts a resource reaper instance, there is no reason to create it again. As I mentioned, there is no need to dispose of the default instance; we never call the dispose method except in the mentioned error case (perhaps we should throw a different exception if someone tries to dispose of the default instance). The resource reaper runs longer than the test process and cleans up after itself. This is crucial to prevent resource leaks. Changing this will very likely hide the underlying issue and root cause. The …
Please try to run the resource reaper (Ryuk) manually on your build agent and check if it fails:
```
docker run -v /var/run/docker.sock:/var/run/docker.sock -e RYUK_PORT=8080 -p 8080 testcontainers/ryuk:0.9.0
Unable to find image 'testcontainers/ryuk:0.9.0' locally
0.9.0: Pulling from testcontainers/ryuk
46b060cc2620: Pull complete
950af9946849: Pull complete
dce2d503360a: Pull complete
Digest: sha256:448beed1b3fd18e9411dd4b6a26a04f3aa0fccf229502c9665ebe8d628c7d2c5
Status: Downloaded newer image for testcontainers/ryuk:0.9.0
2024/09/05 08:59:47 Pinging Docker...
2024/09/05 08:59:47 Docker daemon is available!
2024/09/05 08:59:47 Starting on port 8080...
2024/09/05 08:59:47 Started!
2024/09/05 08:59:58 Signal received
2024/09/05 08:59:58 Removed 0 container(s), 0 network(s), 0 volume(s), 0 image(s)
```

Worked without problem. The test runs usually seem to use …
My hypothesis, though, is that the Docker client times out and reports to TC that the container start failed, while the Docker server was just too slow but actually did manage to start the container. That way, TC cannot know about the started container and would try to start it again. Could it happen that way?
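A rough sketch of that hypothesized failure mode using plain Docker.DotNet (the fixed container name, the 1 ms timeout, and the `NameConflictRepro` class are assumptions made for illustration, not what Testcontainers does internally): if the first create request is cancelled on the client side but the daemon still processes it, a retry under the same fixed name is rejected with a name conflict.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Docker.DotNet;
using Docker.DotNet.Models;

public static class NameConflictRepro
{
    public static async Task Main()
    {
        using var client = new DockerClientConfiguration(new Uri("npipe://./pipe/docker_engine")).CreateClient();

        var parameters = new CreateContainerParameters
        {
            Image = "testcontainers/ryuk:0.9.0",
            Name = "testcontainers-ryuk-demo" // fixed name, as the resource reaper uses
        };

        try
        {
            // Simulate a client-side timeout: the request may still reach the daemon,
            // which can create the container even though the caller observes a failure.
            using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(1));
            await client.Containers.CreateContainerAsync(parameters, cts.Token);
        }
        catch (OperationCanceledException)
        {
            // The caller believes the create failed and retries ...
        }

        // ... but if the daemon completed the first request, this retry throws a
        // DockerApiException: "Conflict. The container name ... is already in use".
        await client.Containers.CreateContainerAsync(parameters);
    }
}
```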
👍 That looks good. The version does not matter.
Does the service start while running the build or test? Ephemeral agent? Can you ensure the service is in a ready state before running the tests?

It does not even start the container; it fails just by trying to create the resource (aka …).

The default timeout for the Docker.DotNet Npipe connection appears quite small, although I have never experienced any issues before (and TC is initially able to connect to it; otherwise, you would see different errors). You can try passing a custom endpoint authentication provider to the builder and increasing the timeout to see if that resolves the issue. For example:

```csharp
public sealed class CustomEndpointAuthProvider : IDockerEndpointAuthenticationConfiguration
{
    private CustomEndpointAuthProvider()
    {
    }

    public static IDockerEndpointAuthenticationConfiguration Instance { get; }
        = new CustomEndpointAuthProvider();

    public Credentials Credentials
        => null;

    public Uri Endpoint
        => new Uri("npipe://./pipe/docker_engine");

    public DockerClientConfiguration GetDockerClientConfiguration(Guid sessionId = default)
    {
        return new DockerClientConfiguration(Endpoint, Credentials, namedPipeConnectTimeout: TimeSpan.FromSeconds(10));
    }
}
```

```csharp
public sealed class GitHub
{
    static GitHub()
    {
        // Because the endpoint uses the same address as the default configuration, we need
        // to override the selected auto-discovery endpoint. Otherwise, we will be using
        // the default (cached) provider instead of the custom one.
        // It is important to override it before any builder is instantiated.
        TestcontainersSettings.OS = new Windows(CustomEndpointAuthProvider.Instance);
    }

    [Fact]
    public async Task Issue1252()
    {
        _ = new ContainerBuilder().WithImage(CommonImages.Alpine).Build();
    }
}
```
We've started seeing this on Microsoft-hosted build agents with the SQL Server container specifically. I believe the embedded image tag is no longer supported in some way, so a …
Hi, to confirm what @benjaminsampica said: on the local environment everything worked perfectly, and until a few days ago it also worked on Microsoft-hosted build agents. Since yesterday it has stopped working on agents during pipelines. Adding …
The mentioned MSSQL issues are not related to this one. Both of you are running into #1265.
Without additional information, I am unable to help. I will close the issue for now. Please refer to my comment above, and do not hesitate to reopen the issue if you have further information.
Testcontainers version
3.9.0
Using the latest Testcontainers version?
No
Host OS
Windows
Host arch
x86
.NET version
8.0.401
Docker version
```
docker version
Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.1.1
 API version:       1.45
 Go version:        go1.21.9
 Git commit:        4cf5afa
 Built:             Tue Apr 30 11:48:43 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.30.0 (149282)
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:28 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
Docker info
What happened?
While running multiple test stages in parallel on Jenkins, sometimes a test (and all subsequent tests requiring a Testcontainer) fails with:
I expect this to never happen. The testcontainers library should either correctly handle an existing ryuk container it created, or ensure complete clean-up before starting a new one.
Relevant log output
Additional information
I was not able to reproduce this locally directly, but while inspecting the code I noticed that `ResourceReaper.DisposeAsync()` is not synchronized with `ResourceReaper.GetAndStartDefaultAsync(...)`. This leads to a race condition: when `DisposeAsync` has already set `_disposed = true` but not yet removed the container, then calling `GetAndStartDefaultAsync` produces the above exception. I could verify this "manually" by applying the following diff and then executing this test:
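(The actual diff and test are not included above.) As an abstract illustration of the described timing, here is a hypothetical stand-in, not the real `ResourceReaper`: the disposed flag becomes observable before the fixed-name container is gone.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in that only mimics the reported timing; this is not the
// actual ResourceReaper implementation.
public sealed class ReaperRaceSketch
{
    private bool _disposed;
    private bool _containerRunning = true; // the fixed-name ryuk container

    public async Task DisposeAsync()
    {
        _disposed = true;          // 1. the flag is set first ...
        await Task.Delay(500);     // 2. ... while removing the container takes a moment
        _containerRunning = false; // 3. only now is the name free again
    }

    public Task GetAndStartDefaultAsync()
    {
        // A concurrent caller running between step 1 and step 3 sees a disposed
        // reaper and tries to start a new container under the same fixed name,
        // which the Docker daemon rejects with the "name is already in use" error.
        if (_disposed && _containerRunning)
        {
            throw new InvalidOperationException(
                "Conflict. The container name \"/testcontainers-ryuk-...\" is already in use.");
        }

        return Task.CompletedTask;
    }
}
```

Running `DisposeAsync()` and `GetAndStartDefaultAsync()` concurrently with this timing reproduces the window the report describes.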