Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetworkMode=host causes starting issue #5151

Closed
rgugliel-da opened this issue Mar 4, 2022 · 22 comments
Closed

NetworkMode=host causes starting issue #5151

rgugliel-da opened this issue Mar 4, 2022 · 22 comments

Comments

@rgugliel-da
Copy link

I use the following code to start a container:

private class CustomOracleContainer
    extends GenericContainer[CustomOracleContainer](
      DockerImageName.parse("mycompany/oracle:enterprise-19.14.0-preloaded-20220120-27-36701e3")
    )

private val oracleContainer: CustomOracleContainer = new CustomOracleContainer()
  .withExposedPorts(1521)
  .withNetworkMode("host")
  .waitingFor(Wait.forLogMessage("DATABASE IS READY TO USE!\\n", 1))
  .withStartupTimeout(Duration.ofSeconds(160))

With testcontainers 1.15.1, it works.
When updating to 1.16.3, I get the following error:

ERROR ?.1.0-preloaded-20220120-27-36701e3] - Could not start container
org.testcontainers.shaded.org.awaitility.core.ConditionTimeoutException: Lambda expression in org.testcontainers.containers.GenericContainer: expected the predicate to return <true> but it returned <false> for input of <InspectContainerResponse(...))> within 5 seconds.
	at org.testcontainers.shaded.org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:164)
	at org.testcontainers.shaded.org.awaitility.core.AbstractHamcrestCondition.await(AbstractHamcrestCondition.java:86)
	at org.testcontainers.shaded.org.awaitility.core.ConditionFactory.until(ConditionFactory.java:939)
	at org.testcontainers.shaded.org.awaitility.core.ConditionFactory.until(ConditionFactory.java:645)

Any idea what's happening ?

@pretep22
Copy link

pretep22 commented Mar 7, 2022

I have the same issue... For me as a workaround, I copy-pasted the GenericContainer -class (and dependent classes) from Git, commit of version 1.16.3 - and "reverted" the changes of Commit "Remove withPublishAllPorts from Ryuk and stabilize containerInfo content on start (#4263)"; 034daa6 from 20.07.21 in the class mentioned above. These are the lines where the exception is thrown...
But the container is running anyway!

The problem has something to do with starting the container with networkMode=host.

public class ShowHostNetworkProblemTest {

    @Test
    void startWithHostNetwork_fails() throws Exception {
        MariaDBContainer mariaDBContainer = new MyMariaDBContainer("mariadb:10.1.48");
        mariaDBContainer.withNetworkMode("host");

        try {
            mariaDBContainer.start();

        } catch (ContainerLaunchException ex) {
            // noop
        } finally {
            Connection connection = mariaDBContainer.createConnection("");
            Statement statement = connection.createStatement();
            ResultSet resultSet = statement.executeQuery("SELECT 2;");
            resultSet.next();
            int anInt = resultSet.getInt(1);
            assertEquals(2, anInt);

            System.out.println("Container is running!!");
        }

        assertNotNull(mariaDBContainer);
    }

    public static class MyMariaDBContainer extends MariaDBContainer<MyMariaDBContainer> {
        MyMariaDBContainer(String dockerImageName) {
            super(dockerImageName);
        }

        @Override
        public Integer getMappedPort(int originalPort) {
            if ("host".equals(getNetworkMode())) {
                return originalPort;
            } else {
                return super.getMappedPort(originalPort);
            }
        }
    }
}

I hope the fix will come with the next version, so I can remove my workaround-copy-pasted-classes!

@kiview
Copy link
Member

kiview commented Mar 8, 2022

I could confirm the issue and it might be the case for any container exposing ports while using network mode host.

@javahippie
Copy link

I am also affected by this in a certain scenario. Is there any way I can help with this issue? (Information about the env, code, etc.?)

@kiview
Copy link
Member

kiview commented Jun 20, 2022

I think we would need to debug why the check that was introduced with 034daa6 (which did fix certain race condition scenarios, especially on Windows) seems to not apply if network mode is set to host.

@aidando73
Copy link
Contributor

aidando73 commented Aug 20, 2022

Looking into this now. What I've found so far is:

GenericContainer.start waits for all the exposed ports to be available in the output of docker inspect (or more accurately the corresponding Docker Engine API route) as part of this snippet of code:

// Wait until inspect container returns the mapped ports
containerInfo =
  await()
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(DynamicPollInterval.ofMillis(50))
    .pollInSameThread()
    .until(
        () -> dockerClient.inspectContainerCmd(containerId).exec(),
        inspectContainerResponse -> {
            Set<Integer> exposedAndMappedPorts = inspectContainerResponse
                .getNetworkSettings()
                .getPorts()
                .getBindings()
                .entrySet()
                .stream()
                .filter(it -> Objects.nonNull(it.getValue())) // filter out exposed but not yet mapped
                .map(Entry::getKey)
                .map(ExposedPort::getPort)
                .collect(Collectors.toSet());
  
            return exposedAndMappedPorts.containsAll(exposedPorts);
        }
    );

The issue is, however, that published (exposed) ports are discarded on network mode host. E.g.

$ docker run --publish 8080:80 --network host --detach nginx
WARNING: Published ports are discarded when using host network mode
1da8e93a7de7011dff3a523e54d751de669988f8edcfea9f5c3c3cf54fefe0bc

In the above example the code expects docker inspect to return:

    {
        ...
        "NetworkSettings": {
           ...
            "Ports": {
                "80/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "8080"
                    },
                    {
                        "HostIp": "::",
                        "HostPort": "8080"
                    }
                ]
            }
        }
    }

But in reality, docker inspect returns:

    {
        ...
        "NetworkSettings": {
           ...
            "Ports": {}
        }
    }

So the container times out and throws an exception. From here, I'm planning to look into the reasoning behind the above code snippet and potential fixes.

@kiview
Copy link
Member

kiview commented Aug 22, 2022

@REslim30 If I remember correctly, parts of this code were refactored in order to remove certain race conditions on Docker Desktop (particularly Docker Desktop on Windows). Your findings make sense, this is likely the issue.

@aidando73
Copy link
Contributor

aidando73 commented Aug 25, 2022

I've been testing locally and I think there's a simple fix we could do. We could add to ContainerState.getMappedPort an additional condition:

    default Integer getMappedPort(int originalPort) {
        ...
        Ports.Binding[] binding = new Ports.Binding[0];
        final InspectContainerResponse containerInfo = this.getContainerInfo();
        if (containerInfo != null) {
            if (containerInfo.getHostConfig().getNetworkMode().equals("host")) {
                return originalPort;
            }
            binding = containerInfo.getNetworkSettings().getPorts().getBindings().get(new ExposedPort(originalPort));
        }
        ...
    }

Which simply returns the original port if the container is in host mode. Though this feels a bit wrong. host mode doesn't really have a concept of mapped ports - the ports are simply on the host. It seems like we'd be stretching this method beyond it's original abstraction.

The more semantically correct alternative would be to throw if getMappedPort is called when container is host mode:

    default Integer getMappedPort(int originalPort) {
        ...
        Ports.Binding[] binding = new Ports.Binding[0];
        final InspectContainerResponse containerInfo = this.getContainerInfo();
        if (containerInfo != null) {
            if (containerInfo.getHostConfig().getNetworkMode().equals("host")) {
                throw new IllegalArgumentException("getMappedPort is not supported for host network mode");
            }
            binding = containerInfo.getNetworkSettings().getPorts().getBindings().get(new ExposedPort(originalPort));
        }
        ...
    }

We would, however, need to check all references to getMapped port and add in a check for host mode and there's a lot:
Screenshot from 2022-08-25 20-27-41

@kiview
Copy link
Member

kiview commented Aug 25, 2022

Thanks for the investigation @REslim30, this is super helpful to proceed with this.

Both your points are valid and solution 1 seems very pragmatic. And we can improve on this by documenting this behavior in the Javadoc. The blast radius of solution 2 feels too high for me, to be really feasible.

Before talking more about the potential solutions, I would like to circle back to understand the use cases for which to use a container with host network mode in the context of Testcontainers. Can you please give some context @rgugliel-da @javahippie?

@rgugliel-da
Copy link
Author

We need that as part of the quite specific pipeline.
We use Testcontainer to run an Oracle DB for testing on CI (Oracle enterprise). The reason we use Testcontainer and not directly the Oracle container is because of technicalities of our CI platform. This nesting of containers creates issue with queries issue to the DB except if we use network=nost.

@kiview
Copy link
Member

kiview commented Aug 26, 2022

@rgugliel-da Thanks for sharing the context. So this is for solving an issue when using a containerized CI and you using Docker-Socket-Mounting to start sibling containers?

I think I don't yet understand how the network setup affects querying the database in this case. Did you consider putting the sibling container on the same Docker network as the containerized test JVM and don't use dynamic port mapping at all, but rely on fixed port and Docker DNS?

@kiview kiview self-assigned this Sep 13, 2022
@kiview
Copy link
Member

kiview commented Sep 14, 2022

Hey @rgugliel-da, any chance to follow up on my previous question? Or alternatively @javahippie?

@rgugliel-da
Copy link
Author

rgugliel-da commented Sep 14, 2022

Hi @kiview ,
I am very sorry for the delay in my response!

So this is for solving an issue when using a containerized CI and you using Docker-Socket-Mounting to start sibling containers?

That's correct.

Did you consider putting the sibling container on the same Docker network as the containerized test JVM and don't use dynamic port mapping at all, but rely on fixed port and Docker DNS?

No, I did not. I went for the current approach because I thought I could use similar setup both locally and on CI.

@kiview
Copy link
Member

kiview commented Sep 14, 2022

No, I did not. I went for the current approach because I thought I could use similar setup both locally and on CI.

I was thinking about using Testcontainers' Advanced Networking Feature. Is there an in the communication of 2 containers started by Testcontainers with each other, or is it about the JVM process communicating with the Oracle container?

Edit:
Added a napkin drawing to visualize the 2 different scenarios 😅
image

@rgugliel-da
Copy link
Author

Situation A.
From the build container, I run my JVM (sbt) and from there I start a testcontainer running Oracle

@kiview
Copy link
Member

kiview commented Sep 16, 2022

Thank you, then it seems we indeed need to support the network mode. I still don't understand why the Oracle container requires it in this setup, but since it worked in old Testcontainers versions we introduced a regression for this feature and I believe we should try to restore the fuctionality.

@kiview
Copy link
Member

kiview commented Sep 16, 2022

@REslim30 Would you like to contribute the fix?

Regarding your 2nd proposal, what if we implement your second proposal, but throw a runtime exception, so we don't have to handle all usages of getMappedPort? (and we avoid a breaking change like this)

@aidando73
Copy link
Contributor

Sure, give me one sec.

@aidando73
Copy link
Contributor

@rgugliel-da, @javahippie @pretep22, can I confirm what environments you're on?

The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server.

Docker docs

I'm considering whether or not to throw an exception (or at least a warning) on non-linux environments when trying to use host mode.

@aidando73
Copy link
Contributor

Ok I've submitted a PR for this.

As for:

I'm considering whether or not to throw an exception (or at least a warning) on non-linux environments when trying to use host mode.

I've added a separate issue for this. I think there's value in it. #5853

@rgugliel-da
Copy link
Author

@rgugliel-da, @javahippie @pretep22, can I confirm what environments you're on?

The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server.

Docker docs

I'm considering whether or not to throw an exception (or at least a warning) on non-linux environments when trying to use host mode.

I use Ubuntu.

@kiview
Copy link
Member

kiview commented Nov 17, 2022

We close this as not planned, see comment in PR:
#5852 (comment)

However, we don't want to support network mode host through Testcontainers, for 2 main reasons:

  1. Network mode host is only available on Linux (as also reflected in your implementation) and test code using it would therefore not be portable.
  2. Containers started in network mode host share the same network namespace, which basically means they will use fixed ports. This is not aligned with the Testcontainers philosophy of avoiding test failures through port conflicts (meaning environment state).

@kiview kiview closed this as not planned Won't fix, can't repro, duplicate, stale Nov 17, 2022
@kiview
Copy link
Member

kiview commented Mar 1, 2023

I'd like to clarify, network mode host still works as expected if not used in conjunction with withExposedPorts(). This also makes sense, because when using network mode host, there is no need to expose ports through the TC API, since the container is accessible through the host network.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants