Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Endpoint() returns "port not found" error, even though wait.ForListeningPort is used #605

Open
awagner-iq opened this issue Nov 2, 2022 · 15 comments
Labels
podman Issues regarding podman.

Comments

@awagner-iq
Copy link

Testcontainers version

0.15.0

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host Arch

x86_64

Go Version

1.19

Docker version

mero@dibbler ~$ podman version
Version:      3.4.7
API Version:  3.4.7
Go Version:   go1.16.15
Built:        Thu Apr 21 19:38:09 2022
OS/Arch:      linux/amd64

Docker info

mero@dibbler ~$ podman info   
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.32-1.fc34.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.32, commit: '
  cpus: 12
  distribution:
    distribution: fedora
    variant: workstation
    version: "34"
  eventLogger: journald
  hostname: dibbler
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.17.12-100.fc34.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1298100224
  memTotal: 16543584256
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.fc34.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc34.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 16906444800
  swapTotal: 16936591360
  uptime: 221h 45m 47.69s (Approximately 9.21 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/mero/.config/containers/storage.conf
  containerStore:
    number: 42
    paused: 0
    running: 0
    stopped: 42
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/mero/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 4
  runRoot: /run/user/1000/containers
  volumePath: /home/mero/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.7
  Built: 1650562689
  BuiltTime: Thu Apr 21 19:38:09 2022
  GitCommit: ""
  GoVersion: go1.16.15
  OsArch: linux/amd64
  Version: 3.4.7

What happened?

We are seeing test flakes with the error message mysql.Endpoint() = port not found. The code we use to create the container is attached below. In particular, we use both wait.ForListeningPort("3306/tcp") and wait.ForLog("port: 3306") to make sure we wait until the container is running and available. However, it seems even with that, GenericContainer will occasionally return without being ready, as evidenced by the fact that Endpoint returns an error about the port not being found.

Relevant log output

No response

Additional Information

	mysql, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "mysql:8",
			ExposedPorts: []string{"3306/tcp"},
			Env: map[string]string{
				"MYSQL_ROOT_PASSWORD": "test",
			},
			WaitingFor: wait.ForAll(wait.ForListeningPort("3306/tcp"), wait.ForLog("port: 3306")),
			SkipReaper: true, // https://github.com/testcontainers/testcontainers-go/issues/604
		},
		Started: true,
		Logger:  testcontainers.TestLogger(t),
	})
	if err != nil {
		t.Fatalf("testcontainers.GenericContainer() = %v", err)
	}
	t.Cleanup(func() {
		t.Logf("Stopping container")
		d := new(time.Duration)
                // TODO: What happens when the context expires? Can we pass a different
                // context here, or would that prevent Stop from working?
		mysql.Stop(context.Background(), d)
	})

	ep, err := mysql.Endpoint(ctx, "")
	if err != nil {
		t.Fatalf("mysql.Endpoint() = %v", err)
	}
@anuraaga
Copy link
Contributor

anuraaga commented Feb 8, 2023

I am seeing this as well on 0.17.0 with mysql image, test sometimes fails, more often on CI. My configuration is the same as presented except I am using normal Docker and don't skip reaper. time.Sleep(5 * time.Second) didn't add stability but time.Sleep(30 * time.Second) does seem to have helped - presumably this is the time of the relatively slow MySQL container itself and the wait condition is getting missed for some reason.

For context, the build starts and terminates a database container about three times, perhaps having multiple start/stop in the same binary run (in this case go test ./...) causes a race condition?

@mdelapenya
Copy link
Member

@anuraaga @awagner-iq thanks for open this issue, and sorry for the radio silence, I went on paternity leave at the beginning of December and probably skipped this ticket.

I'm investigating why it fails for your use case although, with the above snippet, I'm not able to reproduce it yet. Debugging...

@mdelapenya mdelapenya added the podman Issues regarding podman. label Mar 23, 2023
@michielboekhoff
Copy link
Contributor

I'm noticing this as well while running on Docker 20.10 on Linux - it's resulting in a fair bit of flakiness, especially in our CI pipeline. It doesn't look like this is strictly related to Podman.

@priegger
Copy link

It seems to be a bug in the Endpoint function. I'm using docker on linux and I can reproduce this easily on my machine and in ci. Now I create the URL myself for a specific port (not the first one, as Endpoint does it) and I did not see this issue yet.

@mdelapenya
Copy link
Member

It seems to be a bug in the Endpoint function. I'm using docker on linux and I can reproduce this easily on my machine and in ci. Now I create the URL myself for a specific port (not the first one, as Endpoint does it) and I did not see this issue yet.

Could you please add a repro snippet, including which version of the library are you using? 🙏 If there are so many users seeing it, we could be in front of a real bug that we should fix 🐞

@bokimilinkovic
Copy link

bokimilinkovic commented May 23, 2023

Yes would be great if this one can go up for priority because it is still occurring , and it is breaking a lot of CI pipelines.
This is code snippet:

github.com/testcontainers/testcontainers-go v0.19.0

req := testcontainers.ContainerRequest{
		Image:        "nats:latest",
		Name:         natsContainerName,
		ExposedPorts: []string{"4222/tcp"},
		Networks:     networks,
		WaitingFor:   wait.ForAll(wait.ForLog("Server is ready")),
	}

	container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	if err != nil {
		return nil, err
	}
        natsMappedPort, err := container.MappedPort(ctx, "4222/tcp")
	if err != nil {
		log.Errorf("Failed to get nats port: %v", err)
		return initFailure
	}
	natsHost, err := container.Host(ctx)
	if err != nil {
		log.Errorf("Failed to get nats host: %v", err)

		return initFailure
	}

	natsEndpoint, err := container.Endpoint(ctx, "") 
	if err != nil {
		log.Errorf("Failed to get nats endpoint: %v", err) --- error occurs in here
		return initFailure
	}

@priegger
Copy link

Could you please add a repro snippet, including which version of the library are you using? pray If there are so many users seeing it, we could be in front of a real bug that we should fix lady_beetle

Hi, I did not have the time yet to create a repro snippet. The failing code I have is something I cannot share.

I switched all my Endpoint(...) calls to PortEndpoint(...) and explicitly specified the port. I did not have any issues since then. So I think the reason is that the first port that gets detected somehow does not work.

What I'd do given some time:

  • Improve the "port not found" error to contain the port (first port that is calculated) and the list of available ports.
  • Create a test that starts a lot of containers in go funcs (e.g. 10 mongo containers in my case) and then try to get the Endpoint(...). In my case I get the endpoint immediately after creating the container.

I used a GenericContainer with mongo:4.0, and testcontainers 0.19.0, but also switched to the latest version for a quick test and that did not fix the issue.

Maybe this already helps and gives someone the opportunity to reproduce and analyze this bug.

@mdelapenya
Copy link
Member

I'm able to reproduce this exact error in a VM using podman. Will take a look after my parental leave

@mdelapenya
Copy link
Member

In the mean time, checking the state of the Docker types, I saw this difference when inspecting a container with podman Vs docker:

Steps to reproduce

  1. Edit docker.go, Ports method, to print out the docker types for the inspected container:
func (c *DockerContainer) Ports(ctx context.Context) (nat.PortMap, error) {
	inspect, err := c.inspectContainer(ctx)
	if err != nil {
		return nil, err
	}
+	fmt.Printf(">>> network.settings.ports: %+v\n", inspect.NetworkSettings.Ports)
+	fmt.Printf(">>> config.exposed.ports: %+v\n", inspect.Config.ExposedPorts)
	return inspect.NetworkSettings.Ports, nil
}
  1. Run the following test:
go run gotest.tools/gotestsum --format short-verbose --packages="./..." -- -run "^TestContainerWithHostNetworkOptions_UseExposePortsFromImageConfigs" -timeout 600s  -count=1 -v

With Podman (Ubuntu VM)

>>> network.settings.ports: map[80/tcp:[]]
>>> config.exposed.ports: map[80/tcp:{}]

With Docker (on Mac)

>>> network.settings.ports: map[80/tcp:[{HostIP:0.0.0.0 HostPort:57087}]]
>>> config.exposed.ports: map[80/tcp:{}]

So for some reason, the docker types in Podman are inconsistent according Docker 🤷 . Pinging @kiview @cristianrgreco @eddumelendez @HofmeisterAn for awareness while I'm out

@mdelapenya
Copy link
Member

Could be related to containers/podman#17780 🤔

@tylfin
Copy link

tylfin commented Apr 10, 2024

I'm seeing an intermittent port not found error using Docker for Mac, I tried upgrading and am currently on Docker version 26.0.0, build 2ae903e. I'm exposing it as a CustomizeRequest, e.g.:

func WithPort(port int) testcontainers.CustomizeRequestOption {
	return func(req *testcontainers.GenericContainerRequest) {
		...
		req.ExposedPorts = append(req.ExposedPorts, fmt.Sprintf("%d/tcp", port))
        }
}

and a wait strategy that passes:

wait.ForHTTP(endpoint).WithPort(containerPortWithProtocol).WithStartupTimeout(timeout),

If folks have any workarounds or tips that would be helpful.

EDIT: After investigating this issue further, it looks like docker inspect returns results excluding the exposed port from the network settings, it might be related to moby/moby#42860. A possible workaround would be to retry MappedPort in Docker.go a few times internally 🤷‍♂️

@vladcar
Copy link

vladcar commented May 15, 2024

Retrying MappedPort didn't work for me so I ended up calling docker inspect from the code. Ugly workaround but it works:

func getMappedPorts(containerID string) (nat.PortMap, error) {
	cmd := exec.Command("docker", "inspect", "--format", "{{json .NetworkSettings.Ports}}", containerID)

	output, err := cmd.CombinedOutput()
	if err != nil {
		return nil, fmt.Errorf("failed to execute docker inspect: %s, %v", output, err)
	}

	var ports nat.PortMap
	if err = json.Unmarshal(output, &ports); err != nil {
		return nil, fmt.Errorf("failed to parse output: %v", err)
	}

	return ports, nil
}

@mhornbacher
Copy link

mhornbacher commented May 23, 2024

This issue started occurring for me only after installing minikube on the local docker installation. However since then it is consistently failing.

However adding a manual delay and watching that the port is open seems to resolve this. I assume there is a timing issue here...

This occurs on Docker as well as podman

@rajaie7
Copy link

rajaie7 commented Jun 27, 2024

Also seeing this error with the postgres container

		postgresContainer, err := postgres.RunContainer(ctx, testcontainers.WithImage("postgres:14-alpine"),
			postgres.WithUsername("username"), postgres.WithPassword("password"),
			testcontainers.WithLogConsumers(logConsumer))

		mappedPort, err := postgresContainer.MappedPort(ctx, nat.Port("5432/tcp"))

adding

		time.Sleep(10 * time.Second)

helps resolve the issue.

This also seems to have helped

testcontainers.WithWaitStrategy(
				wait.ForLog("database system is ready to accept connections").
					WithOccurrence(2).
					WithStartupTimeout(5*time.Second))

@mdelapenya
Copy link
Member

Some fixes around container inspect and available ports were added in v0.32.0 and v0.33.0.

Could you please check if the issue described here is gone? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
podman Issues regarding podman.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants