[BUG] Intermittent "container <id> is not connected to the network <network>" errors on up #10668

Closed
milas opened this issue Jun 7, 2023 · 14 comments

@milas
Contributor

milas commented Jun 7, 2023

Description

Since I've run the E2E suite a bunch today, I've noticed the ddev test fails intermittently when bringing up the project:

Failed to start 004: ComposeCmd failed to run 'COMPOSE_PROJECT_NAME=ddev-004 docker-compose -f /tmp/TestComposeRunDdev2386477416/004/.ddev/.ddev-docker-compose-full.yaml up --build -d', action='[up --build -d]', err='exit status 1', stdout='#1 [web internal] load build definition from Dockerfile
Network ddev-004_default  Creating
Network ddev-004_default  Created
Container ddev-004-web  Creating
Container ddev-004-dba  Creating
Container ddev-004-db  Creating
Error response from daemon: container 434c66ed5ce1c8b0a230b1ba25ac0b5cbedfc62710f3fbe01c0000d8fe2e2f80 is not connected to the network ddev_default'

That's coming from here:

err = s.apiClient().NetworkDisconnect(ctx, netwrk.Name, created.ID, false)

When I modified the test locally to leave things running, I could see that the container was in fact connected to the network. I think this is another Engine race condition with networks, given that we create the container just before this call.

Steps To Reproduce

Run the TestComposeRunDdev E2E test repeatedly

Compose Version

HEAD

Docker Environment

Occurs on both Engine v20.10.x in CI and v24.x locally on my macOS Docker Desktop install

Anything else?

No response

@milas
Contributor Author

milas commented Jun 7, 2023

Given that we're about to reconnect the network anyway (presumably with the full set of options, which I guess we can't pass on create), would it make more sense to do the initial container create with network: none and then connect, so we don't have to do the pre-emptive disconnect?

@ndeloof
Contributor

ndeloof commented Jun 8, 2023

This was implemented to replicate the docker run API call payload, but maybe we indeed don't need to.

@ToshY

ToshY commented Jun 23, 2023

I got the same error when I upgraded compose from 2.18.1 to 2.19.0. Reverted back in the meantime.

@dud1337

dud1337 commented Jun 28, 2023

If you deploy with the "docker in docker" image, switching from docker:latest to docker:24.0.1 gives you a Compose version older than 2.19.0, as suggested above.

@wintheriscomming

We got the same error after upgrading to 2.19.0. After reverting to the previous version, things started working again.

@rfay
Contributor

rfay commented Jun 29, 2023

This is preventing DDEV from using v2.19.0. I'm not sure what the fix is; it fails on every platform.

@rfay
Contributor

rfay commented Jun 29, 2023

@ndeloof is there some kind of workaround for this behavior on our end, or does it require a fix to compose?

@milas
Contributor Author

milas commented Jun 29, 2023

Compose v2.19.1 was just released with a fix for this: https://github.com/docker/compose/releases/tag/v2.19.1

@ToshY

ToshY commented Jun 30, 2023

> Compose v2.19.1 was just released with a fix for this: https://github.com/docker/compose/releases/tag/v2.19.1

@milas While this seems to fix the "not connected to the network" issue, I now get "forward host lookup failed: Host name lookup failure" when running waisbrot/wait with docker compose run.

Example

version: '3.9'
  
services:
  mysql:
    image: ${MYSQL_IMAGE_VERSION}
    ports:
      - "3308:3306"
    environment:
      MYSQL_USER: ${DATABASE_USER}
      MYSQL_DATABASE: ${DATABASE_NAME}
      MYSQL_ROOT_PASSWORD: ${DATABASE_ROOT_PASSWORD}
      MYSQL_PASSWORD: ${DATABASE_PASSWORD}
    volumes:
      - mysql_data:/var/lib/mysql
    networks:
      - webapp

  wait:
    image: waisbrot/wait
    environment:
      TARGETS: mysql:3306
      TIMEOUT: 300
    networks:
      - webapp

volumes:
  mysql_data:
    driver: local

networks:
  webapp:
    driver: bridge

$ docker compose up -d
[+] Running 4/4
 ✔ Container webapp-wait-1    Started    3.0s
 ✔ Container webapp-mysql-1   Started    2.9s

$ docker compose run --rm wait
Waiting for mysql:3306  .mysql: forward host lookup failed: Host name lookup failure
.mysql: forward host lookup failed: Host name lookup failure
.mysql: forward host lookup failed: Host name lookup failure

$ docker compose ps
NAME                            IMAGE                         COMMAND                  SERVICE             CREATED             STATUS              PORTS
webapp-mysql-1                  mysql:8.0                     "docker-entrypoint.s…"   mysql               8 minutes ago       Up 5 minutes        33060/tcp, 0.0.0.0:3308->3306/tcp, :::3308->3306/tcp

I'm reverting back to 2.18.1 yet again 🙁

@rfay
Contributor

rfay commented Jun 30, 2023

2.19.1 is not working for DDEV in many different ways, although the "not connected to network" error seems to have gone away. It's hard to sort out all the test failures, but every test type has some kind of new failure.

@akaspin

akaspin commented Jul 6, 2023

Same. 2.19.1 not working. Reverted to 2.18.1.

@milas
Contributor Author

milas commented Jul 6, 2023

@ToshY I'm having trouble reproducing what you're seeing:

compose.yaml

services:
  mysql:
    image: mysql
    ports:
      - "3308:3306"
    volumes:
      - mysql_data:/var/lib/mysql
    environment:
      - MYSQL_RANDOM_ROOT_PASSWORD=1
    networks:
      - webapp

  wait:
    image: waisbrot/wait
    environment:
      TARGETS: mysql:3306
      TIMEOUT: 300
    networks:
      - webapp

volumes:
  mysql_data:
    driver: local

networks:
  webapp:
    driver: bridge

❯ docker compose up -d
[+] Running 2/2
 ✔ Container network-mysql-1  Started    0.2s
 ✔ Container network-wait-1   Started    0.1s

❯ docker compose run --rm wait
Waiting for mysql:3306  .  up!
Everything is up

If I start wait first, it will write out mysql: forward host lookup failed: Unknown host, which is expected since that host does not yet exist. The error goes away and wait succeeds once the mysql service is started.

@milas
Contributor Author

milas commented Jul 6, 2023

Spoke too soon - I got a reliable repro and have a fix. I'm closing this issue to avoid confusion; follow #10777 for the Host name lookup failure fix. Thanks for the reports!

milas closed this as completed Jul 6, 2023
milas added a commit to milas/compose that referenced this issue Jul 6, 2023
As part of the fix for docker#10668, the logic was adjusted so that the
default (highest-priority) network is used in the `ContainerCreate`,
and then the remaining networks are connected via calls to
`NetworkConnect` before starting the container.

Unfortunately, `ServiceConfig::NetworksByPriority` is neither
deterministic nor stable when networks have the same priority.

It's non-deterministic because the order of networks from parsing
YAML is random, since they are loaded into a Go map (which has
random iteration order). Additionally, it's not using a `SortStable`
in `compose-go`, so even if the load order were predictable, it
still might produce different results.

While I look at improving `compose-go` here to prevent this from
tripping us up in the future, this fix looks at _all_ networks for
a service and ignores the "default" one now. Before, it would
always skip the first one in the slice since that _should_ have
been the "default".

Signed-off-by: Milas Bowman <[email protected]>
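
The ordering problem described in this commit message can be illustrated with a small, self-contained Go sketch. It is not the compose or compose-go code: `orderedNetworks` and `extraNetworks` are hypothetical helper names, and the plain `map[string]int` of priorities is an assumed stand-in for the real service configuration. The sketch makes the order deterministic by breaking priority ties on the network name, and it selects the networks to connect after creation by excluding the primary one by name instead of dropping index 0.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedNetworks returns network names sorted by descending priority.
// Ties are broken by name, so the result does not depend on Go's random
// map iteration order or on the stability of the sort.
func orderedNetworks(priorities map[string]int) []string {
	names := make([]string, 0, len(priorities))
	for name := range priorities {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		pi, pj := priorities[names[i]], priorities[names[j]]
		if pi != pj {
			return pi > pj
		}
		return names[i] < names[j]
	})
	return names
}

// extraNetworks returns every network except the one already used when the
// container was created. It matches by name and makes no assumption about
// where the primary network sits in the slice.
func extraNetworks(all []string, primary string) []string {
	var extra []string
	for _, name := range all {
		if name == primary {
			continue
		}
		extra = append(extra, name)
	}
	return extra
}

func main() {
	// All three networks share the same priority, as in the failing case.
	priorities := map[string]int{"webapp": 0, "frontend": 0, "backend": 0}

	ordered := orderedNetworks(priorities)
	primary := ordered[0]

	fmt.Println("create with:", primary)                                  // create with: backend
	fmt.Println("connect afterwards:", extraNetworks(ordered, primary)) // connect afterwards: [frontend webapp]
}
```

With equal priorities the sketch always picks `backend` for creation and connects `frontend` and `webapp` afterwards, whatever order the map yields its keys in.
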
@rfay
Contributor

rfay commented Jul 6, 2023

That said, although the "connected to the network" issues seem to be gone, there seems to be an interesting array of new problems in 2.19. Following #10777.

ndeloof pushed a commit that referenced this issue Jul 7, 2023
milas unpinned this issue Jul 7, 2023