-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducible: container network behavior breaks all host-to-container networking #3487
Comments
Similar to #3448 but reproducible. |
I've created a repository with code to reproducibly break Docker engine on Docker for Mac. |
I have determined that it is not the amount of data transferred that triggers the failure. It appears that the failure is triggered by whatever it is that the DockerHub image for This parameter defaults to 16777216 (16 MiB), and we first observed the Docker engine failure when we tried to insert BLOBs larger than this amount. But, by setting a smaller limit, e.g. by passing this to the image, That is, it appears that whatever network activity happens when MariaDB is generating its exception about max-allowed-packet being exceeded is what breaks the Docker engine host-to-container networking, not the amount of data itself. |
Also, we have observed that both the |
Hi @hughsw, I ran into something similar to what you're describing. Figured I'd share my experiences. For what it's worth I found that upgrading to anything later than the August 2018 version breaks so I've personally rolled back to that after downloading it here: https://docs.docker.com/docker-for-mac/release-notes/#docker-community-edition-18061-ce-mac73-2018-08-29. I also found this issue to also be reproducible but interestingly it seems our scale of "size" seems to be different. I created a demo app here: https://github.com/Allan-Clements/docker-demo It spins up a localstack container hosting a localhost version of S3 and then uploads files of increasing sizes to it with random bytes in each file to increase the size each time. I encountered this bug due to a test for a real application uploading files to a localstack container kept getting stuck at the 2nd test case. The first case involved uploading a 7 KB file. Whereas the 2nd case was 168 KB. Running this demo app I have observed it failing to upload a file as small as 15 KB up to 31 KB depending on whichever post August 2018 release of Docker For Mac I was trying out. And like in your case it seems like my docker environment is stuck until I reset the docker daemon. I would confirm the networking layer was broken by just trying to run the nginx example container and curl against it. It'd hang like this after my demo app would hit whatever file size would break it for that version:
|
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an If this issue is safe to close now please do so. Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. |
/remove-lifecycle stale |
I've just confirmed that the bug is in Docker for Mac 2.0.0.3/18.09.2 I've updated the documentation etc for the reproducible code: https://github.com/hughsw/dockerbug |
Thanks for the report (and especially for the repro example!) I can confirm the current stable is still broken (Version 2.0.0.4 (31365)). Inside the VM (using a command like
I believe the bug is fixed in current edge (Version 2.0.5.0 (35318)). Inside the VM the log file has a different message:
This log comes from here, added by this commit in moby/vpnkit#453 . I believe what's happening is the connection is being closed while data is in-flight: previously this would trigger a fatal error breaking all future port forwarding. It now generates a log message and safely continues. So I think this will be fixed in stable in the next stable release. Thanks again for the repro example! |
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an If this issue is safe to close now please do so. Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. |
Closed issues are locked after 30 days of inactivity. If you have found a problem that seems similar to this, please open a new issue. Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. |
[x ] I have tried with the latest version of my channel (Stable or Edge)
[x ] I have uploaded Diagnostics
Diagnostics ID: CDF6C54F-DD10-459F-9401-2F769C811A8F/20190509120230
Version 2.0.0.2 ID: CDF6C54F-DD10-459F-9401-2F769C811A8F/20190125144328
Expected behavior
Host to Docker engine networking should just work.
Actual behavior
Express/Sequelize process on host attempts to send a BLOB larger than the configured
max_allowed_packet
to a MariaDB Sql server running in a container. MariaDB server Aborts the connection and complains(Got a packet bigger than 'max_allowed_packet' bytes)
. Thereafter, all networking from the host to any existing or newly running containers fail with connection timeouts.Information
The problem is reproducible. It seems to be triggered by transfer of large BLOB data (15MB) from NodeJS/Sequelize process running on the Mac into a MariaDB SQL server running in a container. The MariaDB server is exposing its port, 3306, to the host, and the NodeJS/Sequelize process is contacting it as localhost:3306. The MariaDB SQL container mounts a host directory for the container to use for the SQL database.
After the failure, no containers can be accessed via networking from the host. That is, existing running containers can no longer be reached (connections time out). Also, newly
docker run
containers cannot be reached either.Note: The connections to the containers time out, they are not refused.
If I
docker exec
into running containers, I can access their networked ports via the container's localhost. So, the issue seems to be more at the level of host-network-to-container-network rather than the internal container's networking....I have looked at the streaming logs while the failure occurs (
/usr/bin/log stream ...
). There is no docker message in these logs around the time of the failure event.The only fix I've found so far is to Restart the Docker app/engine.
I do not know how long the problem has been around because I have only just started the work to upload large BLOBs to the MariaDB server. Smaller BLOBs (1MB) do not cause the problem.
Diagnostic logs
Diagnose succeded
Steps to reproduce the behavior
This problem was found during dev work for a complex setup with 8 containers running in a Stack.
There is now a repo with minimal code to reproducibly demonstrate this problem: https://github.com/hughsw/dockerbug
The text was updated successfully, but these errors were encountered: