Hybrid swarm routing mesh not working #3052

Closed
stevenmiller opened this issue Dec 7, 2018 · 8 comments

Comments

@stevenmiller

Howdy!

I have a two-node hybrid swarm, with one Ubuntu Linux 18.04 node acting as manager and one Windows Server 2019 node participating as a worker. We are testing a sample environment that consists entirely of Windows containers. We are able to reach the published port for these services through the Windows node (e.g. 10.20.1.121:8888), but not through the Linux node (e.g. 10.20.1.122:8888). Our understanding of the ingress routing mesh is that this should work seamlessly, since it routes traffic within the swarm to a node that is running the service.

This issue also occurs with the Swarmpit UI stack we have deployed to the swarm: the UI at 10.20.1.122:888 cannot be reached through the Windows node IP (10.20.1.121:888).

  • [x] I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID:

Expected behavior

Services running on one node should be accessible through the published port on any other node in the swarm.

Actual behavior

Published Windows container services are not accessible through the Linux node, and vice versa.

Information

  • Windows Version: Server 2019
  • Docker for Windows Version: 18.09.0
  • Linux Version: Ubuntu 18.04.1 LTS
  • Linux Node Docker Version: 18.09.0

Published services and ports:

[screenshot not preserved]

Both the environment we pulled from a private repo and the Swarmpit stack exhibit this behavior. I am unable to reach the Swarmpit UI on port 888 through the Windows node IP.
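To narrow down which node/port combinations fail, it can help to probe every published port on every node from a client outside the swarm. The following is a minimal Python sketch, assuming the node IPs and published ports quoted above; `port_reachable`, `reachability_matrix`, `NODES` and `PUBLISHED_PORTS` are hypothetical names. It only attempts a plain TCP connection, so a success shows that something completed the handshake on that node, not that the application answered correctly.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


# Assumed values, taken from the report above; adjust for your own swarm.
NODES = {"windows": "10.20.1.121", "linux": "10.20.1.122"}
PUBLISHED_PORTS = [8888, 888]


def reachability_matrix(nodes, ports, timeout=3.0):
    """Map (node name, port) -> True/False for every node/port pair."""
    return {
        (name, port): port_reachable(ip, port, timeout)
        for name, ip in nodes.items()
        for port in ports
    }
```

Calling `reachability_matrix(NODES, PUBLISHED_PORTS)` should, if the routing mesh works, return `True` for every pair; a pattern where each port only answers on one node would match the behavior described in this issue.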

Steps to reproduce the behavior

Here is the stack with our private repo removed:

teststack.zip

Let me know if I can provide any other info.

@olljanat

olljanat commented Dec 7, 2018

@stevenmiller first of all, I can see that this is your first issue on GitHub, so let me say welcome :)

First, some side comments (you don't need to reply to them, but they may be worth checking out):

  • Some best practices with Windows containers: https://github.com/MicrosoftDocs/Virtualization-Documentation-Private/issues/1531
  • You can set a default logging driver on all Docker hosts, so you don't need to specify it separately for every container.
  • The environment variable ASPNETCORE_ENVIRONMENT hints that your applications are ASP.NET Core based, so unless your applications really need Windows-specific features, I highly recommend using the Linux versions of these images. It may sound like a strange idea, but based on my two years of experience with Linux and Windows containers I still highly recommend it.
  • Kestrel, the web server used by .NET Core, has been supported for direct exposure to the internet since version 2.1, but I would still recommend a configuration where you run, for example, NGINX as a reverse proxy in front of it on Linux nodes, even if your applications run on Windows nodes.
  • Consider using Portainer instead of Swarmpit. For example, it supports native Windows containers, which Swarmpit lacks, and its agent deployment is especially interesting because you can manage Linux and Windows containers from one UI and even connect multiple swarms to it if needed.
  • If you need Windows containers, consider getting a Microsoft Premier Support contract (if you don't have one yet).
  • Microsoft recommends using host mode, rather than the ingress routing mesh, for published ports in production.
  • You can find me under the same nick on the Docker and Portainer Slack workspaces if you want to hear some real-world examples of how this works in production.
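Host-mode publishing binds the port directly on each node that runs a task, bypassing the ingress routing mesh, so a request only succeeds on nodes where a task is actually running. The docker CLI accepts this through the long `--publish` syntax (`published=...,target=...,mode=host`). The sketch below is a hypothetical Python helper that only assembles such a `docker service create` command line; the service name `iis-host` and the helper names are illustrative assumptions. In a mixed swarm, running a Windows image in global mode would also need a placement constraint such as `node.platform.os==windows` so that tasks are not scheduled onto the Linux node.

```python
import shlex
from typing import List, Optional


def host_mode_service_argv(name: str, image: str, published: int, target: int,
                           network: Optional[str] = None,
                           global_mode: bool = False,
                           constraint: Optional[str] = None) -> List[str]:
    """Build argv for `docker service create` with a host-mode published port."""
    argv = ["docker", "service", "create", "--name", name,
            # Long --publish syntax: bind `published` on the node itself,
            # instead of routing through the swarm ingress mesh.
            "--publish", f"published={published},target={target},mode=host"]
    if global_mode:
        # One task per eligible node, so the port is bound on each of them.
        argv += ["--mode", "global"]
    if constraint:
        argv += ["--constraint", constraint]
    if network:
        argv += ["--network", network]
    argv.append(image)
    return argv


def as_command_line(argv: List[str]) -> str:
    """Render argv as a single shell-quoted command line for display."""
    return " ".join(shlex.quote(a) for a in argv)


argv = host_mode_service_argv("iis-host", "microsoft/iis", 8080, 80,
                              global_mode=True,
                              constraint="node.platform.os==windows")
```

On a manager node, the resulting list could be passed to `subprocess.run(argv, check=True)`; printing `as_command_line(argv)` shows the equivalent shell command.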

As for this issue itself, IMO your stack file looks exactly like the documentation says it should, but in real life it looks like the Windows implementation of overlay networking, or the components behind it (for example hcsshim), has some undocumented weaknesses.

@stevenmiller
Author

stevenmiller commented Dec 10, 2018

@olljanat Thanks for the tips! We are revising the apps to run what we can on Linux containers. At least one service requires Windows, so we are stuck with it for the time being. Is there any further troubleshooting you can advise with this current setup, or is it likely just an issue we will have to wait out?

@stevenmiller
Author

Just as a note: my org's Microsoft SA agreement includes support, so I have opened a technical support case regarding this issue. I will post an update with the resolution (if one is found).

@olljanat

> At least one service requires Windows so we are stuck with it for the time being. Is there any further troubleshooting you can advise with this current setup or is likely just an issue we will have to wait out?

Depending on your application architecture (which apps need to talk to which), you can, for example, attach Windows and Linux containers to the same overlay network so they can talk to each other inside it.

> Just as a note: with my org's Microsoft SA agreement we get support, so I have a technical support case open regarding this issue. I will update with the resolution (if one is found).

OK. It will be really interesting to hear whether they solve this issue, as my earlier experiences with the level of Microsoft support have not been too positive.

@stevenmiller
Author

Here's the final email from MS support. Unfortunately there is no resolution at this time; however, we did find that if you deploy a container and publish its default port unchanged (say 80 for IIS, so the published and target ports are the same), it is reachable from both nodes without a problem. It seems to be an issue with port translation between hosts:

lab setup

PS C:\Users\Administrator> docker version
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.3
Git commit: 33a45cd0a2
Built: unknown-buildtime
OS/Arch: windows/amd64
Experimental: false

// create overlay network
PS C:\Users\TEMP> docker network create --driver=overlay overlay-nw
vqixaimowmg9ch6byz089jmph

PS C:\Users\TEMP> docker network ls
NETWORK ID     NAME         DRIVER    SCOPE
08ick3kig81k   ingress      overlay   swarm
4e5485733a1d   nat          nat       local
8ab446910e24   none         null      local
vqixaimowmg9   overlay-nw   overlay   swarm

// Init Swarm on manager node
PS C:\Users\TEMP> docker swarm init --advertise-addr=10.168.179.56 --listen-addr 10.168.179.56:2377

// Join the swarm as a worker on the Windows worker node
PS C:\Users\Administrator.WUXI> docker swarm join --token SWMTKN-1-5lazzfmd1wyft4egtnzope30i1n744qlw99bj79qtd4ccdv2hg-b4j4fkki75ud34c9n5fvmnffw 10.168.179.56:2377

// Join the swarm as a worker on the Linux worker node
admin@ubuntu-6011:~$ docker swarm join --token SWMTKN-1-5lazzfmd1wyft4egtnzope30i1n744qlw99bj79qtd4ccdv2hg-b4j4fkki75ud34c9n5fvmnffw 10.168.179.56:2377

PS C:\Users\TEMP> docker node ls
ID                            HOSTNAME      STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
e99kxpcgwhkgtte7dwwko5pl0 *   WS2019        Ready     Active         Leader           18.09.0
whjthcivqkvpj5oz45w6xg3de     WS2019-B      Ready     Active                          18.09.0
7mpt4wn2ioi4atmtmkxdu7q6u     ubuntu-6011   Ready     Active                          18.09.0

// create service on manager node
docker service create --name=test --endpoint-mode vip --network=overlay-nw microsoft/iis cmd
docker service update --publish-add published=8080,target=80 test

Problem repro and analysis

The test result shows a packet flow similar to the customer's. The problem is that the Linux node did not translate the published port 8080 to the target port 80.

Failed packet flow: Client -> Linux 10.168.179.42:8080 [10.255.0.4] -> Windows 10.168.179.56:8080 [10.255.0.3:8080]

Successful packet flow: Client -> Windows 10.168.179.55:8080 [10.255.0.3] -> Windows 10.168.179.56:80 [10.255.0.3:80]
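The two flows differ only in whether the entry node rewrote the published port to the target port before the packet reached the container. The following Python sketch is purely illustrative and not part of the support analysis: it encodes the expected ingress behavior as a published-to-target lookup and checks the two captured flows against it. The `Hop` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Hop:
    """One captured flow: the node the client hit, the port the client used,
    and the port the packet carried when it reached the container's task IP."""
    entry_node: str
    client_port: int
    delivered_port: int


def translated_correctly(hop: Hop, port_map: Mapping[int, int]) -> bool:
    """True if the entry node rewrote the published port to the target port."""
    expected = port_map.get(hop.client_port)
    return expected is not None and hop.delivered_port == expected


# Published -> target mapping from the repro (published=8080, target=80).
PORT_MAP = {8080: 80}

# The two flows from the capture above.
OBSERVED = [
    Hop("linux 10.168.179.42", 8080, 8080),   # failed flow
    Hop("windows 10.168.179.55", 8080, 80),   # successful flow
]
```

Evaluating `[translated_correctly(h, PORT_MAP) for h in OBSERVED]` gives `[False, True]`, which restates the support engineer's finding: only the Linux entry node skipped the 8080 -> 80 translation.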

Summary

Based on our test results and network traffic analysis: after deploying a Docker swarm with mixed Windows and Linux nodes and publishing the IIS service as 8080:80, requests to the site through a Windows node on port 8080 are forwarded to IIS container port 80; however, requests through the Linux node on port 8080 are not translated to the real IIS container port 80.

Suggestion

After discussion, and according to the log analysis results, there appear to be compatibility issues on the Linux node when translating the published port to the real container port. Since this may be related to Linux compatibility with the Docker swarm ingress network, we recommend contacting the Linux vendor to look into the issue in depth.

@olljanat

olljanat commented Jan 15, 2019

Btw, is this related to moby/moby#38484?

@docker-robott
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with a /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

docker locked and limited conversation to collaborators Jul 3, 2020

3 participants