Node downed through repeated promote/demotes #33580
Comments
I kicked over the docker daemon, and this has happened:
And on the "primary" manager
The steps say to repeat the promotion/demotion cycle, but the node ls output shows 3 managers in the broken state. Am I correct that it's actually promotion failing, not demotion?
Yes, it's a promotion that's failing. Sorry that I was unclear.
I think the issue here is that we made demote async recently (it sounds weird, but it actually makes things way more solid). However, if you demote/promote rapidly, it's possible to promote a node before the demotion has actually gone through, and you get the "a raft member with this node ID already exists" error that you're seeing. Apparently we aren't handling this situation gracefully.

BTW, I believe this would no longer be an issue with moby/swarmkit#2198. All the more reason to get that merged :)
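To illustrate the race described above, here is a hedged sketch (not an official workaround) of how one might serialize the cycle by hand: run from a healthy manager, it waits for the manager count reported by `docker info` to drop before promoting again. `node-3` is a hypothetical node name, and it assumes the manager count only decreases once the demotion has actually gone through.

```sh
# Sketch: avoid promoting before the previous demotion has settled.
# Record the current manager count, then demote the node.
before=$(docker info --format '{{.Swarm.Managers}}')
docker node demote node-3

# The demotion is asynchronous; poll until the manager count drops.
while [ "$(docker info --format '{{.Swarm.Managers}}')" -ge "$before" ]; do
  sleep 1
done

# Only now flip the node back to a manager.
docker node promote node-3
```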
Did some debugging on this, and it looks like the issue is the one fixed by moby/swarmkit#2203. However, moby/swarmkit#2198 will probably be necessary to completely resolve the issue.
The mentioned fixes were merged. I will mark this as complete.
Description
Repeatedly promoting and demoting a node has put it in a Down and Unreachable state. Doing `docker info` hangs, but doing `docker ps` works. This likely indicates some sort of deadlock in the Cluster subcomponent.

Steps to reproduce the issue:
1. Repeatedly promote and demote a manager node without waiting for each change to settle (see the sketch below).
2. Check `docker node ls`: the node eventually shows as Down/Unreachable.
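For reference, a minimal shell sketch of the reproduction loop under the assumptions above; `node-3` is a hypothetical node name (substitute an ID from `docker node ls`), and `timeout` is only used to make the hanging `docker info` visible:

```sh
# Rapidly flip the role of one node without waiting for each change to settle.
for i in $(seq 1 10); do
  docker node demote node-3
  docker node promote node-3
done

# Check cluster state: the affected node eventually shows Down / Unreachable.
docker node ls

# On the broken node, `docker ps` still works but `docker info` hangs.
docker ps
timeout 10 docker info || echo "docker info did not return within 10s"
```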
Output of `docker version`:

Output of `docker info`:
N/A, `docker info` hangs.

Additional environment details (AWS, VirtualBox, physical, etc.):
3-node cluster on AWS, t2.micro instances.