Headscale removes node after it was re-added #1798
Comments
Okay, I've tested this "workaround" and it seems to work. No unwanted offline events so far.
But it's kind of difficult for me to propose something better than that.
Could you please test the latest alpha?
I believe the fixes in https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha12 should resolve this issue; let me know if not and we will reopen it.
Bug description
It seems there is an issue with nodes that hop between different networks. The most common scenario is a mobile device that switches between a mobile network and WiFi. When the node switches networks, it sends an update and Headscale adds a new update channel for it. The problem is that the new update channel overwrites the old one in the notifier instance before the old one has been cleaned up. So when it is time to close and delete the old channel, Headscale actually removes the new one. I believe this happens because the notifier has room for only one update channel per node. If the node sends its new update before the cleanup runs, the old channel is overwritten, and the deferred remove procedure takes down the new one instead. As a result, the node ends up offline and receives no further updates.
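A minimal Go sketch of the ordering described above, assuming a notifier that maps each node ID to exactly one channel. `NaiveNotifier`, `StateUpdate`, `simulateNetworkSwitch` and the `uint64` node ID are hypothetical names chosen for illustration; they are not Headscale's actual notifier API.

```go
package main

import "sync"

// StateUpdate is a hypothetical placeholder for the update type a notifier sends.
type StateUpdate struct{}

// NaiveNotifier keeps exactly one update channel per node ID (assumed layout),
// which is the limitation the bug report points at.
type NaiveNotifier struct {
	mu    sync.Mutex
	nodes map[uint64]chan StateUpdate
}

func NewNaiveNotifier() *NaiveNotifier {
	return &NaiveNotifier{nodes: make(map[uint64]chan StateUpdate)}
}

// AddNode registers c for nodeID, silently overwriting any earlier channel.
func (n *NaiveNotifier) AddNode(nodeID uint64, c chan StateUpdate) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.nodes[nodeID] = c
}

// RemoveNode deletes whatever channel is currently stored for nodeID,
// even if it belongs to a newer session than the caller's.
func (n *NaiveNotifier) RemoveNode(nodeID uint64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	delete(n.nodes, nodeID)
}

// IsConnected reports whether nodeID currently has an update channel.
func (n *NaiveNotifier) IsConnected(nodeID uint64) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	_, ok := n.nodes[nodeID]
	return ok
}

// simulateNetworkSwitch replays the reported sequence: the new session
// registers before the old session's deferred cleanup runs.
func simulateNetworkSwitch(n *NaiveNotifier, nodeID uint64) bool {
	oldCh := make(chan StateUpdate, 1)
	n.AddNode(nodeID, oldCh) // session on the mobile network

	newCh := make(chan StateUpdate, 1)
	n.AddNode(nodeID, newCh) // node moved to WiFi and reconnected first

	n.RemoveNode(nodeID) // old session's deferred cleanup finally runs

	// The new session is still alive, but its channel is gone.
	return n.IsConnected(nodeID)
}
```

Here `simulateNetworkSwitch` returns `false`: the node looks offline even though its newest session never disconnected.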
Environment
To Reproduce
Switch any node between networks so that it sends new updates, and observe how the update channels change.
Logs and attachments
I've come up with a silly and naive quick fix, and I don't think it's a good one: it may fix the outcome, but not the original problem. Anyway, I haven't tested it yet; it's just an idea.
Firstly, we add a channel counter for each machine.
Then we increment this counter in the AddNode function
And finally, we decrement the counter in RemoveNode and remove the node completely only if it was the last channel.
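The three steps above could look roughly like this in Go. This is a hypothetical sketch of the counter idea, not the code from the report or from Headscale: `CountingNotifier`, the `channels` counter map, `StateUpdate` and the `uint64` node ID are all assumed names.

```go
package main

import "sync"

// StateUpdate is a hypothetical placeholder for the update type a notifier sends.
type StateUpdate struct{}

// CountingNotifier still stores one channel per node (assumed layout), but also
// counts how many sessions registered for that node.
type CountingNotifier struct {
	mu       sync.Mutex
	nodes    map[uint64]chan StateUpdate
	channels map[uint64]int // hypothetical per-node channel counter
}

func NewCountingNotifier() *CountingNotifier {
	return &CountingNotifier{
		nodes:    make(map[uint64]chan StateUpdate),
		channels: make(map[uint64]int),
	}
}

// AddNode stores the newest channel and increments the counter.
func (n *CountingNotifier) AddNode(nodeID uint64, c chan StateUpdate) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.nodes[nodeID] = c
	n.channels[nodeID]++
}

// RemoveNode decrements the counter and deletes the node only when the
// last registered session goes away.
func (n *CountingNotifier) RemoveNode(nodeID uint64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.channels[nodeID] > 1 {
		n.channels[nodeID]--
		return
	}
	delete(n.channels, nodeID)
	delete(n.nodes, nodeID)
}

// IsConnected reports whether nodeID currently has an update channel.
func (n *CountingNotifier) IsConnected(nodeID uint64) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	_, ok := n.nodes[nodeID]
	return ok
}
```

With this version, the stale cleanup from the old session only lowers the counter from 2 to 1, so the new session keeps its channel. It still stores a single channel per node, though: if the newer session disconnects first while the older one is alive, the map keeps pointing at the dead channel, which fits the report's caveat that this treats the outcome rather than the original problem. Another option, also only a sketch, would be to pass the caller's channel to RemoveNode and delete the entry only if it still matches the stored one.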