[Bug] Node Connection Issues(~600 nodes) in v0.23.0-alpha12 #1966
Comments
Did you verify that there was a problem with the connections between nodes, or are you saying that you do not expect any errors?
I verified that there are two issues in the latest version: (1) When 600 users join a single Headscale server, Headscale logs the error "ERR update not sent, context cancelled...". (2) Some of the 600 joined users show as offline when checked with headscale node list. There are no connection issues between users that are online.
t2.medium sounds a bit optimistic; it's unclear whether it's too small for Headscale or for the test clients. The error mentioned would mean one or more of:
The problem here might be either that the Headscale machine does not have enough resources to maintain all of the connections, or that the VMs running hundreds of clients do not have enough resources to run them all. The machine used in #1656 is significantly larger; it's probably a bit overspecced with the new alpha.
Another important question is whether you are running SQLite or Postgres. If SQLite, try enabling WAL or switching to Postgres. It sounds like it could be a concurrency issue.
I am currently using SQLite (without the WAL option). I will rerun the same tests on a higher-performance instance using Postgres.
Please try with WAL first.
WAL on by default for SQLite is coming in #1985. I will close this issue as it is more of a performance/scaling thing than a bug. We have a couple of hidden tuning options, which together with WAL might be good content for a "performance" or "scaling" guide in the future.
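For anyone who wants to see what "enabling WAL" means at the SQLite level, here is a minimal sketch using Python's standard sqlite3 module. It is not Headscale's own mechanism: the `enable_wal` helper and the example path are hypothetical, and it assumes nothing else has the database file open while the mode is switched.

```python
import os
import sqlite3
import tempfile


def enable_wal(db_path: str) -> str:
    """Switch the SQLite database at db_path to WAL journaling.

    Hypothetical helper for illustration only. Returns the journal
    mode that SQLite reports after the change.
    """
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA journal_mode returns a single row with the resulting mode.
        # For file-backed databases the WAL setting is stored in the file,
        # so it stays in effect for later connections.
        mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    finally:
        conn.close()
    return mode


if __name__ == "__main__":
    # Use a throwaway database so the sketch never touches real data.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.sqlite")
        print(enable_wal(path))
```

In WAL mode readers do not block the writer and the writer does not block readers, which is why it is worth trying before moving to Postgres when many clients connect at once.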
Using Postgres, I'm experiencing the same issue with alpha 12 in a network of ~30 nodes, with a handful of ephemeral nodes coming and going through the day. I've seen both regular users on laptops and machines in the cloud connect to Headscale but then be unable to reach any other node in the network. Headscale outputs the same errors as stated at the beginning of the issue, though while digging through the new map session logic I'm unsure whether the error and the issue are related. If I were to guess, something is hanging in Line 271 in 8571513
I had the problem with a laptop connecting to a remote machine, so I ran
Is this a support request?
Is there an existing issue for this?
Current Behavior
To verify whether issue #1656 persists in v0.23.0-alpha12, a connection test was conducted. When attempting to connect 600 Tailscale nodes to Headscale v0.23.0-alpha12, the following error occurs frequently and some nodes go offline after connecting. There was no CPU or memory overload.
Expected Behavior
All 600 tailscale nodes should connect successfully to the headscale server and operate stably without error logs.
Steps To Reproduce
Environment
Runtime environment
Anything else?
headscale_log_2024-06-03.txt
headscale_node_list.txt
Attached are the container logs of the tested headscale and the node list when attempting to connect approximately 600 nodes.
Based on these logs, it appears that issue #1656 persists in v0.23.0-alpha12.