Bugs related to state change in #1492 #1561
@vsychov Thanks for the awesome and comprehensive writeup, I truly appreciate it. I will try to get to this shortly; I have some theories, but need some time to sit down with it.

I've merged a bunch of fixes in #1564, please give it a go and come back to me.
0.23.0-alpha2 addresses a series of issues with node synchronisation, online status and subnet routers; please test this release and report back if the issue still persists.
Please give https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha4 a swing
To note:

- the tag of the container image moved from …
- we also had to change the command that is run in the container from …
- the syntax of the database configuration in … also changed
- this also comes with a new parameter …
- the image possibly misses a directory at …

All of these are probably worth noting as breaking changes, as they will possibly prohibit a seamless upgrade path.
Made an issue regarding config migrations for you: #1758. Regarding the docker part, that is of course not officially supported and not well documented. If you feel it is a big issue as of now, feel free to create an issue (and possibly a PR).
Could you please test if this is still the case with https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha5?
Thank you, very well. Upgrading …

This is alleviated by changing the name of the configuration key:

```
< ip_prefixes:
<   - fd7a:115c:a1e0::/48
<   - 100.64.0.0/10
---
> prefixes:
>   v6: fd7a:115c:a1e0::/48
>   v4: 100.64.0.0/10
```

It would be good to include this information in the migration guide as well. If there is interest, I can take a look at https://github.com/juanfont/headscale/blob/main/docs/running-headscale-container.md again and prepare a PR including the current changes.

And because we fixed the broken migration manually, we cannot run the new migration, as the column to be deleted does not exist right now: #1748.
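For the migration guide, the resulting section of config.yaml, with the values taken directly from the diff above, would then read:

```yaml
# new-style prefix configuration (values from the diff above)
prefixes:
  v6: fd7a:115c:a1e0::/48
  v4: 100.64.0.0/10
```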
Running …
Could you please try the newest alpha (https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha6) and report back?
Thanks @kradalby, I'll run tests today or tomorrow.
Hello @kradalby, I tried testing it on the …

I created two nodes: …

Then, I connected them to headscale using a pre-auth key for the user … (key creation sketched after this comment):

```
tailscale up --auth-key XXXX --advertise-exit-node --login-server=https://headscale-test.example.com --advertise-routes=10.110.0.0/16 --shields-up=false
```

Node details: …

However, both nodes are invisible to each other:

```
root@tmp-tailscale-2-ams3-do:~# tailscale status
tmp-tailscale-2-ams3-do  user.example.com  linux  idle; offers exit node

root@tmp-tailscale-1-ams3-do:~# tailscale status
tmp-tailscale-1-ams3-do  user.example.com  linux  idle; offers exit node
```

Additionally, I noticed that there are no IP addresses in …
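For reference, a reusable pre-auth key like the one used above can be created on the headscale side roughly as follows; the user name is a placeholder, so double-check the flags against your headscale version:

```
# create a reusable pre-auth key for the test user (placeholder name)
headscale preauthkeys create --user <name> --reusable --expiration 24h
```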
Yes, the missing IPs are strange, I suspect that might be the cause. Is this sqlite or postgres? Do you have the config? Edit: the config might not have the new IP prefix syntax?
@kradalby, you are right! The new `prefixes` section was missing from the config.
Yea, we need to throw an error if there are no prefixes!
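A minimal sketch of what that check could look like at config-load time; the `Prefixes` struct and its field names here are hypothetical stand-ins, not headscale's actual config types:

```go
// Sketch of a fail-loudly prefix check at startup (hypothetical types).
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

type Prefixes struct {
	V4 string // e.g. "100.64.0.0/10"
	V6 string // e.g. "fd7a:115c:a1e0::/48"
}

// Validate errors out instead of silently allocating no IP addresses.
func (p Prefixes) Validate() error {
	if p.V4 == "" && p.V6 == "" {
		return errors.New("config: prefixes.v4 and prefixes.v6 are both empty, at least one is required")
	}
	for _, s := range []string{p.V4, p.V6} {
		if s == "" {
			continue
		}
		if _, err := netip.ParsePrefix(s); err != nil {
			return fmt.Errorf("config: invalid prefix %q: %w", s, err)
		}
	}
	return nil
}

func main() {
	// Simulates a config that was never migrated to the new syntax.
	if err := (Prefixes{}).Validate(); err != nil {
		fmt.Println(err)
	}
}
```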
https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha10 was also released to address a couple of regressions in the ACL,

and the no-prefix issue was addressed in #1918.
@vsychov let me know when you have had time to give this a go; if these issues are resolved, I will tag a beta release after resolving one other issue.
@kradalby, I was just about to write that everything was fine, but it seems something went wrong. I deployed a test environment consisting of 3 machines and one headscale control server. I left it running for about 3 days, and for the first 24 hours it worked very stably: subnet routers failed over very quickly, there were no problems with nodes going offline, and so on. Everything seemed just fine. But literally just before writing here, I decided to do a retest and found that both subnet routers became …
Consequently, from the clients, the subnet 10.0.0.0/8 became inaccessible.
Testing was performed on version 0.23.0-alpha8. If I can provide any additional information that would help identify the cause of this behavior, let me know, and I'll try to get it (perhaps logs would be useful or something else). |
@vsychov do you have the log of the machine? That would be helpful. If you cannot share it, it would be useful to see whether you see a lot of log lines with …
@kradalby, which machine's log? The machine where headscale was run, or a machine with tailscale?
Headscale please
I noticed that the log level was set to 'INFO', so there are no lines with …
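For anyone following along, the log level lives in headscale's config.yaml; assuming the key layout of the sample config shipped with headscale, switching to trace looks like this:

```yaml
# raise verbosity so state-change events show up in the log
log:
  level: trace
```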
I've switched the logs to trace mode, and I suggest waiting a few more days in an attempt to reproduce the issue. Meanwhile, as far as I can see, headscale doesn't perform route reselection upon restart. It might be better to also do it periodically, for example every 10 seconds (the interval could be configurable): go through all routes, find those without any 'Primary' node, and forcibly perform reselection, as sketched below. This way it seems possible to protect against some failures, since routes are currently chosen based on events (as I understand it, that's how it's done today).
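A self-contained sketch of that reconciliation loop, with hypothetical `Route`/`Store` types standing in for headscale's internals (not the project's actual code):

```go
// Sketch of periodic route reselection as proposed above (hypothetical types).
package main

import (
	"fmt"
	"time"
)

// Route is a subnet route as seen by the control server.
type Route struct {
	Prefix    string
	Primary   string   // ID of the node currently serving the route, "" if none
	Available []string // IDs of online nodes advertising this prefix
}

// Store holds all approved routes.
type Store struct {
	Routes []*Route
}

// routesWithoutPrimary finds routes that have candidate nodes but no
// primary, i.e. the stranded state described in this thread.
func (s *Store) routesWithoutPrimary() []*Route {
	var orphaned []*Route
	for _, r := range s.Routes {
		if r.Primary == "" && len(r.Available) > 0 {
			orphaned = append(orphaned, r)
		}
	}
	return orphaned
}

// selectPrimary forcibly picks a new primary; a real implementation
// would prefer the healthiest online node.
func selectPrimary(r *Route) {
	r.Primary = r.Available[0]
	fmt.Printf("route %s: forced reselection, new primary %s\n", r.Prefix, r.Primary)
}

// reconcile sweeps every interval (e.g. the configurable 10s suggested
// above) instead of relying purely on events.
func reconcile(s *Store, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for _, r := range s.routesWithoutPrimary() {
				selectPrimary(r)
			}
		case <-done:
			return
		}
	}
}

func main() {
	store := &Store{Routes: []*Route{
		{Prefix: "10.0.0.0/8", Available: []string{"router-1", "router-2"}},
	}}
	done := make(chan struct{})
	go reconcile(store, 10*time.Second, done)
	time.Sleep(11 * time.Second) // let one sweep run in this demo
	close(done)
}
```

The point of such a sweep is that a missed event can no longer strand a route without a primary for longer than one interval.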
I'm writing an update on the testing results. Since I started running …

I'll continue to keep the …
I've fixed up some of the things that could cause a deadlock in https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha12. I will leave this open for you to test, and hopefully we can close this.
We reckon this is now fixed, after a significant redesign of the state machine. Can you open a new issue should this be present in the new release? |
I completed all my tests and am not able to reproduce this issue anymore. Thanks for your great work!
Bug description
Hello there,
After #1492 was merged, I noticed the following two issues, which may be related, so I'm posting this as a single problem:
1. The `offers exit node` flag takes quite some time to appear after route approvals.
2. Node status is displayed as "offline" (and this is the main problem).

I tested this with the following configuration (one user and one reusable auth-key):

```
tailscale up --auth-key XXXX --advertise-exit-node --login-server=https://headscale-test.example.com/ --advertise-routes=10.0.0.0/8 --accept-routes --accept-dns --ssh --shields-up=false
```

I used version `1.50.0` of tailscale for all clients, but the same problem is present on `1.48.2`.
I connected all clients, confirmed all routes, and at `2023-09-26T10:32:26Z` got the following result of the `headscale route list` command: …

At `2023-09-26T10:32:45Z`, I ran the command `/Applications/Tailscale.app/Contents/MacOS/Tailscale status` on macOS and saw the following result (`tmp-tailscale-fra1-01` already displayed as offline): …

At `2023-09-26T10:33:15Z`, the node `tmp-tailscale-fra1-02` was displayed as offline, and still no one was offering an exit node: …

At `2023-09-26T10:34:39Z`, the node `tmp-tailscale-fra1-03` started to be displayed as offering an exit node (and I noticed that its hostname changed): …

The situation remained the same until `2023-09-26T10:39:50Z`, when I stopped the test; here is the result of `headscale nodes list` at that time: …

All 3 Linux clients are on the same network and have the same connectivity with headscale (the command works the same on all machines, example with `tmp-tailscale-fra1-03`): …

With version 0.22.3 it works well.
Also, there are headscale logs that show that communication with the nodes displayed as "offline" was still ongoing.
Environment

- headscale: `main` (01b85e5)
- tailscale: `1.50.0`