Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teardown tunnel automatically if peer's certificate expired #370

Merged

Conversation

ton31337
Copy link
Contributor

@ton31337 ton31337 commented Feb 2, 2021

Handle this with periodic keepalive ticks. This is needed to avoid
hanging a connection even peer's certificate expired.

In case you use short-lived certificates (let's say 2 hours), current
behavior is a bit wrong, because the tunnel stays UP and RUNNING fine
unless restarted nebula service.

Signed-off-by: Donatas Abraitis [email protected]

@CLAassistant
Copy link

CLAassistant commented Feb 2, 2021

CLA assistant check
All committers have signed the CLA.

@ton31337 ton31337 force-pushed the feature/validate_peer_cert_with_tunnel_status branch 3 times, most recently from be368eb to f670376 Compare February 2, 2021 21:13
@ton31337 ton31337 changed the title Teardown tunnel automatically if peer's certificate expired [WIP] Teardown tunnel automatically if peer's certificate expired Feb 2, 2021
@ton31337 ton31337 force-pushed the feature/validate_peer_cert_with_tunnel_status branch from f670376 to ac84e19 Compare February 2, 2021 22:21
@ton31337 ton31337 changed the title [WIP] Teardown tunnel automatically if peer's certificate expired Teardown tunnel automatically if peer's certificate expired Feb 2, 2021
@ton31337 ton31337 force-pushed the feature/validate_peer_cert_with_tunnel_status branch from ac84e19 to 91793df Compare February 3, 2021 09:56
@ton31337 ton31337 force-pushed the feature/validate_peer_cert_with_tunnel_status branch 3 times, most recently from 6b5bb37 to 33a7a59 Compare May 12, 2021 11:24
@virtadpt
Copy link

Can we please also have a notice "Node X's certificate has expired, tearing down connection" in the logs?

@ton31337 ton31337 force-pushed the feature/validate_peer_cert_with_tunnel_status branch from 33a7a59 to 71b2c4a Compare May 13, 2021 09:06
@ton31337
Copy link
Contributor Author

@virtadpt added more logging regarding disconnection.

@virtadpt
Copy link

So, monitor output for "Invalid certificate status", and then suss out the value of vpnIp for the host that needs remediated.

Works for me. I'll give it a try this weekend.

@ton31337
Copy link
Contributor Author

ton31337 commented Jun 4, 2021

@rawdigits @nbrownus any chance to review this?

@virtadpt
Copy link

virtadpt commented Jun 4, 2021

@ton31337 Not yet. I'm still working 90 hour weeks.

@virtadpt
Copy link

virtadpt commented Aug 4, 2021

I finally went off call yesterday. I'll give it a try when I have a moment.

Copy link
Collaborator

@nbrownus nbrownus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few comments, using connectionManager for this is nice as it won't really punish the hot path and should give a fairly quick reaction to changes in CA trust/certificate expiry.

connection_manager.go Outdated Show resolved Hide resolved
connection_manager.go Outdated Show resolved Hide resolved
connection_manager.go Outdated Show resolved Hide resolved
examples/config.yml Outdated Show resolved Hide resolved
connection_manager.go Outdated Show resolved Hide resolved
connection_manager.go Outdated Show resolved Hide resolved
@ton31337 ton31337 force-pushed the feature/validate_peer_cert_with_tunnel_status branch from 71b2c4a to f96d554 Compare September 15, 2021 15:41
Handle this with periodic keepalive ticks. This is needed to avoid
hanging a connection even peer's certificate expired.

In case you use short-lived certificates (let's say 2 hours), current
behavior is a bit wrong, because the tunnel stays UP and RUNNING fine
unless restarted nebula service.

This is the log from how it's reflected.

```
time="2021-05-12T10:15:51Z" level=debug msg="Tunnel status" tunnelCheck="map[method:passive state:alive]" vpnIp=172.17.90.241
time="2021-05-12T10:15:59Z" level=debug msg="Tunnel status" tunnelCheck="map[method:passive state:alive]" vpnIp=172.17.90.241
time="2021-05-12T10:16:06Z" level=debug msg="Invalid certificate status" certName=ton31337 vpnIp=172.17.90.241
time="2021-05-12T10:16:06Z" level=debug msg="Tunnel status" tunnelCheck="map[method:passive state:alive]" vpnIp=172.17.90.241
time="2021-05-12T10:16:06Z" level=debug msg="Tunnel status" certName=ton31337 tunnelCheck="map[method:active state:testing]" vpnIp=172.17.90.241
time="2021-05-12T10:16:20Z" level=debug msg="Tunnel status" tunnelCheck="map[method:active state:alive]" vpnIp=172.17.90.241
time="2021-05-12T10:16:20Z" level=info msg="Tunnel status" certName=ton31337 tunnelCheck="map[method:active state:dead]" vpnIp=172.17.90.241
time="2021-05-12T10:16:20Z" level=debug msg="deleting 172.17.90.241 from lighthouse."
time="2021-05-12T10:16:20Z" level=debug msg="Hostmap hostInfo deleted" hostMap="map[indexNumber:3347248980 mapName:main mapTotalSize:844 remoteIndexNumber:3605974939 vpnIp:172.17.90.241]"
```

Signed-off-by: Donatas Abraitis <[email protected]>
@ton31337 ton31337 force-pushed the feature/validate_peer_cert_with_tunnel_status branch from f96d554 to 1246442 Compare September 15, 2021 15:44
@ton31337 ton31337 requested a review from nbrownus September 15, 2021 15:48
@wadey wadey added this to the v1.5.0 milestone Sep 27, 2021
Copy link
Member

@wadey wadey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, lets just put a note in the CHANGELOG that this won't recognize if the remote side used SIGHUP to refresh to a new certificate. (and we should fix that in a follow up PR)

@nbrownus nbrownus merged commit 32e2619 into slackhq:master Oct 20, 2021
nbrownus added a commit that referenced this pull request Oct 20, 2021
@ton31337 ton31337 deleted the feature/validate_peer_cert_with_tunnel_status branch October 20, 2021 18:29
nbrownus added a commit that referenced this pull request Oct 20, 2021
nbrownus added a commit that referenced this pull request Oct 20, 2021
wadey added a commit that referenced this pull request Mar 18, 2022
This restores the hostMap.QueryVpnIP block to how it looked before #370
was merged. I'm not sure why the patch from #370 wanted to continue on
if there was no match found in the hostmap, since there isn't anything
to do at that point (the tunnel has already been closed).

This was causing a crash because the handleInvalidCertificate check
expects the hostinfo to be passed in (but it is nil since there was no
hostinfo in the hostmap).

Fixes: #657
wadey added a commit that referenced this pull request Apr 4, 2022
This restores the hostMap.QueryVpnIP block to how it looked before #370
was merged. I'm not sure why the patch from #370 wanted to continue on
if there was no match found in the hostmap, since there isn't anything
to do at that point (the tunnel has already been closed).

This was causing a crash because the handleInvalidCertificate check
expects the hostinfo to be passed in (but it is nil since there was no
hostinfo in the hostmap).

Fixes: #657
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants