
Dynamically resolve reverse tunnel address #9958

Merged
merged 4 commits into master from tross/tunnel_resolver on Feb 3, 2022

Conversation

@rosstimothy (Contributor) commented Jan 26, 2022

The reverse tunnel address is currently a static string that is
retrieved from config and passed around for the duration of a
service's lifetime. When the tunnel_public_address is changed
on the proxy and the proxy is then restarted, all established
reverse tunnels over the old address will fail indefinitely.
As a means to get around this, #8102 introduced a mechanism
that would cause nodes to restart if their connection to the
auth server was down for a period of time. While this did
allow the nodes to pick up the new address after they
restarted, it was meant to be a stopgap until a more robust
solution could be applied.

Instead of using a static address, the reverse tunnel address
is now resolved via a reversetunnel.Resolver. Anywhere that
previously relied on the static proxy address will now fetch
the actual reverse tunnel address via the webclient by using
the Resolver. In addition, this builds on the refactoring done
in #4290 to further simplify the reversetunnel package. Since
we no longer track multiple proxies, all the leftover bits
that did so have been removed to accommodate using a dynamic
reverse tunnel address.
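
For reference, the Resolver contract introduced here is just a
function type (quoted verbatim from the diff later in this review).
A webclient-backed implementation might look roughly like the
sketch below; fetchTunnelAddrFromWebClient is a hypothetical
stand-in for the actual webclient lookup the PR performs, not a
real helper.

// Resolver looks up reverse tunnel addresses
type Resolver func() (*utils.NetAddr, error)

// WebClientResolver returns a Resolver that queries the proxy's
// webclient endpoint on every call, so a changed
// tunnel_public_address is picked up without restarting the node.
func WebClientResolver(proxyAddr string) Resolver {
    return func() (*utils.NetAddr, error) {
        // Hypothetical helper standing in for the real webclient call.
        tunnelAddr, err := fetchTunnelAddrFromWebClient(proxyAddr)
        if err != nil {
            return nil, err
        }
        // utils.ParseAddr converts the "host:port" string into a
        // *utils.NetAddr.
        return utils.ParseAddr(tunnelAddr)
    }
}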

@rosstimothy rosstimothy force-pushed the tross/tunnel_resolver branch 2 times, most recently from 59f1017 to fa00cc9 on January 26, 2022 19:23
@rosstimothy rosstimothy force-pushed the tross/tunnel_resolver branch 2 times, most recently from fabafda to df3a741 on January 27, 2022 14:45
@rosstimothy rosstimothy changed the title from "use tunnel resolver to resolve proxy tunnel instead of static string" to "Dynamically resolve reverse tunnel address" on January 27, 2022
@rosstimothy rosstimothy force-pushed the tross/tunnel_resolver branch 2 times, most recently from 204300c to b3625c0 on January 27, 2022 21:48
@rosstimothy rosstimothy marked this pull request as ready for review February 1, 2022 14:05
@github-actions github-actions bot added the tctl (Teleport admin tool) label Feb 1, 2022
@github-actions github-actions bot requested a review from quinqu February 1, 2022 14:06
@rosstimothy rosstimothy added the robustness (Resistance to crashes and reliability) label Feb 1, 2022
@russjones russjones requested review from Joerger and removed request for quinqu February 2, 2022 18:07
@Joerger (Contributor) left a comment

LGTM, just some minor comments

lib/reversetunnel/agentpool.go (outdated; resolved)
lib/reversetunnel/rc_manager_test.go (outdated; resolved)
lib/reversetunnel/resolver.go (outdated; resolved)
if t.sets.expire(cutoff) > 0 {
    count := len(t.sets.proxies)
    if count < 1 {
        count = 1
    }
    t.wp.Set(addr, uint64(count))
}
Contributor:

Why do we set count to 1 here? Is it obvious and I'm missing it, or could we add a comment?

rosstimothy (Author):

My educated guess is that setting the workpool target to anything < 1 would cause it to close the underlying workgroup, which would then need to be recreated when the target becomes >= 1 again. Setting the target to 1 instead resets the workgroup without deleting it.

@fspmarshall it looks like it has been like this since you added the tracker; can you please correct me if I am wrong here?

Contributor:

The count is the number of proxies we expect to discover, based on the heartbeats we've seen. We don't start off knowing about any proxies, since it is the first proxy we connect to that tells us about its peers. Therefore, if we don't know about any proxies yet, we just try to find at least one, and then wait for it to tell us the real number.

lib/service/service.go (outdated; resolved)
* Dynamically resolve reverse tunnel address
- rename ResolveViaWebClient to WebClientResolver
- add singleProcessModeResolver to TeleportProcess
- make AgentPool.filterAndClose return nil if there are no matches
- rename Pool.groups to Pool.group
- remove Lease from TrackExpected
@rosstimothy rosstimothy force-pushed the tross/tunnel_resolver branch from b3625c0 to 6356177 on February 2, 2022 21:28
lib/reversetunnel/agent.go (outdated; resolved)
lib/reversetunnel/agentpool.go (outdated; resolved)
lib/reversetunnel/rc_manager.go (outdated; resolved)
Comment on lines +26 to +27
// Resolver looks up reverse tunnel addresses
type Resolver func() (*utils.NetAddr, error)
Contributor:

I'm a bit concerned about the increased network load if proxies are restarted in very large clusters. Might create another thundering herd problem if we reload the address on every call to the resolver. I'm thinking it might be good if the resolver had some basic ttl-based caching capabilities, so that if it were called many times over a small period (say 3-5 seconds), only one network call is actually made.

We already use lib/cache/fncache.go to do the same thing to reduce thundering herd issues when the cache is unhealthy. That helper doesn't rely on anything in lib/cache, so we could easily move it somewhere under utils, and use it to provide similar functionality here. Ex:

func CachingResolver(resolver Resolver) Resolver {
    // Entries expire after 3 seconds, so a burst of lookups within
    // that window results in a single call to the wrapped resolver.
    cache := utils.NewFnCache(3 * time.Second)
    return func() (*utils.NetAddr, error) {
        a, err := cache.Get(context.TODO(), "resolver", func() (interface{}, error) {
            addr, err := resolver()
            return addr, err
        })
        if err != nil {
            return nil, err
        }
        // The cache stores interface{} values; assert back to the
        // concrete address type.
        return a.(*utils.NetAddr), nil
    }
}
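
A hypothetical call site for the wrapper above (baseResolver stands in for whatever Resolver the process already built):

resolver := CachingResolver(baseResolver)

// Repeated calls within the 3-second TTL are served from the cache,
// so only the first triggers a webclient lookup.
addr, err := resolver()
if err != nil {
    // handle the lookup failure
}
_ = addr // use the freshly resolved reverse tunnel address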

@rosstimothy rosstimothy enabled auto-merge (squash) February 3, 2022 16:19
@rosstimothy rosstimothy merged commit 6cb1371 into master Feb 3, 2022
@rosstimothy rosstimothy deleted the tross/tunnel_resolver branch February 3, 2022 16:24
rosstimothy added a commit that referenced this pull request Feb 4, 2022
* Dynamically resolve reverse tunnel address

(cherry picked from commit 6cb1371)
@webvictim webvictim mentioned this pull request Mar 4, 2022
@bernardjkim (Contributor):

Hi @rosstimothy, do we plan on backporting this to v7?

In certain situations, v7 agents will fail to reconnect after a proxy config change. More details here: https://github.com/gravitational/cloud/issues/1441#issuecomment-1067391237. If we don't plan on backporting this change to v7, I can create a new issue to patch up the current implementation.

rosstimothy added a commit that referenced this pull request Mar 16, 2022
* Dynamically resolve reverse tunnel address

rosstimothy added a commit that referenced this pull request Mar 21, 2022
* Dynamically resolve reverse tunnel address (#9958)

Labels: robustness (Resistance to crashes and reliability), tctl (Teleport admin tool)
5 participants