Dynamically resolve reverse tunnel address #9958
Conversation
LGTM, just some minor comments
t.wp.Set(addr, uint64(count))
if t.sets.expire(cutoff) > 0 {
	count := len(t.sets.proxies)
	if count < 1 {
Why do we set count to 1 here? Is it obvious and I'm missing it, or could we add a comment?
My educated guess is that setting the workpool target to anything < 1 would cause it to close the underlying workgroup and thus require a new one to be created in the future when the target becomes >= 1. By instead setting the workpool target to 1, it resets the workgroup without deleting it.
@fspmarshall it looks like it has been like this since you added the tracker; can you please correct me if I am wrong here?
The count is the number of proxies we expect to discover, based on the heartbeats we've seen. We don't start off knowing about any proxies since it is the first proxy that we connect to that tells us about its peers. Therefore if we don't know about any proxies yet, we just try to find at least one, and then wait for it to tell us the real number.
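The explanation above can be sketched in a few lines. This is a minimal, self-contained model of the counting behaviour only; the `trackerSets` type and `expectedCount` method are hypothetical stand-ins, not Teleport's actual API:

```go
package main

import "fmt"

// trackerSets is a hypothetical stand-in for the tracker's proxy set;
// only the counting behaviour discussed above is modelled here.
type trackerSets struct {
	proxies map[string]struct{}
}

// expectedCount returns the number of proxies the tracker should try to
// discover. Before any proxy heartbeat has been seen the set is empty,
// so the count is floored at 1: the workpool target must stay >= 1 to
// keep the underlying workgroup alive while the first proxy is found,
// and that first proxy then reports the real number of peers.
func (s *trackerSets) expectedCount() int {
	count := len(s.proxies)
	if count < 1 {
		count = 1
	}
	return count
}

func main() {
	s := &trackerSets{proxies: map[string]struct{}{}}
	fmt.Println(s.expectedCount()) // no proxies known yet -> 1

	s.proxies["proxy-a"] = struct{}{}
	s.proxies["proxy-b"] = struct{}{}
	fmt.Println(s.expectedCount()) // two known proxies -> 2
}
```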
The reverse tunnel address is currently a static string that is retrieved from config and passed around for the duration of a service's lifetime. When the `tunnel_public_address` is changed on the proxy and the proxy is then restarted, all established reverse tunnels over the old address will fail indefinitely. To work around this, #8102 introduced a mechanism that would cause nodes to restart if their connection to the auth server was down for a period of time. While this did allow the nodes to pick up the new address after restarting, it was meant to be a stopgap until a more robust solution could be applied. Instead of using a static address, the reverse tunnel address is now resolved via a `reversetunnel.Resolver`. Anything that previously relied on the static proxy address will now fetch the actual reverse tunnel address via the webclient by using the Resolver. In addition, this builds on the refactoring done in #4290 to further simplify the reversetunnel package. Since we no longer track multiple proxies, all the leftover bits that did so have been removed to accommodate using a dynamic reverse tunnel address.
- rename ResolveViaWebClient to WebClientResolver
- add singleProcessModeResolver to TeleportProcess
- make AgentPool.filterAndClose return nil if there are no matches
- rename Pool.groups to Pool.group
- remove Lease from TrackExpected
// Resolver looks up reverse tunnel addresses
type Resolver func() (*utils.NetAddr, error)
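Since a Resolver is just a closure, callers stop caring whether the address came from config or from a live webclient lookup. The sketch below shows the simplest case, a resolver wrapping a fixed config address; the `NetAddr` type and `StaticResolver` helper are illustrative stand-ins for `utils.NetAddr` and Teleport's actual constructors:

```go
package main

import (
	"fmt"
	"net"
)

// NetAddr is a minimal stand-in for utils.NetAddr; only the fields
// needed for this sketch are modelled.
type NetAddr struct {
	Addr        string
	AddrNetwork string
}

// Resolver mirrors the signature in the diff above, with NetAddr
// substituted for utils.NetAddr.
type Resolver func() (*NetAddr, error)

// StaticResolver is a hypothetical helper: it wraps a config-supplied
// tunnel address so that callers use the same Resolver interface
// whether the address is static or fetched from the webclient.
func StaticResolver(addr string) Resolver {
	return func() (*NetAddr, error) {
		// Validate the address once per call; a webclient-backed
		// resolver would do a network lookup here instead.
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, err
		}
		return &NetAddr{
			Addr:        net.JoinHostPort(host, port),
			AddrNetwork: "tcp",
		}, nil
	}
}

func main() {
	resolve := StaticResolver("proxy.example.com:3024")
	addr, err := resolve()
	if err != nil {
		panic(err)
	}
	fmt.Println(addr.Addr) // proxy.example.com:3024
}
```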
I'm a bit concerned about the increased network load if proxies are restarted in very large clusters. Might create another thundering herd problem if we reload the address on every call to the resolver. I'm thinking it might be good if the resolver had some basic ttl-based caching capabilities, so that if it were called many times over a small period (say 3-5 seconds), only one network call is actually made.
We already use lib/cache/fncache.go to do the same thing to reduce thundering herd issues when the cache is unhealthy. That helper doesn't rely on anything in lib/cache, so we could easily move it somewhere under utils and use it to provide similar functionality here. Ex:
func CachingResolver(resolver Resolver) Resolver {
	cache := utils.NewFnCache(3 * time.Second)
	return func() (*utils.NetAddr, error) {
		a, err := cache.Get(context.TODO(), "resolver", func() (interface{}, error) {
			addr, err := resolver()
			return addr, err
		})
		if err != nil {
			return nil, err
		}
		return a.(*utils.NetAddr), nil
	}
}
* Dynamically resolve reverse tunnel address (cherry picked from commit 6cb1371)
Hi @rosstimothy, do we plan on backporting this to v7? In certain situations v7 agents will fail to reconnect after a proxy config change. More details here: https://github.com/gravitational/cloud/issues/1441#issuecomment-1067391237. If we don't plan on backporting this change to v7, I can create a new issue to patch up the current implementation.
* Dynamically resolve reverse tunnel address (#9958)