
Stop loading the entire node set into memory per tsh ssh connection #12014

Merged
merged 10 commits into master from tross/node_watcher
May 2, 2022

Conversation

rosstimothy
Contributor

When doing a `tsh ssh node` the proxySubsys would load the entire
set of nodes into memory to determine which server to route the
request to. Under heavy load, like an automated bot that periodically
spawns numerous sessions, a proxy could easily consume all
available memory.

To prevent this, proxies now utilize a NodeWatcher that maintains
a single node set in memory. This prevents loading the nodes into
memory more than once, and also eliminates the need to unmarshal the
types.Server on each retrieval of the nodes. The NodeWatcher only
provides a GetNodes function that requires a filter function, to make
it intentionally challenging to retrieve a copy of the entire node set.
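
Conceptually, the filter-based accessor works something like the sketch below. This is only a simplified illustration with hypothetical type and field names, not the actual Teleport API: the real watcher stores types.Server and is fed by the existing resource watcher system.

```go
// Sketch only: a shared in-memory node set exposed through a
// filter-based GetNodes. Names (Node, NodeWatcher fields) are hypothetical.
package main

import (
	"fmt"
	"sync"
)

// Node stands in for types.Server in this sketch.
type Node struct {
	Name string
	Addr string
}

// NodeWatcher holds the single shared node set for the process.
type NodeWatcher struct {
	mu    sync.RWMutex
	nodes []Node
}

// GetNodes requires a filter and returns copies of only the matches,
// so callers cannot cheaply grab the whole set.
func (w *NodeWatcher) GetNodes(match func(Node) bool) []Node {
	w.mu.RLock()
	defer w.mu.RUnlock()

	var out []Node
	for _, n := range w.nodes {
		if match(n) {
			out = append(out, n) // copy only what the filter selects
		}
	}
	return out
}

func main() {
	w := &NodeWatcher{nodes: []Node{
		{Name: "db-1", Addr: "10.0.0.1:3022"},
		{Name: "web-1", Addr: "10.0.0.2:3022"},
	}}

	// A proxy routing a session would filter for the one matching host.
	fmt.Println(w.GetNodes(func(n Node) bool { return n.Name == "web-1" }))
}
```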

@rosstimothy rosstimothy force-pushed the tross/node_watcher branch 7 times, most recently from 7f09c3f to 95524ce on April 19, 2022 16:25
@rosstimothy rosstimothy marked this pull request as ready for review April 19, 2022 17:28
@github-actions github-actions bot added the tsh (tsh - Teleport's command line tool for logging into nodes running Teleport.) label Apr 19, 2022
@rosstimothy rosstimothy added the backport-required and robustness (Resistance to crashes and reliability) labels Apr 19, 2022
@rosstimothy
Contributor Author

friendly ping @smallinsky @ibeckermayer

Contributor

@smallinsky smallinsky left a comment

@rosstimothy
Sorry for the delay. I have left one question about the NodeWatcher approach vs the access point cache, where the NodeType is already watched:

func (c *node) processEvent(ctx context.Context, event types.Event) error {

lib/reversetunnel/srv.go (resolved)
lib/services/presence.go (outdated, resolved)
lib/srv/regular/sshserver.go (outdated, resolved)
@rosstimothy
Contributor Author

@rosstimothy Sorry for the delay. I have left one question about the NodeWatcher approach vs the access point cache, where the NodeType is already watched:

func (c *node) processEvent(ctx context.Context, event types.Event) error {

@smallinsky the NodeWatcher is intended to be how anything that needs to retrieve nodes, or a subset of them, does so. While a cache may already be watching for types.KindNode, and a caller is already able to retrieve nodes from the cache via GetNodes or ListNodes, doing so can be quite expensive on larger clusters. Each time the entire set of nodes is retrieved from the cache, every node has to be unmarshalled from the JSON representation in which it is stored. When multiple callers do this at once it causes a spike in memory that is proportional to the number of nodes, which has been observed to cause the proxy to be OOM killed on a cluster that had ~50,000 nodes. This is easily triggered by multiple tsh ssh sessions opened at once, each of which results in loading the entire set of nodes from the cache:

servers, err = t.srv.proxyAccessPoint.GetNodes(ctx.CancelContext(), t.namespace)

By leveraging the NodeWatcher we can be sure that the entire node set is only loaded into memory once, and by storing them as types.Server we don't have to pay the cost to unmarshal from JSON. The method by which a caller retrieves nodes from the NodeWatcher is also meant to ensure that only copies of the matched subset are made.
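
To make the difference concrete, here is a rough contrast of the two retrieval patterns. It is only a sketch with made-up types (not Teleport code): the point is that the old path repeats a full JSON decode for every caller, while the watcher path decodes once and only copies what a filter matches.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// server stands in for types.Server; the real type is much larger.
type server struct {
	Name string `json:"name"`
}

// cacheGetNodes mimics the old cache path: every call decodes the
// entire stored node set from JSON, even if only one node is needed.
func cacheGetNodes(stored [][]byte) ([]server, error) {
	out := make([]server, 0, len(stored))
	for _, raw := range stored {
		var s server
		if err := json.Unmarshal(raw, &s); err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	return out, nil
}

// watcherGetNodes mimics the NodeWatcher path: the decoded servers
// already live in memory once, and callers only copy the matches.
func watcherGetNodes(decoded []server, match func(server) bool) []server {
	var out []server
	for _, s := range decoded {
		if match(s) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	stored := [][]byte{
		[]byte(`{"name":"node-a"}`),
		[]byte(`{"name":"node-b"}`),
	}
	decoded := []server{{Name: "node-a"}, {Name: "node-b"}}

	all, _ := cacheGetNodes(stored) // every caller pays for the whole set
	fmt.Println(len(all))

	one := watcherGetNodes(decoded, func(s server) bool { return s.Name == "node-b" })
	fmt.Println(one) // only the matching node is copied
}
```

With ~50,000 nodes, the cacheGetNodes-style decode above is repeated for every concurrent tsh ssh session, which is why memory use grows with the number of simultaneous sessions.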

Contributor

@smallinsky smallinsky left a comment

By leveraging the NodeWatcher we can be sure that the entire node set is only loaded into memory once, and by storing them as types.Server we don't have to pay the cost to unmarshal from JSON. The method by which a caller retrieves nodes from the NodeWatcher is also meant to ensure that only copies of the matched subset are made.

All right, thanks for the explanation.

I was curious what the actual benefit is, so I ran a benchmark of the old cache approach vs the watcher for 40,000 nodes:

BenchmarkListNodesWatcher-8    16    68339640 ns/op    29017941 B/op      40183 allocs/op
BenchmarkListMaxNodes-8         5   229637793 ns/op    80429200 B/op    1520193 allocs/op

I have left some minor comments, otherwise it LGTM.

Contributor

@ibeckermayer ibeckermayer left a comment

@rosstimothy apologies, I've been head down and wasn't giving needed attention to my GitHub notifications.

Possibly a naive question and/or the premise may be false, but aren't we using the GetNodes you're replacing in many other parts of the codebase? Why should those not also be replaced with a call to a NodeWatcher to prevent a similar potential problem?

@rosstimothy
Contributor Author

@rosstimothy apologies, I've been head down and wasn't giving needed attention to my GitHub notifications.

Possibly a naive question and/or the premise may be false, but aren't we using the GetNodes you're replacing in many other parts of the codebase? Why should those not also be replaced with a call to a NodeWatcher to prevent a similar potential problem?

Good question. Any service other than auth which could potentially call GetNodes concurrently should definitely make use of the NodeWatcher. Doing a quick check it looks like there may be a couple spots in the web api that should probably be migrated.

@ibeckermayer
Contributor

Doing a quick check it looks like there may be a couple spots in the web api that should probably be migrated.

Sounds good, your call whether to refactor them here or just create an issue for it.

@rosstimothy
Contributor Author

Doing a quick check it looks like there may be a couple spots in the web api that should probably be migrated.

Sounds good, your call whether to refactor them here or just create an issue for it.

Addressed this in 18f9570

@rosstimothy rosstimothy merged commit fa12352 into master May 2, 2022
rosstimothy added a commit that referenced this pull request May 10, 2022
…12014)

* Prevent proxy from loading entire node set into memory more than once

When establishing a new session to a node, the proxy would load the
entire node set into memory in an attempt to find the matching host. For
smaller clusters this may not be that problematic, but on larger clusters
loading >40k nodes into memory from the cache can be quite expensive.
This problem is compounded by the fact that it happened **per** session,
which could potentially cause the proxy to consume all available memory
and be OOM killed.

A new `NodeWatcher` is introduced which maintains an in-memory list
of all nodes per process. The watcher leverages the existing resource
watcher system and stores all nodes as types.Server, to eliminate the
cost incurred by unmarshalling the nodes from the cache. The `NodeWatcher`
provides a way to retrieve a filtered list of nodes in order to limit the
copies made to only the matches.

(cherry picked from commit fa12352)

# Conflicts:
#	lib/reversetunnel/api.go
#	lib/service/service.go
#	lib/services/watcher.go
#	lib/srv/regular/proxy.go
#	lib/srv/regular/sshserver.go
#	lib/web/apiserver.go
#	lib/web/apiserver_test.go
#	lib/web/terminal.go
#	tool/tsh/proxy_test.go
#	tool/tsh/tsh.go
rosstimothy added a commit that referenced this pull request May 11, 2022
…12014)

(cherry picked from commit fa12352)

# Conflicts:
#	lib/services/watcher.go
#	lib/web/apiserver_test.go
@rosstimothy
Contributor Author

💚 All backports created successfully

Branches: branch/v7, branch/v8, branch/v9

Questions?

Please refer to the Backport tool documentation

rosstimothy added a commit that referenced this pull request May 11, 2022
…12014)

(cherry picked from commit fa12352)
rosstimothy added a commit that referenced this pull request May 11, 2022
…12014) (#12571)

(cherry picked from commit fa12352)
rosstimothy added a commit that referenced this pull request May 11, 2022
…12014)

(cherry picked from commit fa12352)
rosstimothy added a commit that referenced this pull request May 11, 2022
…12014)

(cherry picked from commit fa12352)
rosstimothy added a commit that referenced this pull request May 12, 2022
…12014)

(cherry picked from commit fa12352)
rosstimothy added a commit that referenced this pull request May 12, 2022
…12014)

(cherry picked from commit fa12352)
rosstimothy added a commit that referenced this pull request May 12, 2022
…12014) (#12562)

(cherry picked from commit fa12352)
rosstimothy added a commit that referenced this pull request May 12, 2022
…12014) (#12573)

(cherry picked from commit fa12352)
@webvictim webvictim mentioned this pull request Jun 8, 2022
@rosstimothy rosstimothy deleted the tross/node_watcher branch July 6, 2022 21:21
@rosstimothy rosstimothy changed the title Stop loading the enitre node set into memory per tsh ssh connection Stop loading the entire node set into memory per tsh ssh connection Oct 10, 2022
Labels
backport-required, robustness (Resistance to crashes and reliability), tsh (tsh - Teleport's command line tool for logging into nodes running Teleport.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants