adding CLI for MaxConnectionIdleTime #607

sarabrajsingh · 2022-05-03T21:17:20Z

upstream issue - https://github.com/ansible/tower/issues/5777

adding ability to provide a MaxIdleConnectionTimeout parameter in a configuration file to keep backend below-the-mesh TCP connections alive.

to-do:

unit tests

fosterseth · 2022-05-03T21:31:26Z

pkg/netceptor/netceptor.go

+		return fmt.Errorf("user defined maxIdleConnectionTimeout [%d] is less than the default default timeout [%d]", duration, defaultMaxConnectionIdleTime)
+	}
+	// we also need to ensure that if defined, a user defined connection timeout is at least 2x longer than the defaultRouteUpdateTime value
+	if duration < (2*defaultRouteUpdateTime + 1*time.Second) {


I think this check is unnecessary: defaultMaxConnectionIdleTime is defined as 2x defaultRouteUpdateTime + 1 second, and the previous check already rejects any duration < defaultMaxConnectionIdleTime.
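
A minimal sketch of why one comparison is enough. It assumes defaultRouteUpdateTime is 10s (inferred from the 21s default mentioned in this PR), and validateIdleTimeout is a hypothetical helper, not the function in the diff:

```go
package main

import (
	"fmt"
	"time"
)

// Assumed values: the thread states the default idle timeout is
// 2x the route update time plus one second, and that it is 21s.
const (
	defaultRouteUpdateTime       = 10 * time.Second
	defaultMaxConnectionIdleTime = 2*defaultRouteUpdateTime + 1*time.Second
)

// validateIdleTimeout is a hypothetical helper for illustration only.
// Comparing against the default already covers the
// "2*defaultRouteUpdateTime + 1s" bound, because the two are the same value.
func validateIdleTimeout(d time.Duration) error {
	if d < defaultMaxConnectionIdleTime {
		return fmt.Errorf("user defined timeout [%s] is less than the default timeout [%s]",
			d, defaultMaxConnectionIdleTime)
	}
	return nil
}

func main() {
	for _, d := range []time.Duration{5 * time.Second, 21 * time.Second, 40 * time.Second} {
		fmt.Printf("%s -> %v\n", d, validateIdleTimeout(d))
	}
}
```

Any duration that passes the first comparison can never fail the second, so the second branch would be dead code.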

fosterseth · 2022-05-03T21:34:44Z

cmd/receptor-cl/receptor.go

+	ID                       string                       `description:"Node ID. Defaults to local hostname." barevalue:"yes"`
+	DataDir                  string                       `description:"Directory in which to store node data"`
+	FirewallRules            []netceptor.FirewallRuleData `description:"Firewall Rules (see documentation for syntax)"`
+	MaxIdleConnectionTimeout string                       `description:"User defined maximum time a backend connection can go without data before we consider it failed." default:"21s"`


I'd prefer that we don't set a default here. If it is not defined in the config, it will just be an empty string. That way we still have only one place where the default is defined: the constant at the top of the netceptor.go file.

If we remove this default, SetMaxConnectionIdleTime should return without error when given an empty string.

@fosterseth - ok good suggestion. not including a default here will resolve the previous code comment you made :)
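
A rough sketch of the empty-string behavior suggested above. resolveIdleTimeout is a hypothetical stand-in for SetMaxConnectionIdleTime (whose real signature is not shown in this thread), and the 21s constant is assumed from the default discussed here:

```go
package main

import (
	"fmt"
	"time"
)

// Assumed single source of truth for the default (21s per this thread).
const defaultMaxConnectionIdleTime = 21 * time.Second

// resolveIdleTimeout is a hypothetical illustration, not the PR's code.
// An empty string means "not configured" and falls back to the default
// without error; anything else must parse and be at least the default.
func resolveIdleTimeout(userDefined string) (time.Duration, error) {
	if userDefined == "" {
		return defaultMaxConnectionIdleTime, nil
	}
	d, err := time.ParseDuration(userDefined)
	if err != nil {
		return 0, fmt.Errorf("failed to parse maxIdleConnectionTimeout %q: %w", userDefined, err)
	}
	if d < defaultMaxConnectionIdleTime {
		return 0, fmt.Errorf("maxIdleConnectionTimeout [%s] is less than the default timeout [%s]",
			d, defaultMaxConnectionIdleTime)
	}
	return d, nil
}

func main() {
	for _, in := range []string{"", "40s", "5s", "soon"} {
		d, err := resolveIdleTimeout(in)
		fmt.Printf("%q -> %s, err=%v\n", in, d, err)
	}
}
```

With this shape the struct tag needs no default value, and the constant remains the only place the 21s figure appears.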

fosterseth · 2022-05-04T16:43:32Z

cmd/receptor-cl/receptor.go

+	ID                       string                       `description:"Node ID. Defaults to local hostname." barevalue:"yes"`
+	DataDir                  string                       `description:"Directory in which to store node data"`
+	FirewallRules            []netceptor.FirewallRuleData `description:"Firewall Rules (see documentation for syntax)"`
+	MaxIdleConnectionTimeout string                       `description:"User defined maximum time a backend connection can go without data before we consider it failed."`


We could change this to:

"Max duration with no traffic before a backend connection is timed out and refreshed."

This emphasizes that the backend connection will time out, but then refresh automatically.

fosterseth · 2022-05-05T15:45:00Z

tests/functional/mesh/netceptor_test.go

+	"github.com/ansible/receptor/tests/functional/lib/mesh"
+)
+
+func TestSetMaxConnectionIdleTimeFromPseudoConfigFile(t *testing.T) {


are these tests ensuring that the mesh fails to start because of a bad maxidleconnectiontimeout setting?

I'd be content with just having the 3 single node tests that you have above, since they basically test the same thing but are less involved.

yes, that's exactly what they're doing (bad config file). we didn't have any functional tests around passing bad config files, so i thought i'd create some. we can remove these; i understand that receptor already has failsafes for bad configuration keys in the config file (config file checks done at bootstrap time).

fosterseth · 2022-05-05T17:57:40Z

pulled down and tested

WARNING 2022/05/05 13:54:08 Timing out connection, idle for the past 40s

Something to keep in mind: this setting is per-node, so if nodeA has a timeout set to 40s and nodeB has the default 21s, then a backend connection between them will time out after 21s (whichever is less).

@sarabrajsingh this would be a good candidate setting to have a blurb in the docs. I think a dedicated page for it would be fine, like we did for firewalls.rst
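
The per-node behavior noted above can be modeled as a simple minimum. This is only a sketch of the observed outcome: effectiveIdleTimeout is a hypothetical name, and Receptor itself may not compute this value explicitly (each side presumably runs its own timer and the shorter one fires first):

```go
package main

import (
	"fmt"
	"time"
)

// effectiveIdleTimeout is a hypothetical helper that models the outcome
// described in the comment: the connection between two nodes is timed out
// by whichever node has the shorter idle timeout.
func effectiveIdleTimeout(a, b time.Duration) time.Duration {
	if a < b {
		return a
	}
	return b
}

func main() {
	nodeA := 40 * time.Second // user-defined timeout on nodeA
	nodeB := 21 * time.Second // default timeout on nodeB
	fmt.Println("effective timeout:", effectiveIdleTimeout(nodeA, nodeB))
}
```

In practice this means raising the timeout on one node has no visible effect unless its peers are raised as well.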

sarabrajsingh · 2022-05-06T19:35:26Z

@fosterseth i added documentation around the maxidleconnectiontimeout feature. please check out the new docs and let me know what you think :)

docs/source/edge_networks.rst

fosterseth · 2022-05-09T18:38:29Z

docs/source/edge_networks.rst

+
+Receptor encapsulates the concepts of `below-the-mesh` and `above-the-mesh` connections. Please refer to :doc:`tls` for a better understanding of these networking layers.
+
+If a particular node in a network has higher than normal latency, we allow the users to define a finely-grained idle connection timeout value for any given Receptor node. This will help Receptor keep `below-the-mesh` tcp connections alive. Receptor will attempt to reconnect to a timed-out node, to a maximum retry limit and if this retry limit is surpassed, Receptor will drop the connection from the routing table.


"to a maximum retry limit and if this retry limit is surpassed"

it's not so much retry logic and limits, rather just monitoring traffic that flows over a backend connection.

I think the last statement in this paragraph should be something like:

Receptor will monitor backend connections for traffic and will timeout any connection that hasn't seen traffic for a period of time. Once the connection is dropped, a new connection is formed automatically.

verbatim copy/paste :) thanks for the suggestion
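
To illustrate the "monitor traffic, time out when idle" behavior in the suggested wording, here is a deterministic sketch. The idleTracker type and its methods are hypothetical and are not taken from Receptor's code; elapsed time is passed in explicitly instead of being read from a clock so the example is easy to follow:

```go
package main

import (
	"fmt"
	"time"
)

// idleTracker is a hypothetical model of idle monitoring on one
// backend connection. It is not Receptor's implementation.
type idleTracker struct {
	maxIdle time.Duration // timeout threshold for this connection
	idle    time.Duration // time accumulated since the last traffic
}

func newIdleTracker(maxIdle time.Duration) *idleTracker {
	return &idleTracker{maxIdle: maxIdle}
}

// traffic records that data was seen, which resets the idle period.
func (t *idleTracker) traffic() {
	t.idle = 0
}

// tick advances the idle period by elapsed and reports whether the
// connection has now been idle for longer than maxIdle.
func (t *idleTracker) tick(elapsed time.Duration) bool {
	t.idle += elapsed
	return t.idle > t.maxIdle
}

func main() {
	conn := newIdleTracker(21 * time.Second)
	fmt.Println(conn.tick(20 * time.Second)) // still within the limit
	conn.traffic()                           // data seen, idle period resets
	fmt.Println(conn.tick(20 * time.Second)) // still within the limit
	fmt.Println(conn.tick(2 * time.Second))  // 22s idle, over the limit
}
```

There is no retry counter in this model: a timed-out connection would simply be dropped and a fresh one established, which matches the wording proposed in the review comment.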

fosterseth · 2022-05-09T18:38:42Z

docs/source/edge_networks.rst

+
+Consider the following environment:
+
+.. image:: edge.png


awesome diagram and docs around this!

fosterseth

looks great, ready to merge once you rebase + squash to pick up the lint changes

sarabrajsingh requested a review from fosterseth May 3, 2022 21:17

fosterseth reviewed May 3, 2022

sarabrajsingh requested a review from fosterseth May 4, 2022 05:28

sarabrajsingh force-pushed the feature/configurable-tcp-timeouts branch 4 times, most recently from 07c044b to 0179a59 Compare May 4, 2022 14:30

fosterseth reviewed May 4, 2022

sarabrajsingh changed the title ~~[WIP] - adding CLI for MaxConnectionIdleTime~~ adding CLI for MaxConnectionIdleTime May 4, 2022

sarabrajsingh requested a review from fosterseth May 5, 2022 00:07

fosterseth reviewed May 5, 2022

sarabrajsingh requested a review from fosterseth May 5, 2022 16:40

fosterseth approved these changes May 5, 2022

sarabrajsingh requested a review from fosterseth May 6, 2022 19:35

fosterseth reviewed May 9, 2022

fosterseth reviewed May 9, 2022

fosterseth approved these changes May 10, 2022

ghjm and others added 5 commits May 10, 2022 14:40

Avoid dropping buffered results data

630e99e

Allow setting minimum TLS to 1.3

634f097

Restrict TLS 1.2 cipher suite to strong ciphers

fixing linting issues introduced by golangci-lint 1.46.0

fbf70e5

reformatted t.Fatalf() calls

853b421

adding CLI for MaxConnectionIdleTime; adding unit and functional tests for SetMaxConnectionIdleTime

907da06

sarabrajsingh force-pushed the feature/configurable-tcp-timeouts branch from f9468eb to 907da06 Compare May 10, 2022 18:40

sarabrajsingh merged commit c44b81a into ansible:devel May 10, 2022

sarabrajsingh mentioned this pull request May 10, 2022

revert "adding CLI for MaxConnectionIdleTime" #614

Closed
