Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding CLI for MaxConnectionIdleTime #607

Merged

Conversation

sarabrajsingh
Copy link
Contributor

@sarabrajsingh sarabrajsingh commented May 3, 2022

upstream issue - https://github.com/ansible/tower/issues/5777

adding ability to provide a MaxIdleConnectionTimeout parameter in a configuration file to keep backend below-the-mesh TCP connections alive.

to-do:

  • unit tests

return fmt.Errorf("user defined maxIdleConnectionTimeout [%d] is less than the default default timeout [%d]", duration, defaultMaxConnectionIdleTime)
}
// we also need to ensure that if defined, a user defined connection timeout is at least 2x longer than the defaultRouteUpdateTime value
if duration < (2*defaultRouteUpdateTime + 1*time.Second) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is unnecessary, given that defaultMaxConnectionIdleTime is defined as 2x defaultRouteUpdateTime + 1 second, and we ensure that duration < defaultMaxConnectionIdleTime in the previous check.

ID string `description:"Node ID. Defaults to local hostname." barevalue:"yes"`
DataDir string `description:"Directory in which to store node data"`
FirewallRules []netceptor.FirewallRuleData `description:"Firewall Rules (see documentation for syntax)"`
MaxIdleConnectionTimeout string `description:"User defined maximum time a backend connection can go without data before we consider it failed." default:"21s"`
Copy link
Member

@fosterseth fosterseth May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we don't set a default here. if not defined in the config it will just be an empty string. That way we still only have one place where we define the default, the constant at the top of the netceptor.go file

Copy link
Member

@fosterseth fosterseth May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we remove this default, SetMaxConnectionIdleTime should return without error if given an empty string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fosterseth - ok good suggestion. not including a default here will resolve the previous code comment you made :)

@sarabrajsingh sarabrajsingh force-pushed the feature/configurable-tcp-timeouts branch 4 times, most recently from 07c044b to 0179a59 Compare May 4, 2022 14:30
ID string `description:"Node ID. Defaults to local hostname." barevalue:"yes"`
DataDir string `description:"Directory in which to store node data"`
FirewallRules []netceptor.FirewallRuleData `description:"Firewall Rules (see documentation for syntax)"`
MaxIdleConnectionTimeout string `description:"User defined maximum time a backend connection can go without data before we consider it failed."`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may change this to,

"Max duration with no traffic before a backend connection is timed out and refreshed."

To emphasize that the backend will timeout, but then refresh automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :-)

@sarabrajsingh sarabrajsingh changed the title [WIP] - adding CLI for MaxConnectionIdleTime adding CLI for MaxConnectionIdleTime May 4, 2022
"github.com/ansible/receptor/tests/functional/lib/mesh"
)

func TestSetMaxConnectionIdleTimeFromPseudoConfigFile(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these tests ensuring that the mesh fails to start because of a bad maxidleconnectiontimeout setting?

I'd be content with just having the 3 single node tests that you have above, since they basically test the same thing but are less involved.

Copy link
Contributor Author

@sarabrajsingh sarabrajsingh May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thats exactly what they're doing (bad config file). we didn't have any functional tests around passing bad config files, so i thought i'd create some. we can remove these, i understand that there are already failsafes for bad configuration keys in the config file within receptor (config file checks done at bootstrap-time).

@fosterseth
Copy link
Member

pulled down and tested

WARNING 2022/05/05 13:54:08 Timing out connection, idle for the past 40s

Something to keep in mind, this setting is per-node, so if nodeA has a timeout set to 40s, and nodeB has the default 21s, then a backend connection between them will timeout in 21s (whichever is less).

@sarabrajsingh this would be a good candidate setting to have a blurb in the docs. I think a dedicated page for it would be fine, like we did for firewalls.rst

@sarabrajsingh
Copy link
Contributor Author

@fosterseth i added documentation around the maxidleconnectiontimeout feature. please check out the new docs and let me know what you think :)


Receptor encapsulates the concepts of `below-the-mesh` and `above-the-mesh` connections. Please refer to :doc:`tls` for a better understanding of these networking layers.

If a particular node in a network has higher than normal latency, we allow the users to define a finely-grained idle connection timeout value for any given Receptor node. This will help Receptor keep `below-the-mesh` tcp connections alive. Receptor will attempt to reconnect to a timed-out node, to a maximum retry limit and if this retry limit is surpassed, Receptor will drop the connection from the routing table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to a maximum retry limit and if this retry limit is surpassed"

it's not so much retry logic and limits, rather just monitoring traffic that flows over a backend connection.

If the last statement in this paragraph should be something like,

Receptor will monitor backend connections for traffic and will timeout any connection that hasn't seen traffic for a period of time. Once the connection is dropped, a new connection is formed automatically. 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verbatim copy/paste :) thanks for the suggesstion


Consider the following environment:

.. image:: edge.png
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome diagram and docs around this!

Copy link
Member

@fosterseth fosterseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, ready to merge when rebase + squash to get the lint changes

@sarabrajsingh sarabrajsingh force-pushed the feature/configurable-tcp-timeouts branch from f9468eb to 907da06 Compare May 10, 2022 18:40
@sarabrajsingh sarabrajsingh merged commit c44b81a into ansible:devel May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants