Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #1145: Implement idle connection timeout for tcpListener #1148

Closed
wants to merge 1 commit into from

Conversation

kgiusti
Copy link
Contributor

@kgiusti kgiusti commented Jun 30, 2023

No description provided.

@kgiusti kgiusti requested review from ted-ross and ganeshmurthy June 30, 2023 13:24
@codecov
Copy link

codecov bot commented Jun 30, 2023

Codecov Report

Merging #1148 (216f557) into main (7895c27) will increase coverage by 0.03%.
The diff coverage is 86.07%.

❗ Current head 216f557 differs from pull request most recent head cfb59de. Consider uploading reports for the commit cfb59de to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
+ Coverage   77.89%   77.92%   +0.03%     
==========================================
  Files         238      239       +1     
  Lines       60659    60741      +82     
  Branches     5576     5582       +6     
==========================================
+ Hits        47248    47333      +85     
  Misses      10779    10779              
+ Partials     2632     2629       -3     
Flag Coverage Δ
pysystemtests 87.45% <91.66%> (+0.02%) ⬆️
pyunittests 54.51% <ø> (ø)
systemtests 71.59% <80.00%> (+<0.01%) ⬆️
unittests 27.20% <10.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
calculator 30.80% <10.00%> (-0.02%) ⬇️
systemtests 78.51% <87.17%> (+0.01%) ⬆️

@ganeshmurthy
Copy link
Contributor

Can we please change the commit message to "Fixes #1134:......"
That way we will have link to the issue right on the commit which will help us navigate to the issue quickly and merging this PR will close the issue automatically

@@ -1254,6 +1254,13 @@
"type": ["up", "down"],
"description": "The operational status of TCP socket listener: up - the service is active and incoming connections are permitted; down - the service is not active and incoming connection attempts will be refused.",
"create": false
},
"idleTimeoutSeconds": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to have an attribute called dataConnectionCount which used to be in the connector entity and @grs requested that it be moved to the router entity which we did. @grs, I just wanted to get your thoughts on whether this idleTimeoutSeconds should be in the router entity as well ?

Copy link
Contributor Author

@kgiusti kgiusti Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a code standpoint it's easier to have it in the listener. Having it in the listener also allows us to disable/adjust it per-service type. More flexibility. You'll also notice that the AMQP listener and connector management object have the same 'idleTimeoutSeconds' attribute, so adding it to the tcpListener is consistent with that.

I don't see a compelling advantage to having a single global value for all services. Is there a use-case where it makes sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not realize that we are only terminating idle connections on the client side but not on the server side. Terminating the client side connections will automatically terminate the server side connections as well, so we should be ok. I cannot think of a case where idle server side connections will need to be terminated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The counter to that is whether we really envisage users wanting to set different values for this? What will the default be? Could there be a way to configure the default (i.e. a way to change it for all listeners in one place instead of changing the configuration of each individual listener)?

What advice would we give users about when to use this option and what to set it to? And can we make it so that in our opinion the majority of users can ignore it?

@kgiusti kgiusti linked an issue Jun 30, 2023 that may be closed by this pull request
@kgiusti
Copy link
Contributor Author

kgiusti commented Jun 30, 2023

Can we please change the commit message to "Fixes #1134:......" That way we will have link to the issue right on the commit which will help us navigate to the issue quickly and merging this PR will close the issue automatically

I can update the git commit message, but I've also set the issue in the github Development attribute associated with this pull request which will automagically close the related issue. See left sidebar.

@kgiusti kgiusti requested review from grs, ajssmith and nluaces June 30, 2023 14:03
@ganeshmurthy
Copy link
Contributor

Can we please change the commit message to "Fixes #1134:......" That way we will have link to the issue right on the commit which will help us navigate to the issue quickly and merging this PR will close the issue automatically

I can update the git commit message, but I've also set the issue in the github Development attribute associated with this pull request which will automagically close the related issue. See left sidebar.

Yes, update the commit message please. The commit message, when referencing an issue, would be nice if we start the commit message with "Fixes #yyy: ...." as we agreed to do here - https://github.com/skupperproject/skupper-router/blob/main/CONTRIBUTING.adoc

@@ -1089,6 +1115,21 @@ static void handle_connection_event(pn_event_t *e, qd_server_t *qd_server, void
case PN_RAW_CONNECTION_WAKE: {
qd_log(LOG_TCP_ADAPTOR, QD_LOG_DEBUG, "[C%" PRIu64 "] PN_RAW_CONNECTION_WAKE %s", conn->conn_id,
qdr_tcp_connection_role_name(conn));
if (CLEAR_ATOMIC_FLAG(&conn->check_idle_conn)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this bit of code only if conn->ingress is true ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! How about we check for existence of the idle_timer? The patch already does that elsewhere since the idle_timer is only allocated if idleTimeoutSeconds != 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool yo

@@ -1209,9 +1250,14 @@ static qdr_tcp_connection_t *qdr_tcp_connection(qd_tcp_listener_t *listener, qd_
assert(tcp_stats);
assert(server);

if (tc->config->adaptor_config->idle_timeout) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this into the above if (listener) so we can have idle timers only for connections on the listener side ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may get a request to add timeout support for the connector side as well. Still not carved in stone but if so I'd rather not conventionalize the code on the type of configuration. We'll just have to revert that change if so.

@kgiusti kgiusti changed the title ISSUE-1134: Implement idle connection timeout for tcpListener Fixes #1134: Implement idle connection timeout for tcpListener Jun 30, 2023
@kgiusti
Copy link
Contributor Author

kgiusti commented Jun 30, 2023

Can we please change the commit message to "Fixes #1134:......" That way we will have link to the issue right on the commit which will help us navigate to the issue quickly and merging this PR will close the issue automatically

I can update the git commit message, but I've also set the issue in the github Development attribute associated with this pull request which will automagically close the related issue. See left sidebar.

Yes, update the commit message please. The commit message, when referencing an issue, would be nice if we start the commit message with "Fixes #yyy: ...." as we agreed to do here - https://github.com/skupperproject/skupper-router/blob/main/CONTRIBUTING.adoc

Hmmm... ok so I changed the commit message as described (needed to squash/rebase) and I've updated the name of this PR, but now the link to the actual issue in the Development field has gone away. Is this the correct message syntax?

@ganeshmurthy
Copy link
Contributor

Can we please change the commit message to "Fixes #1134:......"

I put in the wrong issue #, the correct issue # is #1145
Even after squash rebase, I see the commit message say "ISSUE-1134:
The commit message should say "Fixes #1145: ...."
My apologies.

@kgiusti kgiusti changed the title Fixes #1134: Implement idle connection timeout for tcpListener Fixes #1145: Implement idle connection timeout for tcpListener Jul 3, 2023
@kgiusti kgiusti linked an issue Jul 3, 2023 that may be closed by this pull request
@kgiusti kgiusti marked this pull request as draft July 3, 2023 21:24
@kgiusti
Copy link
Contributor Author

kgiusti commented Jul 3, 2023

Marking this as Draft:

This approach won't scale effectively due to its timer-per-connection implementation. Our timer implementation is the problem: it maintains a linear sorted linked list of timers. As the number of connections scale so do the timers. Scheduling a timer becomes prohibitively expensive timewise as the number of connections scales. In my tests when I scale to 65K timers scheduling a timer can take several milliseconds each (lock held).

A better approach might be to have a timer-per listener and sweep the list of associated connections when the timer expires. This is similar to the approach taken for stuck delivery detection.

I'll need to think more about this.

@ganeshmurthy
Copy link
Contributor

Temporarily closing this PR. We will revisit when this becomes relevant again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connection Idle timeout feature for TCP Adaptor
5 participants