Fixes #1145: Implement idle connection timeout for tcpListener #1148
Conversation
Codecov Report
Additional details and impacted files:

@@             Coverage Diff              @@
##              main     #1148      +/-   ##
============================================
+ Coverage    77.89%    77.92%    +0.03%
============================================
  Files          238       239        +1
  Lines        60659     60741       +82
  Branches      5576      5582        +6
============================================
+ Hits         47248     47333       +85
  Misses       10779     10779
+ Partials      2632      2629        -3
Flags with carried forward coverage won't be shown.
Can we please change the commit message to "Fixes #1134: ......"?
@@ -1254,6 +1254,13 @@
        "type": ["up", "down"],
        "description": "The operational status of TCP socket listener: up - the service is active and incoming connections are permitted; down - the service is not active and incoming connection attempts will be refused.",
        "create": false
    },
    "idleTimeoutSeconds": {
From a code standpoint it's easier to have it in the listener. Having it in the listener also allows us to disable/adjust it per-service type. More flexibility. You'll also notice that the AMQP listener and connector management object have the same 'idleTimeoutSeconds' attribute, so adding it to the tcpListener is consistent with that.
I don't see a compelling advantage to having a single global value for all services. Is there a use-case where it makes sense?
I did not realize that we are only terminating idle connections on the client side but not on the server side. Terminating the client side connections will automatically terminate the server side connections as well, so we should be ok. I cannot think of a case where idle server side connections will need to be terminated.
The counter to that is whether we really envisage users wanting to set different values for this? What will the default be? Could there be a way to configure the default (i.e. a way to change it for all listeners in one place instead of changing the configuration of each individual listener)?
What advice would we give users about when to use this option and what to set it to? And can we make it so that in our opinion the majority of users can ignore it?
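To make the "change it in one place" idea concrete, here is a minimal sketch, with entirely hypothetical names and no counterpart in the patch, of how a router-wide default could coexist with a per-listener override:

    #include <stdint.h>

    /* Hypothetical settings: a router-wide default plus an optional per-listener
     * override.  UINT32_MAX marks "not set" so that an explicit 0 can still mean
     * "disabled for this listener".  This is only an illustration of the question
     * raised above, not anything present in the patch. */
    typedef struct {
        uint32_t default_idle_timeout_seconds;   /* applies unless a listener overrides it */
    } example_router_config_t;

    typedef struct {
        uint32_t idle_timeout_seconds;           /* UINT32_MAX == not set, 0 == disabled */
    } example_listener_config_t;

    static uint32_t example_effective_idle_timeout(const example_router_config_t  *router,
                                                   const example_listener_config_t *listener)
    {
        if (listener->idle_timeout_seconds != UINT32_MAX)
            return listener->idle_timeout_seconds;
        return router->default_idle_timeout_seconds;
    }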
I can update the git commit message, but I've also set the issue in the GitHub Development attribute associated with this pull request, which will automagically close the related issue. See left sidebar.
Yes, please update the commit message. When a commit references an issue, it would be nice to start the commit message with "Fixes #yyy: ....", as we agreed to do here - https://github.com/skupperproject/skupper-router/blob/main/CONTRIBUTING.adoc
src/adaptors/tcp/tcp_adaptor.c (outdated)
@@ -1089,6 +1115,21 @@ static void handle_connection_event(pn_event_t *e, qd_server_t *qd_server, void
    case PN_RAW_CONNECTION_WAKE: {
        qd_log(LOG_TCP_ADAPTOR, QD_LOG_DEBUG, "[C%" PRIu64 "] PN_RAW_CONNECTION_WAKE %s", conn->conn_id,
               qdr_tcp_connection_role_name(conn));
        if (CLEAR_ATOMIC_FLAG(&conn->check_idle_conn)) {
Can we do this bit of code only if conn->ingress is true?
Good idea! How about we check for existence of the idle_timer? The patch already does that elsewhere since the idle_timer is only allocated if idleTimeoutSeconds != 0.
Cool yo
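For illustration, a rough sketch of the guard agreed on above, under the assumption (true of the patch) that the idle timer is only allocated when idleTimeoutSeconds is non-zero, so its presence can stand in for an explicit ingress check. The types and names below are simplified stand-ins, not the actual tcp_adaptor code:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct example_timer_t example_timer_t;   /* stand-in for the router's timer type */

    typedef struct {
        atomic_bool      check_idle_conn;   /* set by the idle timer, cleared on WAKE */
        example_timer_t *idle_timer;        /* NULL unless idleTimeoutSeconds != 0    */
    } example_tcp_connection_t;

    /* Hypothetical WAKE-time idle check: connections without an idle timer
     * (which includes every connection whose configuration left the timeout
     * at 0) skip the check entirely. */
    static void example_on_wake(example_tcp_connection_t *conn)
    {
        if (conn->idle_timer && atomic_exchange(&conn->check_idle_conn, false)) {
            /* inspect activity counters here and close the raw connection if idle */
        }
    }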
@@ -1209,9 +1250,14 @@ static qdr_tcp_connection_t *qdr_tcp_connection(qd_tcp_listener_t *listener, qd_
    assert(tcp_stats);
    assert(server);

    if (tc->config->adaptor_config->idle_timeout) {
Move this into the above if (listener) so we can have idle timers only for connections on the listener side?
We may get a request to add timeout support for the connector side as well. That's still not carved in stone, but if it happens I'd rather not conditionalize the code on the type of configuration; we'd just have to revert that change later.
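For reference, the two placements being debated, sketched with hypothetical types and a made-up timer factory rather than the router's real APIs:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct example_timer_t    example_timer_t;     /* stand-in for the timer type    */
    typedef struct example_listener_t example_listener_t;  /* non-NULL only for ingress side */

    typedef struct { uint32_t idle_timeout_seconds; } example_config_t;      /* 0 == disabled */
    typedef struct { example_timer_t *idle_timer;   } example_connection_t;

    /* Assumed timer factory; the router's actual timer API differs. */
    extern example_timer_t *example_timer_create(void (*handler)(void *context), void *context);
    static void example_idle_handler(void *context) { (void) context; }

    /* Review suggestion: allocate the idle timer only for listener-side connections. */
    static void setup_listener_only(example_connection_t *conn, const example_listener_t *listener,
                                    const example_config_t *cfg)
    {
        if (listener && cfg->idle_timeout_seconds)
            conn->idle_timer = example_timer_create(example_idle_handler, conn);
    }

    /* Patch as written: allocate wherever a non-zero timeout is configured, leaving
     * room to add connector-side timeouts later without restructuring. */
    static void setup_any_side(example_connection_t *conn, const example_config_t *cfg)
    {
        if (cfg->idle_timeout_seconds)
            conn->idle_timer = example_timer_create(example_idle_handler, conn);
    }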
Hmmm... OK, so I changed the commit message as described (needed to squash/rebase) and I've updated the name of this PR, but now the link to the actual issue in the Development field has gone away. Is this the correct message syntax?
Marking this as Draft: this approach won't scale effectively because of its timer-per-connection implementation. Our timer implementation is the problem: it maintains a sorted linked list of timers that is searched linearly, so the number of timers grows with the number of connections and scheduling a timer becomes prohibitively expensive as connections scale. In my tests with 65K timers, scheduling a single timer can take several milliseconds (with the lock held). A better approach might be a timer per listener that sweeps the list of associated connections when it expires, similar to the approach taken for stuck-delivery detection. I'll need to think more about this.
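A rough sketch of that sweep idea, using hypothetical types rather than skupper-router code: a single per-listener timer walks that listener's connection list and closes anything whose last recorded activity is older than the timeout, so timer-scheduling cost no longer grows with the connection count.

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>

    /* Simplified connection record: the adaptor would update last_activity on
     * every read/write and keep connections on a per-listener list. */
    typedef struct example_conn_t {
        struct example_conn_t *next;
        time_t                 last_activity;
        bool                   closed;
    } example_conn_t;

    typedef struct {
        example_conn_t *connections;       /* head of this listener's connection list */
        unsigned int    idle_timeout_sec;  /* 0 == disabled                           */
    } example_listener_sweep_t;

    /* One periodic timer per listener fires this sweep.  Each firing is O(number
     * of connections on the listener), but only one timer exists per listener
     * regardless of connection count, avoiding the per-connection scheduling
     * cost described above. */
    static void example_idle_sweep(example_listener_sweep_t *li, time_t now)
    {
        if (li->idle_timeout_sec == 0)
            return;

        for (example_conn_t *c = li->connections; c; c = c->next) {
            if (!c->closed && now - c->last_activity >= (time_t) li->idle_timeout_sec) {
                /* In the adaptor this would wake the connection and close it;
                 * here we just mark it closed. */
                c->closed = true;
            }
        }
    }

A real implementation would also reschedule the sweep timer for the next interval and guard against connections racing with a normal close, but those details don't change the scaling argument.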
Temporarily closing this PR. We will revisit when this becomes relevant again.