-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep TCP/TLS connection alive and close connection properly #3018
Conversation
9da8a61
to
b2c68dc
Compare
This looks good to me! I'd be curious to hear what other folks think to make sure all involved parties are happy. |
b2c68dc
to
95c78f1
Compare
95c78f1
to
cc2b14d
Compare
cc2b14d
to
bdd4d66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally OK, there are few things to tweak.
# Do proper TLS shutdown handshake (TLS tear down) first | ||
if self.__conn.sock is not None: | ||
try: | ||
self.__conn.sock.unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's disappointing how not even shutdown()
on the SSL socket will actually shutdown! python/cpython#63080
unwrap()
is documented to return the original socket, so its return value should be assigned back to self.__conn.sock
; unwrap()
is the opposite of wrap_socket()
, which encapsulates a socket object into SSLSocket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that unwrap()
returns exactly the same ssl.SSLSocket
object as self.__conn.sock.sock
is referencing before calling unwrap()
. The ssl.SSLSocket
is subclass of socket
. Thus I don't see any reason to rewriting it.
if self.consumer_auth_cp is not None: | ||
log.debug("Closing consumer authenticated connection...") | ||
self.consumer_auth_cp.conn.close_connection() | ||
if self.no_auth_cp is not None: | ||
log.debug("Closing no authenticated connection...") | ||
self.no_auth_cp.conn.close_connection() | ||
if self.basic_auth_cp is not None: | ||
log.debug("Closing basically authenticated connection...") | ||
self.basic_auth_cp.conn.close_connection() | ||
if self.keycloak_auth_cp is not None: | ||
log.debug("Closing keycloak authenticated connection...") | ||
self.keycloak_auth_cp.conn.close_connestion() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd make the debug messages here a bit more clear (at least for me); for example, "Closing auth/<something> connection..."
, where <something>
is e.g. "consumer", "none", "basic", "keycloak"
Some references on Keep-Alive HTTP header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Keep-Alive |
@@ -281,7 +281,7 @@ def get_server_versions(cp, exception_on_timeout=False): | |||
|
|||
if cp: | |||
try: | |||
supported_resources = get_supported_resources() | |||
supported_resources = get_supported_resources(uep=cp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change reduce number of TLS handshakes. Without this change: one no auth connection was used for gathering list of supported resources and then another auth/consumer connection was used fot getting status.
cp = self.cp | ||
else: | ||
cp = self.no_auth_cp | ||
self.server_versions = get_server_versions(cp, exception_on_timeout=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When system is registered, then it is better to use auth/consumer connection than no auth connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly OK; there are few things to change, in addition to what I wrote inline:
- this must be rebased, otherwise it cannot be tested properly
- the last commit "Improved debug print a little bit." does not make sense on its own, simply squash it with whichever commit adds that debug statement
- regarding auth_type: maybe it would be better as Enum, so it is properly typed, there are way less risks of typos in strings, and it is automatically converted to string (reduces the debug code a bit)
- beware of the
max_request_num
vsmax_requests_num
typo! this basically nullifies the whole "max requests" logic...
src/rhsm/connection.py
Outdated
if item.startswith("timeout=") is True: | ||
search_result = pattern.search(item) | ||
if search_result is not None: | ||
keep_alive_timeout = search_result.groups()[1] | ||
if keep_alive_timeout.isdecimal() is True: | ||
keep_alive_timeout = int(keep_alive_timeout) | ||
else: | ||
keep_alive_timeout = None | ||
# Maximal number of request on one connection | ||
if item.startswith("max=") is True: | ||
search_result = pattern.search(item) | ||
if search_result is not None: | ||
max_request_num = search_result.groups()[1] | ||
if max_request_num.isdecimal() is True: | ||
max_request_num = int(max_request_num) | ||
else: | ||
max_request_num = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this way of parsing the values is more complex than it should be:
- there is no need for regexps
- the check for
isdecimal()
is redundant, as it the regexp already ensures the value is only numerical
I have a couple of ideas for this code, factorizing it in an helper function:
def parse(text, prefix, default=None):
found = False
ret = default
if text.startswith(prefix):
val = text[len(prefix):]
found = True
try:
ret = int(val)
except ValueError:
pass
return (ret, found)
to be called as e.g. parse(item, "timeout=")
.
An alternative version is:
def parse(text, prefix, default=None):
found = False
ret = default
parts = text.split("=", maxsplit=1)
if len(parts) == 2 and parts[0] == prefix:
found = True
try:
ret = int(parts[1])
except ValueError:
pass
return (ret, found)
``
to be called as e.g. `parse(item, "timeout")`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated and simplified code in the method. What is wrong with regular expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's simply overkill and slower, as the same job can be done without.
src/rhsm/connection.py
Outdated
""" | ||
|
||
:return: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still undocumented, i'd remove it then (i don't see any added value by this docstring)
3c72146
to
6cdc435
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is getting better, there are still few things to fix/improve; in addition to the inline notes, there is still the wrong max_request_num
typo (missing s).
src/rhsm/connection.py
Outdated
CONSUMER_CERT_AUTH = enum.auto() | ||
|
||
# Connection uses Keycloak token | ||
KEYCLOAK_AUTH = enum.auto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo for auto()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry. Fixed.
src/rhsm/connection.py
Outdated
@@ -131,6 +133,24 @@ def __str__(self): | |||
return "Bad certificate at %s" % self.cert_path | |||
|
|||
|
|||
class ConnectionType(enum.Flag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering the various connection types are mutually exclusive, IMHO the type should be Enum
; in addition, if the values will be the description/debug strings, it is possible to use the automatic string representation to avoid the manual string assignment on debug (see later on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it should be enum.Enum
and not enum.Flag
. I don't to assign some values to BASIC_AUTH
, NO_AUTH
, etc. that are used for debug print in one method in another class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't to assign some values to
BASIC_AUTH
,NO_AUTH
, etc. that are used for debug print in one method in another class.
I don't understand this sentence, sorry :-/
What I mean is that you can use strings as values for the enum items, and use them when printing for debug:
>>> class FooEnum(enum.Enum):
... FOO_1 = "foo_1"
... FOO_2 = "foo2?"
>>> x = FooEnum.FOO_1
>>> print(x.value)
foo_1
This way, in the (remote) case of adding a new authentication type, it is only needed to add the new enum item without adding it also to debug blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but I don't like it, because you propose to use e.g. "basic auth" as a value, but this value is specific for one method. BTW: This string is not grammatically correct. I just do NOT want to use it in the way you are proposing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jirka, no need to be that drastic :)
Considering the enum was introduced for debugging purposes, it makes sense to use it for debugging. Having a "string representation" as value would make definitely more sense than an integer value for which there is no use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, your are proposing and I'm not accepting. I opened this PR almost 2 months ago. Is it something critical that is blocking this PR from merging to main?
if self.auth_type == ConnectionType.KEYCLOAK_AUTH: | ||
auth = "keycloak auth" | ||
elif self.auth_type == ConnectionType.BASIC_AUTH: | ||
auth = "basic auth" | ||
elif self.auth_type == ConnectionType.CONSUMER_CERT_AUTH: | ||
auth = "consumer auth" | ||
elif self.auth_type == ConnectionType.NO_AUTH: | ||
auth = "no auth" | ||
else: | ||
auth = "undefined auth" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum
can do a string representation automatically, see above
src/rhsm/connection.py
Outdated
else: | ||
body = None | ||
|
||
log.debug("Making request: %s %s" % (request_type, handler)) | ||
|
||
if self.user_agent: | ||
self.headers["User-Agent"] = self.user_agent | ||
|
||
final_headers = self.headers.copy() | ||
if body is None: | ||
final_headers["Content-Length"] = "0" | ||
if headers: | ||
final_headers.update(headers) | ||
|
||
self._print_debug_info_about_request(request_type, handler, final_headers, body) | ||
|
||
@staticmethod | ||
def parse_keep_alive_header(keep_alive_header: str) -> tuple: | ||
""" | ||
Try to parse 'Keep-Alive' header received from candlepin server | ||
:param keep_alive_header: string with value of the header | ||
:return: Tuple containing connection timeout and maximal number of requests | ||
""" | ||
keep_alive_timeout = None | ||
max_requests_num = None | ||
# Regular expression pattern represents: key=number | ||
pattern = re.compile(r"^(.*)=(\d+)$") | ||
|
||
items = keep_alive_header.split() | ||
|
||
for item in items: | ||
search_result = pattern.search(item) | ||
if search_result is not None: | ||
key, value = search_result.groups() | ||
# Timeout of connection using keep-alive | ||
if key == "timeout": | ||
keep_alive_timeout = int(search_result.groups()[1]) | ||
# Maximal number of request on one connection | ||
if key == "max": | ||
max_requests_num = int(search_result.groups()[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be simplified a bit without a regexp:
--- a/src/rhsm/connection.py
+++ b/src/rhsm/connection.py
@@ -869,21 +869,19 @@ class BaseRestLib(object):
"""
keep_alive_timeout = None
max_requests_num = None
- # Regular expression pattern represents: key=number
- pattern = re.compile(r"^(.*)=(\d+)$")
items = keep_alive_header.split()
for item in items:
- search_result = pattern.search(item)
- if search_result is not None:
- key, value = search_result.groups()
+ parts = item.split("=", maxsplit=2)
+ if len(parts) == 2:
+ key, value = parts
# Timeout of connection using keep-alive
if key == "timeout":
- keep_alive_timeout = int(search_result.groups()[1])
+ keep_alive_timeout = safe_int(value)
# Maximal number of request on one connection
- if key == "max":
- max_requests_num = int(search_result.groups()[1])
+ elif key == "max":
+ max_requests_num = safe_int(value)
return keep_alive_timeout, max_requests_num
also please use a if
/elif
/etc chain, so it is a minor optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not use regular expression on server side, but it is client application and we are not going to parse many HTTP headers per second.
BTW: Your proposal does not have problem with following argument: timeout=123=456
. It will produce timeout with string "123=465"
and it could cause troubles later. I still believe that regular expression is more reliable in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: Your proposal does not have problem with following argument:
timeout=123=456
. It will produce timeout with string"123=465"
and it could cause troubles later. I still believe that regular expression is more reliable in this case.
I'm not sure I understand: even in my proposal there is the conversion to int, and "123=465"
definitely makes int()
raise ValueError.
I checked my proposal with the tests you added, and they work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: Your proposal does not have problem with following argument:
timeout=123=456
. It will produce timeout with string"123=465"
and it could cause troubles later. I still believe that regular expression is more reliable in this case.I'm not sure I understand: even in my proposal there is the conversion to int, and
"123=465"
definitely makesint()
raise ValueError.
I'm sorry for confusion. I tested only small snippet of your proposal. safe_int()
will not raise exception, but it will return safe_value
which is default None
in this case. It is probably fine, but floats and negative values should not be IMHO allowed according RFC.
I added more test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably fine, but floats and negative values should not be IMHO allowed according RFC.
int()
already rejects floats. Regarding negative values, it can be an easy addition:
def safe_positive_int(value, safe_value=None):
num = safe_int(value, safe_value)
if num is not None and num <= 0:
return safe_value
return num
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't like what you are proposing. Please stop enforce me to rewrite my code in different way. There is nothing wrong with regular expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jirka, no need to be that drastic nor absolute :) We are having a polite conversation on the possible implementation of this, and I'm explaining my ideas with reasons, examples and code snippets. There is noone "enforcing you" in any way, I'm sorry if you feel that way.
I said earlier why I don't think it's ideal to use a regexp: while it works, it's too slow, and can be done even without.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that it is slow, but we talk about client (not server). We do not parse 10000 requests per second. We parse one response per second, when everything goes smoothly.
Also, please ensure the proper style of the patches, reindenting with |
61784f9
to
78b2caf
Compare
* The client tools try to keep TCP/TLS connection now * The HTTP header "Connection: keep-alive" is sent now and when server does not reply with "Connection: close", then RHSM tools try to keep existing connection. It helps to reduce traffic between client tools and candlepin server. It also reduce significantly time of executing all commands. E.g. The time of registration is reduced from 20 seconds to 13 seconds * Connection using consumer authentication has to be recreated during registration, because new consumer certificate is installed * Tested with rhsm.service and it also works as expected * Improved little bit debugging of traffic and it is visible what type of authentication is used for communication with candlepin server. You can see the difference, when "subscription-manager version" is used vs e.g. "subscription-manager identity" * Try to do TLS shutdown handshake (tear down) at the end of communication with candlepin server
* Improved debug print of authentication type * Close all connections, when consumer certs are deleted * New environment variable SUBMAN_DEBUG_TCP_IP is supported. When this env. var. exists, then information about TCP/IP is also printed. It helps to track connections. * TODO: - rhsm.service do not try to reuse connections - When connection is inactive for too long, then sub-man cannot resurrect from this situation.
78b2caf
to
de72a7e
Compare
* When connection is closed by server due to client inactivity and exception httplib.RemoteDisconnected is raised, then client close existing connections and it tries request once again. This happens only in situation, when server does not propagate keep-alive timeout using Keep-Alive HTTP header and timeout of server is lower than default timeout used by client (see following comments). * We try to use default values of connection timeout 50 seconds * When timeout is reached, then client closes opened connections and new connections are recreated. It minimize repeated requests due to client inactivity * When server sends in response HTTP header Keep-Alive, then client tries to parse this HTTP header and timeout and max parameters are used * Use `enum` for type of connections * Added several unit tests for parsing Keep-Alive HTTP header * Do not print status messages, when debug printing is enabled, because it could lead to mixing of two outputs
de72a7e
to
d11934d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything works for me. I can confirm significant speedup of 25% [1]. LGTM.
[1]: I run subscription-manager register --force
three times for both main and this feature branch. Average speedup was 25.9%, min 18.4%, max 33.6%.
when server does not reply with "Connection: close", then
RHSM tools try to keep existing connection. It helps to
reduce traffic between client tools and candlepin server.
It also reduce significantly time of executing all commands.
E.g. The time of registration is reduced from 20 seconds
to 13 seconds.
during registration, because new consumer certificate is
installed
what type of authentication is used for communication with
candlepin server. You can see the difference, when
"subscription-manager version" is used vs e.g.
"subscription-manager identity"