-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
websockets: fix ping_timeout #3376
base: master
Are you sure you want to change the base?
Conversation
* Closes tornadoweb#3258 * Fixes an issue with the calculation of ping timeout interval that could cause connections to be erroneously timed out and closed from the server end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing this is probably going to involve some ugly hacks one way or another. My first thought is to just overwrite the write_ping method: self.ws_connection.write_ping = lambda x: None
(and factor out a write_pong
method for the other side where it currently calls self._write_frame(True, 0xA, data)
).
I think there are multiple distinct problems with this ping timeout code:
- If
ping_timeout == ping_interval
, the connection breaks before the first ping is sent, and if that is avoided, it is likely to break other times due to race conditions. This was your finding in websocket disconnect and reconnect every 10 minutes #3258 (comment) and is fixed by this PR - If
ping_timeout > ping_interval
, we are slow to detect failure - we don't check for timeout until the next ping is sent (websocket: Fix ping_timeout < ping_interval #2905). This is also fixed by your PR (and it's a good reason to restructure the code instead of a small change to thesince_last_(ping,pong)
logic inperiodic_ping
. - Other folks, such as the original message in websocket disconnect and reconnect every 10 minutes #3258, have reported problems when
ping_interval
andping_timeout
are not the same. It's not clear to me what exactly the problem is here, or whether this PR fixes it. - Does it actually make sense to set
ping_interval < ping_timeout
? That's the default, but since websockets go over TCP I'm not sure it makes sense to allow multiple pings in flight before detecting failure. I'm not sure what I was thinking when this was written/merged, but now I feel like a sensible configuration would be something likeping_interval=30; ping_timeout=5
.
|
||
if self.ping_timeout and self.ping_timeout > 0: | ||
# wait for the pong | ||
await asyncio.sleep(self.ping_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unconditionally sleeping for ping_timeout will cause problems if ping_timeout is greater than ping_interval (which is the default) - we won't start the next ping until the old timeout sleep has passed.
Unless we want to effectively require that ping_interval >= ping_timeout
, we can't use a sleep here. (Since websockets are TCP, I don't think it really makes sense to use a short ping interval with a long timeout, so maybe we'd want to make that policy change, but we have the existing defaults and documentation that suggest otherwise). We could use other synchronization primitives (like an asyncio.Condition
) that would allow for an early wakeup, or we could go back to the original pattern that checks for a timeout at the ping interval.
I also find the combination of PeriodicCallback and coroutines kind of non-obvious and if we're going to use a coroutine here I think I'd rather get rid of the PeriodicCallback and send the pings from the same coroutine. This would make the interaction of the sending of pings and waiting for pongs more explicit.
I did find it a bit strange that this is possible.
I don't think pings & pongs can carry any [meta]data, so it is not possible to associate a pong with the ping that it was responding to. As a result,
👍
I did consider this, but thought of it more as the result of a quirky configuration rather than a bug in the implementation. For configurations where
One option would be to reject such configurations: assert ping_timeout is None or ping_timeout <= ping_interval However, that might be disruptive. Another option would be to raise a warning and fall back to the if ping_timeout > ping_interval:
app_log.warn(
'The configured websocket_ping_timeout is longer than the websocket_ping_interval.'
'\nSetting websocket_ping_timeout = websocket_ping_interval'
)
ping_timeout = ping_interval
Setting Here are my suggestions:
What do you think? |
They can, actually. RFC 6455 section 5.5.2 and 5.5.3 say that a ping frame may include "application data", and a pong frame must copy the application data from the ping frame it is responding to. So the information is there to match up pings and pongs. But even then, would it be useful to send a flurry of pings faster than the RTT? I'm not seeing any use case for that (at least in HTTP/1 and HTTP/2 - maybe in HTTP/3 depending on how they model the websocket stream in a reorderable connection)
I don't think that's true - there's no early exit from the sleep, so it will ping at ping_timeout instead of ping_interval. That's what I was getting at when I said "unconditional sleep". Am I missing something? I'd be OK with the behavior you're describing in this message but not what I think the code is doing.
This is probably the right answer; I only hesitate because the default
Yes, except that I'd rather not have a warning at all than have a warning every time a websocket is opened. I'm fine without a warning; if you'd like to keep it put it behind some kind of flag so it only logs once (I don't think we have an existing idiom for this). |
Closes #3258
Closes #2905
Fixes an issue with the calculation of ping timeout interval which caused connections to be erroneously closed from the server end. This could happen shortly after the connection was opened, before the ping was even sent (as reported in #3258) as the result of startup logic. It could also happen when pong responses were sent back from the client within the configured timeout as the result of a race condition.
To test this fix, try repeating the steps in this example: #3258 (comment)
I found a TODO to implement testing for ping timeout. I tried to fill this in but couldn't get it to work, any pointers appreciated (note, it has to be tested from the server side).