http/websocket: fix mid-size frames sometimes failing to be received #197
base: master
Thank you for investigating! This indeed looks like it could be the issue. Are you able to create a test case? Working backwards from the changed code, we should be able to reach and test this code by sending a frame larger than 125 bytes but smaller than 65536 bytes.
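For reference, RFC 6455 (section 5.2) stores payload lengths up to 125 directly in the 7-bit length field. Lengths from 126 to 65535 use the marker value 126 followed by a 16-bit big-endian length, and anything larger uses 127 followed by a 64-bit length. A "mid-size" frame is therefore one with a 4-byte header (before any masking key) instead of a 2-byte one. The sketch below is plain Lua 5.3+ using `string.pack`/`string.unpack`, not lua-http's code. `encode_length` and `decode_length` are hypothetical names. `decode_length` reports how many more bytes it needs rather than assuming the whole header arrived in one read, which is the kind of partial-read handling a test with fragmented writes would exercise.

```lua
-- Hypothetical sketch (not lua-http's implementation); requires Lua 5.3+.

-- Encode the second header byte (MASK bit + 7-bit length) plus any
-- extended payload length, per RFC 6455 section 5.2.
local function encode_length(len, masked)
  local mask_bit = masked and 0x80 or 0
  if len <= 125 then
    return string.char(mask_bit | len)
  elseif len <= 0xFFFF then
    -- mid-size frame: marker 126, then a 16-bit big-endian length
    return string.char(mask_bit | 126) .. string.pack(">I2", len)
  else
    -- large frame: marker 127, then a 64-bit big-endian length
    return string.char(mask_bit | 127) .. string.pack(">I8", len)
  end
end

-- Decode the payload length from the start of a frame header.
-- Returns (length, header bytes consumed through the length field),
-- or (nil, bytes still missing) if buf is too short.
local function decode_length(buf)
  if #buf < 2 then return nil, 2 - #buf end
  local len = buf:byte(2) & 0x7F
  if len == 126 then
    if #buf < 4 then return nil, 4 - #buf end
    return string.unpack(">I2", buf, 3), 4
  elseif len == 127 then
    if #buf < 10 then return nil, 10 - #buf end
    return string.unpack(">I8", buf, 3), 10
  end
  return len, 2
end
```

A 300-byte frame falls into the 126 case, so its header is `"\x82"` (FIN + binary opcode) followed by three length bytes; if only the first three header bytes have arrived, `decode_length` returns `nil, 1` instead of misreading the length.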
I'll try my best, but as I said, I'm unsure what sort of test one would add for this, as the conditions don't seem easy to reproduce from Lua. At one point I tried interposing `socket.xwrite` to introduce artificial throttling and fragmentation, but it didn't end well.
I think it would look similar to lua-http/spec/websocket_spec.lua, lines 246 to 262 (at commit 169c1a7).
… `:close()` on the writing side until after the receive side has completed.
Oh, I see. Okay, yeah: we need some way to send only half the message "now" and the other half after a delay. Or to cap …
I'm pretty close to getting something working; will post in a bit.
Branch updated from 217300f to 2513b20.
Got it. The test …

```lua
local real_xwrite
local fragments = 100
local delay = 1 / fragments -- Aim for 1s.
real_xwrite = cs.interpose("xwrite", function(self, str, mode, timeout)
```
You don't need to interpose the whole library; you should be able to override just the specific instance:

```lua
local c, s = new_pair()
local real_xwrite = s.socket.xwrite
s.socket.xwrite = function(self, str, mode, timeout)
  -- throttle/fragment here, delegating to real_xwrite(self, chunk, mode, timeout)
end
```
It just seemed like the intended way to do it, but sure, I could do that.
Ah, never mind; I remember now why I did it like this. CQS sockets are userdata, so you can't just override their methods like that.
I'm open to alternatives.
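One possible alternative to interposing on the whole socket class is to give the connection a plain-table proxy that overrides one method and forwards every other method call to the real socket userdata. This is an untested assumption as far as lua-http goes: any code that passes the socket object itself to C functions expecting the cqueues socket userdata (for example, type-checked calls or polling) would reject a table, so it may need further changes to work. `override_method` is a hypothetical helper, shown here against a mock object.

```lua
-- Hypothetical helper: return a proxy for `obj` in which method `name` is
-- replaced by `fn(obj, ...)`; all other methods are forwarded to `obj`.
local function override_method(obj, name, fn)
  local proxy = {}
  return setmetatable(proxy, {
    __index = function(_, key)
      if key == name then
        -- fn receives the real object so it can still call the original method
        return function(_, ...) return fn(obj, ...) end
      end
      local v = obj[key]
      if type(v) == "function" then
        -- rebind `self` from the proxy to the wrapped object
        return function(self, ...)
          if self == proxy then self = obj end
          return v(self, ...)
        end
      end
      return v
    end,
  })
end
```

With a real socket, `fn` would split `str` into pieces and call the saved original `xwrite` on each, sleeping in between.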
```lua
    end
    before_first = before_first + nbytes
    cqueues.sleep(delay)
until before_first > #str
```
Should probably be `>=`. `>` works because xwriting an empty string happens to be OK.
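The throttling loop under review can be sketched in isolation. The version below is a hypothetical, dependency-free rewrite in plain Lua: `throttled_write`, `write_chunk`, and `sleep` are illustrative names (in the spec they would wrap the real `xwrite` and `cqueues.sleep`), and the chunk size is an assumption chosen so that roughly `fragments` pieces are written. It uses `>=` as suggested, so no trailing empty string is written once every byte has gone out.

```lua
-- Hypothetical sketch: write `str` in about `fragments` pieces, pausing
-- `delay` seconds after each. write_chunk(piece) must return true on
-- success or nil, err on failure; sleep(seconds) performs the pause.
local function throttled_write(str, fragments, delay, write_chunk, sleep)
  -- Assumed chunking: equal pieces of at least one byte each
  local chunk_size = math.max(1, math.ceil(#str / fragments))
  local before_first = 0 -- number of bytes already handed to write_chunk
  repeat
    local nbytes = math.min(chunk_size, #str - before_first)
    local ok, err = write_chunk(str:sub(before_first + 1, before_first + nbytes))
    if not ok then
      return nil, err
    end
    before_first = before_first + nbytes
    sleep(delay)
  until before_first >= #str -- `>` would issue one extra, empty write
  return true
end
```

With a 1000-byte string and `fragments = 100`, this writes exactly 100 ten-byte pieces. Because `repeat ... until` always runs at least once, an empty `str` still results in a single empty write.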
Are we in a deadlock here? I'm waiting for a response on the interposition question; maybe you're also waiting for something?
60-day poke.
Sorry, I'm low on free time lately.
Alright, noted. Excuse the poking; I was just trying to make sure I hadn't derailed some sort of 'workflow' by leaving a review comment or something.
This is more an educated guess as to why #140 exists, combined with some pattern matching, than a well-researched, proof-backed solution. But it does fix a service I maintain that otherwise exhibits the problem described in that issue.
Edit: I'm not sure what sort of test one would add for something like this; comments on that are welcome.
To verify that this indeed fixes what I think* is the problem, I first severely handicapped my `lo` interface and then ran the following programs (code below):
server.lua
client.lua
Without the fix applied, I get
and the client never terminates. With the fix applied, I get
*: I see nothing else in the code that might cause the assertion to fail, but I'm unfamiliar with both the websocket RFCs and this codebase, so there may be things I'm missing.