
regression requests on keep-alive HTTP connections only served after 40ms #338

Merged: 1 commit into vibe-d:master on Oct 12, 2013

Conversation

@MartinNowak (Contributor)

The method Libevent2TCPConnection.waitForData only returns after 40ms.
In v0.7.17 I see "socket 11 write event (false)!" after waiting for data. The write event is missing on master.
I tried to bisect the issue, but there were too many build errors in the git history to do so.

@s-ludwig (Member)

The missing write event is probably due to the write watermark, which is now set to 4k instead of zero (this was done to avoid infinite data stack-up, see #191). However, it's the read event that seems to get deferred somehow here. In my tests with Opera I even see a 600ms delay. Hm... or maybe it's actually the data of the previous response that doesn't get flushed. I'll dig a bit further...
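For context, the watermark change referenced here corresponds to libevent's bufferevent_setwatermark. A minimal sketch, assuming the deimos.event2 D bindings that vibe.d builds on (the names bufferevent and EV_WRITE come from those bindings):

```d
import deimos.event2.bufferevent;
import deimos.event2.event : EV_WRITE;

void setWriteWatermark(bufferevent* bev)
{
    // With a low write watermark of 4096 instead of the default 0,
    // the write callback fires once the output buffer drains below
    // 4kB rather than only once it is completely empty.
    bufferevent_setwatermark(bev, EV_WRITE, 4096, 0);
}
```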

s-ludwig closed this in e0aa165 on Oct 11, 2013
- A deadlock is caused by Nagle's algorithm + delayed ACK and
  the write+write+read pattern (response header + body + next request)
  when serving 2 HTTP requests on a keep-alive connection.

- Resolved by buffering smaller writes so that responses don't
  get split unnecessarily, i.e. blocking in write iff more than
  4kB is buffered.
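Spelled out, the interaction described in the commit message plays out roughly like this (an illustrative timeline, not code from the patch):

```d
// server: write(header)  -> small packet 1 is sent immediately (PSH)
// server: write(body)    -> Nagle's algorithm holds it back until
//                           packet 1 has been ACKed
// client: has only the header so far, so it delays its ACK (~40ms),
//         hoping to piggyback it on future data
// result: both sides stall until the delayed-ACK timer expires
```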
@MartinNowak (Contributor, Author)

This doesn't resolve the bug for me.
What I'm seeing in Wireshark is that the HTTP response is sent in 2 TCP packets (both with PSH set). The first carries only the header (113 bytes) and the second the content (13 bytes).
This, combined with a delayed ACK on the client side, leads to the 40ms delay (Nagle's algorithm & delayed ACK).

@MartinNowak (Contributor, Author)

fix Issue #338 - keep-alive HTTP requests only served after 40ms

@s-ludwig (Member)

I see. The Windows console is so unbelievably slow (1-3ms for one line of text) that the delay was fully hidden by log output for me. So the flushing behavior needs to be checked (I'm also unsure what exactly the "normal" and "flush" modes of libevent actually do; I need to check the sources).
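For reference, the two modes in question are the BEV_NORMAL and BEV_FLUSH values of libevent's bufferevent_flush_mode; a sketch of the call shape, again assuming the deimos.event2 bindings (the exact semantics are what remains to be checked in the sources):

```d
import deimos.event2.bufferevent;
import deimos.event2.event : EV_WRITE;

void flushOutput(bufferevent* bev)
{
    // BEV_NORMAL processes buffered data as usual; BEV_FLUSH is
    // documented to additionally ignore restrictions such as watermarks.
    bufferevent_flush(bev, EV_WRITE, bufferevent_flush_mode.BEV_NORMAL);
}
```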

s-ludwig reopened this on Oct 12, 2013
@MartinNowak (Contributor, Author)

I converted the issue into a pull request that should hopefully fix it.

@s-ludwig (Member)

Hmm, GitHub is currently messing up this ticket...

@s-ludwig (Member)

Oh, I see. Then this was probably the cause of the hiccups.

@s-ludwig (Member)

So if this fixes the issue for you, I'd also change the exit condition for write accordingly (i.e. > 0 -> > 4096 in line 275). Can you check if that by chance makes anything worse for you?

@MartinNowak (Contributor, Author)

> So if this fixes the issue for you, I'd also change the exit condition for write accordingly (i.e. > 0 -> > 4096 in line 275). Can you check if that by chance makes anything worse for you?

Not sure I get you. You mean like in the pull request?

@MartinNowak (Contributor, Author)

AFAIU the main problem was this line 275 in write:
while (connected && evbuffer_get_length(outbuf) > 0) rawYield();
It would only return after libevent had written the whole buffer to the network socket, by which point the kernel would often already have sent a TCP packet. Sending multiple small packets creates the deadlock problem.
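For reference, the merged fix amounts to raising the exit threshold of that loop; a sketch (connected, outbuf and rawYield are the names from the quoted line, writeHighWatermark is an illustrative name for the 4096 constant):

```d
// Yield only while more than 4kB is still buffered, so that a small
// header+body pair is coalesced into one TCP packet instead of being
// pushed to the socket piecemeal.
enum size_t writeHighWatermark = 4096;
while (connected && evbuffer_get_length(outbuf) > writeHighWatermark)
    rawYield();
```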

@s-ludwig (Member)

Ok, forget what I said. It's all there already; not sure what I saw or thought I saw.

s-ludwig added a commit that referenced this pull request on Oct 12, 2013:
regression requests on keep-alive HTTP connections only served after 40ms
s-ludwig merged commit 791be2a into vibe-d:master on Oct 12, 2013
@s-ludwig (Member)

Looking at the stream with Wireshark, I can also confirm that it fixed the packet fragmentation issue. vibed.org is also responding noticeably faster again.

@s-ludwig (Member)

Assuming that the big delays I saw in my first tests were caused by the delayed ACK (I didn't know about the bad interaction with Nagle's algorithm, thanks for the pointer!), maybe it makes sense to embrace libevent's buffering mechanism and enable tcpNoDelay by default. But that again opens up the question of what exactly libevent does when bufferevent_flush is called. I'll open a ticket as a reminder to look into this at some point.
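For illustration, the tcpNoDelay setting mentioned here maps to the TCP_NODELAY socket option; a minimal sketch using Phobos' std.socket rather than vibe.d's own API:

```d
import std.socket;

void enableTcpNoDelay(Socket sock)
{
    // TCP_NODELAY disables Nagle's algorithm: small segments are sent
    // immediately instead of being held back until outstanding data is
    // ACKed, which sidesteps the interaction with delayed ACKs.
    sock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, true);
}
```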

@MartinNowak (Contributor, Author)

As an improvement, we could use the real MSS (MTU minus TCP/IP headers) of the socket instead of the hardcoded 4096 bytes. It can be obtained with getsockopt and TCP_MAXSEG.
This would solve any remaining occurrence of this problem on the loopback device, which has a 64kB MTU.

NB: The druntime network headers are a mess and TCP_MAXSEG is currently missing.
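A sketch of that lookup on POSIX; since TCP_MAXSEG is missing from druntime as noted, its Linux value is declared locally, and currentMSS is a hypothetical helper:

```d
import core.sys.posix.sys.socket : getsockopt, socklen_t;
import core.sys.posix.netinet.in_ : IPPROTO_TCP;

// Missing from druntime's network headers, so declared here
// (value from Linux's <netinet/tcp.h>).
enum TCP_MAXSEG = 2;

int currentMSS(int fd)
{
    int mss;
    socklen_t len = mss.sizeof;
    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) != 0)
        return 4096; // fall back to the hardcoded default
    return mss;
}
```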

s-ludwig mentioned this pull request on Nov 21, 2013
dnadlinger added a commit to dnadlinger/ldc that referenced this pull request on Dec 26, 2013:
This works around linking problems such as vibe-d/vibe.d#338,
caused by the frontend appending template instances to the wrong
module.

GitHub: Fixes ldc-developers#558.