
regression requests on keep-alive HTTP connections only served after 40ms #338

Merged: 1 commit into vibe-d:master on Oct 12, 2013

Conversation

@MartinNowak (Contributor)

The method Libevent2TCPConnection.waitForData only returns after 40ms.
In v0.7.17 I see "socket 11 write event (false)!" after waiting for data. The write event is missing on master.
I tried to bisect the issue, but there were too many build errors in the git history to do so.

@s-ludwig (Member)

The missing write event is probably due to the write watermark, which is now set to 4k instead of zero (this was done to avoid infinite data stack-up, see #191). However, it's the read event that seems to get deferred somehow here. In my tests with Opera I even see a 600ms delay. Hm... or maybe it's actually the data of the previous response that doesn't get flushed. I'll dig a bit further...
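For context, the watermark change referenced here corresponds to libevent's bufferevent_setwatermark. A minimal sketch, assuming the deimos.event2 D bindings that vibe.d builds on (the names bufferevent and EV_WRITE come from those bindings):

```d
import deimos.event2.bufferevent;
import deimos.event2.event : EV_WRITE;

void setWriteWatermark(bufferevent* bev)
{
    // With a low write watermark of 4096 instead of the default 0,
    // the write callback fires once the output buffer drains below
    // 4kB rather than only once it is completely empty.
    bufferevent_setwatermark(bev, EV_WRITE, 4096, 0);
}
```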

s-ludwig closed this in e0aa165 on Oct 11, 2013
- A deadlock is caused by Nagle's algorithm + delayed ACK and
  the write+write+read pattern (response header + body + next request)
  when serving 2 HTTP requests on a keep-alive connection.

- Resolved by buffering smaller writes so that responses don't
  get split unnecessarily, i.e. blocking in write iff more than
  4kB is buffered.
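Spelled out, the interaction described in the commit message plays out roughly like this (an illustrative timeline, not code from the patch):

```d
// server: write(header)  -> small packet 1 is sent immediately (PSH)
// server: write(body)    -> Nagle's algorithm holds it back until
//                           packet 1 has been ACKed
// client: has only the header so far, so it delays its ACK (~40ms),
//         hoping to piggyback it on future data
// result: both sides stall until the delayed-ACK timer expires
```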
@MartinNowak (Contributor, Author)

This doesn't resolve the bug for me.
What I'm seeing in Wireshark is that the HTTP response is sent in 2 TCP packets (both with PSH set). The first carries only the header (113 bytes) and the second the content (13 bytes).
This, combined with a delayed ACK on the client side, leads to the 40ms delay (Nagle's algorithm & delayed ACK).

@MartinNowak (Contributor, Author)

fix Issue #338 - keep-alive HTTP requests only served after 40ms

@s-ludwig (Member)

I see. The Windows console is so unbelievably slow (1-3ms for one line of text) that the delay was fully hidden by log output for me. So the flushing behavior needs to be checked (I'm also unsure what exactly the "normal" and "flush" modes of libevent actually do; I need to check the sources).
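For reference, the two modes in question are the BEV_NORMAL and BEV_FLUSH values of libevent's bufferevent_flush_mode; a sketch of the call shape, again assuming the deimos.event2 bindings (the exact semantics are what remains to be checked in the sources):

```d
import deimos.event2.bufferevent;
import deimos.event2.event : EV_WRITE;

void flushOutput(bufferevent* bev)
{
    // BEV_NORMAL processes buffered data as usual; BEV_FLUSH is
    // documented to additionally ignore restrictions such as watermarks.
    bufferevent_flush(bev, EV_WRITE, bufferevent_flush_mode.BEV_NORMAL);
}
```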

s-ludwig reopened this on Oct 12, 2013
@MartinNowak (Contributor, Author)

I converted the issue into a pull request that should hopefully fix it.

@s-ludwig (Member)

Hmm, GitHub is currently messing up this ticket...

@s-ludwig (Member)

Oh, I see. Then this was probably the cause of the hiccups.

@s-ludwig (Member)

So if this fixes the issue for you, I'd also change the exit condition for write accordingly (i.e. > 0 -> > 4096 in line 275). Can you check if that by chance makes anything worse for you?

@MartinNowak (Contributor, Author)

> So if this fixes the issue for you, I'd also change the exit condition for write accordingly (i.e. > 0 -> > 4096 in line 275). Can you check if that by chance makes anything worse for you?

Not sure I get you. You mean like in the pull request?

@MartinNowak (Contributor, Author)

AFAIU the main problem was this line 275 in write:
while (connected && evbuffer_get_length(outbuf) > 0) rawYield();
It would only return after libevent had written the whole buffer to the network socket, by which point the kernel would often already have sent a TCP packet. Sending multiple small packets creates the deadlock problem.
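For reference, the merged fix amounts to raising the exit threshold of that loop; a sketch (connected, outbuf and rawYield are the names from the quoted line, writeHighWatermark is an illustrative name for the 4096 constant):

```d
// Yield only while more than 4kB is still buffered, so that a small
// header+body pair is coalesced into one TCP packet instead of being
// pushed to the socket piecemeal.
enum size_t writeHighWatermark = 4096;
while (connected && evbuffer_get_length(outbuf) > writeHighWatermark)
    rawYield();
```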

@s-ludwig (Member)

Ok, forget what I said. It's all there already; not sure what I saw or thought I saw.

s-ludwig added a commit that referenced this pull request on Oct 12, 2013:
regression requests on keep-alive HTTP connections only served after 40ms
s-ludwig merged commit 791be2a into vibe-d:master on Oct 12, 2013
@s-ludwig (Member)

Looking at the stream with Wireshark, I can also confirm that it fixed the packet fragmentation issue. vibed.org is also responding noticeably faster again.

@s-ludwig (Member)

Assuming that the big delays I saw in my first tests were caused by the delayed ACK (I didn't know about the bad interaction with Nagle's algorithm, thanks for the pointer!), maybe it makes sense to embrace libevent's buffering mechanism and enable tcpNoDelay by default. But that again opens up the question of what exactly libevent does when bufferevent_flush is called. I'll open a ticket as a reminder to look into this at some point.
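For illustration, the tcpNoDelay setting mentioned here maps to the TCP_NODELAY socket option; a minimal sketch using Phobos' std.socket rather than vibe.d's own API:

```d
import std.socket;

void enableTcpNoDelay(Socket sock)
{
    // TCP_NODELAY disables Nagle's algorithm: small segments are sent
    // immediately instead of being held back until outstanding data is
    // ACKed, which sidesteps the interaction with delayed ACKs.
    sock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, true);
}
```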

@MartinNowak (Contributor, Author)

As an improvement, we could use the real MSS (MTU minus TCP/IP headers) of the socket instead of the hardcoded 4096 bytes. It can be obtained with getsockopt and TCP_MAXSEG.
This would solve any remaining occurrence of this problem on the loopback device, which has a 64kB MTU.

NB: The druntime network headers are a mess and TCP_MAXSEG is currently missing.
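A sketch of that lookup on POSIX; since TCP_MAXSEG is missing from druntime as noted, its Linux value is declared locally, and currentMSS is a hypothetical helper:

```d
import core.sys.posix.sys.socket : getsockopt, socklen_t;
import core.sys.posix.netinet.in_ : IPPROTO_TCP;

// Missing from druntime's network headers, so declared here
// (value from Linux's <netinet/tcp.h>).
enum TCP_MAXSEG = 2;

int currentMSS(int fd)
{
    int mss;
    socklen_t len = mss.sizeof;
    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) != 0)
        return 4096; // fall back to the hardcoded default
    return mss;
}
```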

s-ludwig mentioned this pull request on Nov 21, 2013
dnadlinger added a commit to dnadlinger/ldc that referenced this pull request on Dec 26, 2013:
This works around linking problems such as vibe-d/vibe.d#338,
caused by the frontend appending template instances to the wrong
module.

GitHub: Fixes ldc-developers#558.