better TCP_NODELAY handling: only use it when it is useful #619

Closed

totaam opened this issue Jul 30, 2014 · 28 comments



totaam commented Jul 30, 2014

Follow-up from #514: at present we enable TCP_NODELAY globally, which is a bit wasteful.

It ensures that packets go out as soon as we queue them, but when the packets contain large-ish binary data, the binary data and the actual xpra packet structure are likely to travel in separate TCP-level packets.

It would be better to only enable TCP_NODELAY when aggregating packets is not helping: when we have no more data to send, or when the output buffer is full. As per the Stack Overflow question "Is there a way to flush a POSIX socket?" and this answer:

> What I do is enable Nagle, write as many bytes (using non-blocking I/O) as I can to the socket (i.e. until I run out of bytes to send, or the send() call returns EWOULDBLOCK, whichever comes first), and then disable Nagle again. This seems to work well (i.e. I get low latency AND full-size packets where possible)

Good read: The Caveats of TCP_NODELAY
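
A minimal Python sketch of that approach (the helper name send_with_flush is made up for illustration, this is not xpra code): keep Nagle enabled while there is still data to write, then flip TCP_NODELAY on so the kernel flushes the tail:

```python
import errno
import socket

def send_with_flush(sock, data):
    # leave Nagle enabled (NODELAY off) while we still have data to queue,
    # so the kernel can coalesce it into full-size packets
    sock.setblocking(False)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
    sent = 0
    while sent < len(data):
        try:
            sent += sock.send(data[sent:])
        except OSError as e:
            if e.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
                break   # output buffer full: stop and let the kernel aggregate
            raise
    # out of data to send (or the buffer is full): disabling Nagle
    # should flush whatever is still sitting in the socket buffer
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sent
```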


totaam commented Aug 15, 2016

See also: #1211, #639, #999, #401, #540, #417


totaam commented Jan 25, 2018

I'm seeing some weird behaviour with win32 clients while trying to improve #999 and detect late acks.

The network layer's source_has_more function may not be the right place to set and unset NODELAY, because lots of small screen updates can take under a millisecond to compress, which is still slower than the network layer takes to send them...
Maybe we need to pass the screen update's flush attribute down to the network layer too.


totaam commented Jan 25, 2018

Done in r18149 + r18150. Implementation notes:

  • we add an attribute to the packet source functions so that the sender can specify whether more packets are coming (also used when we know the packet is not urgent and can be delayed by the network layer)
  • we no longer set NODELAY globally on startup; we change it (if needed) for each packet
  • in the case of screen updates, we re-use the "flush" attribute (set to a non-zero value when we know that more screen update packets are coming immediately after)
  • some other packets set the flush flag because they need to be sent in a timely manner (ie: ping) or because they affect the user experience (ie: lost-window)
  • TCP_SOCKET_NODELAY is now XPRA_SOCKET_NODELAY, as it applies to more than just TCP sockets
  • we expose the "nodelay" attribute; it should always match the TCP sockopt
  • important bandwidth management updates in #999#comment:44 ("catch limited bandwidth issues sooner")
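
The gist of the per-packet logic, as a rough Python sketch (the class and method names are invented for illustration and do not match xpra's internals):

```python
import socket

class PacketSender:
    def __init__(self, sock):
        self.sock = sock
        self.nodelay = None   # actual socket state, initially unknown

    def set_nodelay(self, enabled):
        # only issue the setsockopt syscall when the state actually changes
        if self.nodelay != enabled:
            self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY,
                                 1 if enabled else 0)
            self.nodelay = enabled

    def send_packet(self, data, more):
        # "more" plays the role of the flush / source-has-more hint:
        # when more packets follow immediately, let Nagle aggregate them;
        # when this is the last packet for now, push it out straight away
        self.set_nodelay(not more)
        self.sock.sendall(data)
```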

As of r18151, we can use XPRA_SOCKET_NODELAY to overrule the automatic settings:

  • XPRA_SOCKET_NODELAY=1 should be more or less equivalent to what we had before
  • XPRA_SOCKET_NODELAY=0 never sets NODELAY
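
The three-state override boils down to something like this (a hypothetical helper, not the actual xpra parsing code):

```python
import os

# unset -> automatic per-packet behaviour,
# "1"   -> always enable NODELAY (roughly the old behaviour),
# "0"   -> never enable it
_OVERRIDE = os.environ.get("XPRA_SOCKET_NODELAY")

def effective_nodelay(automatic):
    if _OVERRIDE is None:
        return automatic          # no override: use the per-packet decision
    return _OVERRIDE == "1"       # forced on or off
```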

TODO:

  • deal with other socket types - websockify does its own thing with a global flag, can we override it?
  • verify RFB and UDP still work after protocol layer changes (and proxy?)
  • maybe keep NODELAY off when there is congestion and we are batching? (not sure this is needed or helpful)
  • maybe audio packets need to be flushed quicker? (does this change cause audio latency to go up?)

@maxmylyn: this ticket is tailor-made for the automated tests - we want to compare before and after to see if this helps, especially under bandwidth-constrained conditions. (The only slight problem is that there is a bug I'm working on which causes congestion detection to kick in too early, and the counter-measures are too aggressive, causing the framerate to drop...)


totaam commented Jan 25, 2018

2018-01-25 19:28:19: maxmylyn commented


I did some maintenance this morning on the test box - it's been acting up again. For starters, I've split the TC tests into two different files - one runs just the packet loss/delay tests, the other runs the bandwidth-limited cases. That way, if something goes wrong with either test suite, we don't lose all the data.

However, there's a separate problem that I'm going to look at now: seemingly at random, the tests fail to stop the server properly, at which point all following tests fail consistently because they're unable to spin up a server on the display that's still in use. A simple xpra stop clears the server, so I'll sprinkle that command into the scripts between test suites... but I'd like to figure out why I need to do that.

I'll hold on to this ticket for a few days so we can gather more data.


totaam commented Jan 3, 2019

2019-01-03 18:06:51: maxmylyn commented


As I mentioned in #1840, my mathematician is unavailable for a little bit, and he was working on the new charts, so I'm going to post the raw data I have so far.


totaam commented Jan 3, 2019

2019-01-03 18:07:21: maxmylyn uploaded file tcp.zip (277.2 KiB)

TCP output data


totaam commented Jan 18, 2019

TCP output data

It's not clear what command lines were used for each run, as there are 3 possible values for XPRA_SOCKET_NODELAY (0, 1, unset).

It also doesn't look like this was being tested with any bandwidth constraints?
All encodings are tested in there, which is going to make the data very noisy. I would focus on one setting ("auto" or "rgb"), and only one test (one that generates lots of small-ish screen updates - maybe simulate console user, or gtkperf), and vary only the throttling - if anything.


totaam commented Jan 21, 2019

2019-01-21 18:22:53: maxmylyn commented


It's not clear what command lines were used for each run, as there are 3 possible values for XPRA_SOCKET_NODELAY (0, 1, unset)

For these tests, I have it run a series of three test runs. There are two sets: the ones with the prefix nodelay_ have XPRA_SOCKET_NODELAY=1 set, and the ones with delay_ have XPRA_SOCKET_NODELAY=0 set.

It also doesn't look like this was being tested with any bandwidth constraints?

I'll copy some of the Bash script from when I was running daily tests with bandwidth constraints - 25, 16, and 8 megabits should be a good enough starting point. I've changed the config file for this test run to only use rgb, and it only runs the two console tests and gtkperf. Since I'm not hardware-limited anymore, it doesn't hurt to run extra tests: I'll usually have a spare machine or two in case I need one for something else.


totaam commented Jan 23, 2019

2019-01-23 17:36:39: maxmylyn uploaded file 619tcp.zip (500.8 KiB)


totaam commented Jan 23, 2019

2019-01-23 17:43:21: maxmylyn commented


I've posted a new set of data. This time I ran tc qdisc add dev lo root netem rate 25mbit (and likewise for 16mbit and 8mbit) between test runs, with XPRA_SOCKET_NODELAY set to 0 and 1.

Unfortunately, most of that data is useless. It looks like the iptables command didn't work (EDIT: this one's on me - I left USE_IPTABLES disabled in the config), so none of the packet accounting got recorded, and as mentioned in #1840, quite a few of the other columns are missing as well. I'll try to figure out why that's the case, but I'm posting it anyway since all the data is only ~500KB.


totaam commented Jan 24, 2019

.. as mentioned in #1840 quite a few of the other columns are missing as well ..

As mentioned in #1840, the breakage is likely quite recent, whereas this ticket is a year old.
Why not test with a version that does record the data we want?


totaam commented Jan 28, 2019

r21493 waits until after we have sent the last chunk before enabling NODELAY.
I believe that's correct and the kernel will then flush the socket, but it would be much better to verify that with test data.
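
In Python terms, the intended behaviour would look roughly like this (a sketch of the r21493 approach, not the actual code):

```python
import socket

def send_chunks(sock, chunks):
    # keep Nagle enabled (NODELAY off) while the chunks are queued,
    # so the kernel can aggregate them into full-size packets
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
    for chunk in chunks:
        sock.sendall(chunk)
    # only after the last chunk has been sent do we enable NODELAY:
    # on Linux this should make the kernel flush anything still buffered
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```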


totaam commented Jan 29, 2019

r21495: also disable NODELAY for multiple chunks (doh)


totaam commented Jan 31, 2019

See also #2130


totaam commented Aug 9, 2019

2019-08-09 03:26:28: smo uploaded file test_nodelay.tar.gz (242.8 KiB)

nodelay charts


totaam commented Aug 9, 2019

Attached some charts and data for this.

I'm not sure if the script for charting took into account the instances I ran with trickle.

I could have included just the network/packet stuff in the charts, but I left all the details in there.


totaam commented Aug 9, 2019

Please include the SOCKET_CORK option (#2130) in this test data.
I think it would make more of a difference if we could compare with and without trickle limits too, but I don't think that the "perf charts" code will let you do that as it is?
It would be useful to compare per-bandwidth-limit / per-nodelay / per-cork settings, for example. I assume that's why the "max-batch-delay" goes so high with all settings (300 ms!): it would help to see how that varies per-bandwidth-limit.
For some metrics, the data from low bandwidth limits may skew the results.
All these screensaver tests mostly behave the same (full-screen updates, high framerate); it would be more useful to have other tests in there: gtkperf, xterm, even x11perf.
As expected, the "server-number-of-threads" and "server-vsize" go up in "auto" mode, since that uses multi-threaded video encoders. But I don't see the benefit of video encoders on framerate ("regions-per-second") or pixels-per-second (though the "encoding-pixels-per-second" does show a very different profile).
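
For reference, on Linux a SOCKET_CORK option boils down to the TCP_CORK socket option; a minimal sketch of that mechanism (Linux-only, and not xpra's actual implementation):

```python
import socket

def send_corked(sock, chunks):
    # while corked, the kernel holds back partial frames, so
    # consecutive chunks coalesce into full-size packets
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        # un-corking flushes anything still held back by the kernel
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```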

Some thoughts on what I was expecting to see:

  • memscroller should perform worse with SOCKET_NODELAY=1 because the screen will update in multiple chunks bunched up together and the network can benefit from accumulating them in the kernel socket output buffer (the xterm tests might be even better for showing this effect) - it does, but not by much, probably because SOCKET_CORK does that even better.
  • NODELAY=1 should give better overall latency; we need to record the overall paint latency in the data and graphs - I will try to add that (add total latency to perf charts #2381)
  • NODELAY=0 (or unset) should give better bandwidth utilisation - this may be masked by the fact that we can't distinguish bandwidth limits (the much higher values from "no-limit" will hide any differences)


totaam commented Aug 15, 2019

2019-08-15 03:07:31: smo uploaded file test_nodelay_cork.tar.gz (633.5 KiB)

test of nodelay and cork


totaam commented Aug 15, 2019

Attached are the combinations of XPRA_SOCKET_NODELAY and XPRA_SOCKET_CORK compared.

Longer tests this time, and a few different ones.


totaam commented Aug 15, 2019

Sorry, I forgot to ask you to include the default case with XPRA_SOCKET_NODELAY unset, both with and without CORK, for completeness.

Very interesting to have 4 combinations already. Maybe we should combine more test results?

So far:

  • the memscroller problems are only with NODELAY=0 - added info to #2382#comment:3
  • CORK=1 is better: more regions-per-second, more pixels, lower batch delay, etc.


totaam commented Aug 19, 2019

2019-08-19 16:01:49: smo uploaded file test_nodelay_cork2.tar.gz (727.6 KiB)

new test with nodelay unset data


totaam commented Aug 19, 2019

2019-08-19 16:03:43: smo changed owner from smo to Antoine Martin


totaam commented Aug 19, 2019

2019-08-19 16:03:43: smo commented


Okay, I have attached the previous data along with the new stuff you asked for.

I was hoping to make the charts a bit more readable, but after spending some time on it I was not able to (I'm not the best at JS/HTML).

Take a look and let me know if this is good. Maybe we can finally close this ticket.


totaam commented Aug 19, 2019

@smo: there are two sets of "NODELAY Unset CORK 1" - what's the difference?


totaam commented Aug 19, 2019

2019-08-19 17:55:02: smo uploaded file test_nodelay_cork3.tar.gz (727.6 KiB)

bad label on previous chart


totaam commented Aug 19, 2019

2019-08-19 17:56:06: smo commented


Oops, that was my bad - wrong label on that one. Attached a new tarball.


totaam commented Aug 20, 2019

The charts are now available here: https://xpra.org/stats/nodelay-cork/

totaam closed this as completed Aug 20, 2019

totaam commented Sep 25, 2019

This option can now be enabled on a per-socket basis: #2424#comment:1.

See also #2975.

totaam added the v0.12.x label Jan 22, 2021