Sometimes it's important to check for socket writeability before trying to write #446

njsmith · 2016-10-16T05:39:05Z

I recently discovered that Linux/OS X provide an important API (TCP_NOTSENT_LOWAT) that lets applications avoid queuing up excessive data inside the kernel's socket send buffers. (The socket send buffers are generally too big, for various reasons.) Unfortunately, it turns out that this API works by controlling when a socket is marked writeable by select and friends, but does not affect whether a send call will succeed, so while you might think these are the same thing they actually aren't. [Edit: it turns out that this description is actually incorrect on Linux, though probably true on macOS -- see] I initially filed a bug on curio about this because curio was assuming they were the same, so I won't repeat all the details: dabeaz/curio#83
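For readers who have not met the option, here is a minimal sketch of enabling it from Python. It assumes a Python build that exposes `socket.TCP_NOTSENT_LOWAT` (3.7 or later, on a platform that supports the option). The helper name and the 16 KiB threshold are illustrative and do not come from this thread.

```python
import socket

# Not every platform or Python build exposes this constant, so look it up defensively.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", None)


def set_notsent_lowat(sock, nbytes):
    """Set the not-sent low-water mark on a TCP socket.

    Returns True if the option was applied, False if this platform or
    socket does not support it.
    """
    if TCP_NOTSENT_LOWAT is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, nbytes)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 16 KiB is an arbitrary example value, not a recommendation.
        print("enabled:", set_notsent_lowat(s, 16 * 1024))
```

Whether a subsequent send call is also limited by this threshold is exactly the platform-dependent question this issue is about, so the sketch only sets the option and makes no claim about send behaviour.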

@dabeaz points out that asyncio seems to make the same invalid optimization, so filing a bug here too.


njsmith · 2016-10-17T03:40:03Z

On further discussion (see the curio issue), it sounds like the tentative conclusion is:

- There should be some way to specifically wait for a socket / StreamWriter to become writeable. (Rationale: see the comment on dabeaz/curio#83, "Better send buffer management: await sock.writeable() and TCP_NOTSENT_LOWAT".)
- For bonus points, it probably makes sense to enable TCP_NOTSENT_LOWAT on TCP sockets by default to get better buffering behavior.
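As a rough illustration of the first point, waiting for writeability on a raw socket can be sketched on top of `loop.add_writer()`. The coroutine name `wait_writable` is hypothetical, and the sketch assumes a selector-based event loop (the default on Unix), since `add_writer()` is not available on every loop implementation.

```python
import asyncio
import socket


async def wait_writable(sock):
    """Suspend until the event loop reports `sock` as writable."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    fd = sock.fileno()

    def on_writable():
        # The callback may fire more than once before we unregister it.
        if not fut.done():
            fut.set_result(None)

    loop.add_writer(fd, on_writable)
    try:
        await fut
    finally:
        loop.remove_writer(fd)


async def main():
    a, b = socket.socketpair()
    a.setblocking(False)
    try:
        await wait_writable(a)  # wait first ...
        a.send(b"hello")        # ... then write
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    asyncio.run(main())
```

The point of the explicit wait is that the write only happens once the kernel has signalled writeability, which is the signal TCP_NOTSENT_LOWAT controls.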

gvanrossum · 2016-10-17T04:01:08Z

At the lowest level in asyncio (i.e. if you have a socket), if you just start sending with loop.sock_sendall(), you will indeed be hit by this if the optimization misfires. But an app can work around this using loop.add_writer().

At the next level you have a Protocol/Transport pair, which has a synchronous write() method that contains this optimization. That's not so easy to work around on the app side, but there is a Protocol API that could be used for this: pause_writing()/resume_writing(). We can probably change the default transport implementation so that it uses these more aggressively, without API changes.

asyncio streams are built on top of Protocol/Transport pairs, so at that level we should be able to benefit from whatever we do for the previous level.

@glyph has this reached Twisted yet?

PS Jim Gettys has been complaining about this for years. Glad something's finally being done about it. And @njsmith, thanks for the clear explanations!
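For context, the pause_writing()/resume_writing() hooks mentioned above can be used by an application roughly as follows. This is a sketch of today's API, not of the proposed transport change. The class name and its `send()` coroutine are hypothetical; only `connection_made()`, `pause_writing()` and `resume_writing()` are real asyncio Protocol callbacks.

```python
import asyncio


class FlowControlledProtocol(asyncio.Protocol):
    """Protocol that lets callers await until the transport accepts writes."""

    def __init__(self):
        self.transport = None
        self._can_write = asyncio.Event()
        self._can_write.set()  # writing is allowed until the transport says otherwise

    def connection_made(self, transport):
        self.transport = transport

    def pause_writing(self):
        # Called by the transport when its buffer passes the high-water mark.
        self._can_write.clear()

    def resume_writing(self):
        # Called by the transport when its buffer drains below the low-water mark.
        self._can_write.set()

    async def send(self, data):
        # Hypothetical helper: block the caller while writing is paused.
        await self._can_write.wait()
        self.transport.write(data)
```

Create the protocol from inside a running event loop (for example in the factory passed to `loop.create_connection()`), so that the `asyncio.Event` is bound to the right loop on older Python versions. The transport's thresholds can be adjusted with `transport.set_write_buffer_limits()`.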

gjcarneiro · 2016-10-17T15:21:04Z

I guess this is trying to address the buffer bloat problem?...

njsmith · 2016-10-18T20:32:20Z

@gjcarneiro: bufferbloat is a many-headed hydra, but yeah, this is about bufferbloat in the context of per-socket send buffers specifically. The discussion thread on the curio issue has lots more details.

glyph · 2016-10-26T22:42:15Z

> @glyph has this reached Twisted yet?

Not TCP_NOTSENT_LOWAT, no. I'm sort of curious how our producer/consumer API interacts with this detail; I have a feeling it'll behave correctly, but I'm not entirely sure.

However, in the process of investigating this, I learned that we apparently removed the eager-write optimization many years ago:

twisted/twisted@c75d1eb

Digging into the history and viewing some of the discussion around that time, it seems that we were aware that it punished us pretty brutally on certain micro-benchmarks, but there's no realistic benchmark we could find where it impacts performance significantly. @dabeaz points out over on the other ticket that it's a massive performance penalty to an echo-server benchmark, and that's true; however, echo is not a realistic application.

If you want to do anything interesting you need to talk to at least one other back-end service, which means that you need to carefully manage the relationship between two transports, which means you need a producer/consumer hookup. Once you have that, you can't really get the meat of the optimization that eager-writes give you, which is the ability to avoid the extra select/epoll/kqueue(etc) syscall between recv and send, since you need to go back to the main loop to see if it's time to read again between each packet anyway.
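To make the trade-off concrete, here is a rough sketch of what an eager write looks like on a non-blocking socket: attempt `send()` straight away, and only go back to the event loop for whatever did not fit. The function name is hypothetical, and this is not Twisted's or asyncio's actual code.

```python
import socket


def eager_send(sock, data):
    """Try to send immediately, skipping the writability check.

    Returns the bytes that could not be sent yet; the caller would
    register for a writable event only if this is non-empty.
    """
    try:
        sent = sock.send(data)
    except (BlockingIOError, InterruptedError):
        sent = 0  # kernel buffer full: fall back to waiting
    return data[sent:]


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.setblocking(False)
    leftover = eager_send(a, b"ping")
    print("left to send later:", leftover)
    a.close()
    b.close()
```

When the send succeeds in full, the extra select/epoll/kqueue round trip is avoided, which is the saving described above. The cost is that the write never consults the kernel's writeability signal.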

It also does punish the writer on benchmarks where you are synthesizing data on the CPU rather than getting it or processing it from a different remote source, but /dev/urandom as a service also has pretty limited utility.

That said, I don't think Twisted is a great model to look towards for good support for tunables; tuning has historically been a weak point for us, because users who have significant performance demands almost always end up fixing them by making scaling up and down easier rather than optimizing throughput. Also, the only application where this sort of tuning tends to make any difference is something that is just shuttling around huge volumes of data without really processing it, and if you're doing that you're more likely to use HAProxy or something.

That said, I really appreciate learning about this nuance of send on Linux. Hopefully at some point in the coming year we're going to do an overhaul of how we deal with tunable transport parameters (mostly focused on the more-portable SO_SNDBUF and SO_RCVBUF than this platform-specific detail) and it'll be good to keep it in mind for that.
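For comparison, the portable send and receive buffer-size options (spelled `SO_SNDBUF` and `SO_RCVBUF` in the socket API) can be set as in the sketch below. The helper name and the 64 KiB sizes are illustrative assumptions, not anything Twisted does.

```python
import socket


def set_buffer_sizes(sock, send_bytes, recv_bytes):
    """Request kernel buffer sizes and return what the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_bytes)
    # The kernel may round or clamp the request (Linux, for instance,
    # reports a doubled value), so read the effective sizes back.
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(set_buffer_sizes(s, 64 * 1024, 64 * 1024))
```

These options bound the total buffer size, whereas TCP_NOTSENT_LOWAT only concerns data that has not yet been sent, so the two are complementary rather than interchangeable.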

Lukasa · 2016-11-15T12:43:45Z

I should note that I have an interest in adding support for TCP_NOTSENT_LOWAT into Twisted because it's highly valuable for HTTP/2, where keeping send buffers small, if possible, prevents control frames getting blocked behind buffered stream data. That means asyncio will likely want to provide support for APIs of that kind as well.

However, I disagree with @njsmith's assertion that asyncio just wants to start using it by default. In particular, for bulk unframed data transfers where throughput is more important than reactivity, applications will want to avoid spinning up the Python event loop wherever possible: for that reason, large writes are ideal and using TCP_NOTSENT_LOWAT with a bad value will have nasty negative performance impacts. The biggest case of this is for protocols like FTP and HTTP/1.1, particularly when sendfile is not available to the application, where we want to free the event loop up to do other things rather than repeatedly send smallish writes into the kernel.

In the worst case of a 100% CPU-utilisation event loop, aggressively low values of TCP_NOTSENT_LOWAT can lead to pauses in data transfer because the event loop isn't able to respond to the POLLOUT event before the kernel send buffer empties entirely.

It is much better for asyncio to expose this kind of tuneable rather than opt into it by default. Let application developers decide what the performance characteristics of their protocols should be.

njsmith · 2016-11-15T17:46:37Z

Ah, but that can be handled by the library too. On OS X, the splitting of
large writes isn't an issue at all, since TCP_NOTSENT_LOWAT only affects
select-and-friends, not send-and-friends. And in Linux, you can achieve the
same effect by having your send routine do: (1) turn off TCP_NOTSENT_LOWAT,
(2) call send, (3) turn it on again. The basic intuition here is that you
want to let the send buffer drain before signaling writeability to avoid
standing buffers, but once the application has committed to sending a large
chunk of data, you want to hand that off to the kernel as quickly as
possible, even if that does temporarily create a large buffer.

I agree that the actual TCP_NOTSENT_LOWAT value should be tuneable, and
that this is a somewhat experimental proposal. But theoretically at least
it seems like there are some pretty compelling arguments that the best
default value for TCP_NOTSENT_LOWAT is smaller than the "infinity" we
currently default to.
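The three-step Linux send routine described above could be sketched like this. It rests on one loudly flagged assumption: that setting the option to 0 on Linux makes the socket fall back to the system-wide default, which is effectively "off" unless an administrator has changed it. The function and helper names are hypothetical.

```python
import socket

# May be missing on some platforms or Python builds.
_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", None)


def _set_lowat(sock, value):
    """Best-effort setsockopt; silently does nothing where unsupported."""
    if _NOTSENT_LOWAT is None:
        return
    try:
        sock.setsockopt(socket.IPPROTO_TCP, _NOTSENT_LOWAT, value)
    except OSError:
        pass


def send_bypassing_lowat(sock, data, lowat):
    """Send `data` with the low-water mark lifted, then restore it."""
    _set_lowat(sock, 0)           # (1) turn it off (ASSUMPTION: 0 = system default)
    try:
        return sock.send(data)    # (2) hand the data to the kernel
    finally:
        _set_lowat(sock, lowat)   # (3) turn it back on
```

This costs two extra system calls per send, so a real implementation would probably only do it for writes larger than the threshold.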


njsmith mentioned this issue Nov 13, 2016

The eager read/write optimization leaves curio programs highly susceptible to event loop starvation dabeaz/curio#106

Closed
