
Better 'concurrency'. #65

Open
Lukasa opened this issue Jul 15, 2014 · 12 comments

Comments

@Lukasa
Member

Lukasa commented Jul 15, 2014

Right now hyper doesn't read frames until the user attempts to read data or until our connection flow-control window closes. This is obviously a problem:

  1. We will send all the data on a stream before we get a chance to read the RST_STREAM frame that the server sent after our HEADERS frame. This kind of behaviour will likely cause a server to kill our connection.
  2. More generally, we don't find out about changes in connection state until a read event. This is a bit troubling: we'll be slow to respond to SETTINGS with ACKs, for example.
  3. We let users shoot themselves in the foot, e.g. by sending large numbers of requests with small (or no) bodies before reading anything. In that case, hyper just won't find out about the server's frames at all (see the sketch after this list).
  4. Our TCP buffer will fill and push back on the TCP connection, hurting throughput unnecessarily. HTTP/2 flow control exists to avoid this problem.
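
To make point 3 concrete, here's a rough sketch of the usage pattern that triggers it. The API names below (`HTTP20Connection`, `request`, `get_response`) are illustrative and may not match hyper's interface exactly:

```python
# Rough sketch of the foot-gun in point 3; API names are illustrative.
from hyper import HTTP20Connection

conn = HTTP20Connection('example.com')

# Fire off many requests without ever reading from the socket:
stream_ids = [conn.request('GET', '/resource') for _ in range(200)]

# Only now does hyper start reading frames, so any RST_STREAM, SETTINGS
# or WINDOW_UPDATE the server sent in the meantime has been sitting
# unread in the TCP buffer, and the server may already have given up.
responses = [conn.get_response(sid) for sid in stream_ids]
```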

We need some solutions to this. Proposals:

  1. Have a separate thread, one per connection, that reads from the socket and handles control frames (sketched after this list). This will work. Downsides: should libraries launch their own threads? This will also cause problems if the main thread is heavily CPU-loaded, and there's a performance overhead for threads in Python.
  2. Have a method that reads N frames off the connection each time we send a frame. This approach avoids needing any concurrency at all. Problem: if N is too small we have the same problem we have now; if N is too large we can severely slow down sending operations.
  3. A variation on (1), have those threads be explicitly launched by users. At least that way we can blame them for all the bad stuff.
  4. Have a design that puts HTTP/2 connections into separate processes and uses queues to send requests/responses between them. There's overhead here (SO MUCH MEMORY COPYING), and we can't force it to be used. Downsides: the copying, plus requiring all users to launch extra processes?
  5. Accept that this is terrible, build a port for asyncio, and basically tell people that non-async HTTP is a bad idea.
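
To make (1) a bit more concrete, a minimal sketch of the per-connection reader thread; `read_frame`, `handle_frame` and `lock` are hypothetical stand-ins for hyper internals:

```python
import threading

class ConnectionReader(threading.Thread):
    """One background reader per connection (proposal 1).

    Assumes a hypothetical connection object exposing read_frame(),
    handle_frame(frame) and a lock guarding its shared state.
    """

    def __init__(self, conn):
        super(ConnectionReader, self).__init__()
        self.daemon = True  # never keep the interpreter alive on our account
        self.conn = conn

    def run(self):
        while True:
            frame = self.conn.read_frame()  # blocking socket read
            if frame is None:               # connection closed
                break
            with self.conn.lock:            # other threads touch this state too
                self.conn.handle_frame(frame)
```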

Any other ideas? /cc @shazow, @sigmavirus24, @alekstorm, I need your wealth of experience. If you know anyone else with good ideas in this area, I'd love to hear from them.

@dimaqq

dimaqq commented Jul 15, 2014

Proposal 1d:

In a typical setting, a single separate thread handles background tasks for all connections.

Default API automagically creates a background singleton object and its thread.

This background entry point is exposed, allowing hard-core users to change the 1-to-N mapping to M-to-N, or to handle background tasks in their own background thread.

In any case, the user can supply some form of callback to handle server push within the context of whatever background thread is in use.

Databases do this; e.g. MySQL/InnoDB has dedicated file I/O thread(s). For a while a single thread was enough; later, multiple threads were added as performance expectations grew.
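
A rough sketch of that 1-to-N shape, i.e. a single automagically created background thread servicing every registered connection; all of the names here are hypothetical:

```python
import select
import threading
import time

class BackgroundReactor(object):
    """Single background thread that services reads for all connections."""

    _instance = None
    _instance_lock = threading.Lock()

    @classmethod
    def instance(cls):
        # Created automagically on first use, as suggested above.
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def __init__(self):
        self._conns = {}  # fileno -> connection object
        self._lock = threading.Lock()
        thread = threading.Thread(target=self._run)
        thread.daemon = True
        thread.start()

    def register(self, conn):
        # `conn` is assumed to expose fileno() and on_readable().
        with self._lock:
            self._conns[conn.fileno()] = conn

    def _run(self):
        while True:
            with self._lock:
                fds = list(self._conns)
            if not fds:
                time.sleep(0.05)  # real code would wait on a wakeup pipe
                continue
            readable, _, _ = select.select(fds, [], [], 1.0)
            for fd in readable:
                with self._lock:
                    conn = self._conns.get(fd)
                if conn is not None:
                    conn.on_readable()  # read and dispatch any pending frames
```

A user who wants M background threads, or who wants to drive this loop from their own thread, would simply bypass instance() and construct their own reactor.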

@Lukasa
Member Author

Lukasa commented Jul 15, 2014

@dimaqq That's a good proposal as well. The server push concern isn't an issue, though: the current development branch handles server push just fine, so we won't need any separate logic (though a callback might be nice).

The issue it suffers from is the same as proposal 1: threads are painful on CPython. On Pythons earlier than 3.2, the GIL does not play nicely with multiple cores, leading to GIL thrashing and therefore a performance penalty simply from adding threads to the code. This is Bad.

Even worse, the new GIL from Python 3.2 also plays poorly with IO, leading to huge performance problems when under heavy IO load (like in HTTP/2!).

I am concerned that adding threads here might send performance through the floor, and that's before we even begin to consider how rude it is for a library to start casually launching its own threads.

@dimaqq

dimaqq commented Jul 15, 2014

@Lukasa please validate your statement with benchmarks.

In my experience, threads are quite fine for [essentially] blocking I/O.

Yes, there is lag in synchronisation primitives when using timeouts in the 2.x series, but does that mean threads are banned?

Wrt. the magically created thread, I reckon it's quite fine as long as it's exposed via some sort of Session or Context object. Sure, some care is needed to GC connections when the user discards their references, but then that's what weakref is for :)

@Lukasa
Member Author

Lukasa commented Jul 15, 2014

@dimaqq Dave Beazley's got the best evaluation I've seen, here, with even more data here. Highlights:

With multiple cores, runnable threads get scheduled simultaneously (on different cores) and battle over the GIL.

  • I/O ops often do not block.
  • Due to buffering, the OS is able to fulfill I/O requests immediately and keep a thread running
  • However, the GIL is always released
  • Results in GIL thrashing under heavy load

Regarding the Python 3.2 GIL, see slides 50 through 54, with the headline numbers being:

Send 10MB of data to an echo server thread that's competing with a CPU-bound thread

  • Python 2.6.4 (2 CPU) : 0.57s (10 sample average)
  • Python 3.2 (2 CPU) : 12.4s (20x slower)

What if echo competes with 2 CPU threads?

  • Python 2.6.4 (2 CPU) : 0.25s (Better performance?)
  • Python 3.2 (2 CPU) : 46.9s (4x slower than before)
  • Python 3.2 (1 CPU) : 0.14s (330x faster than 2 cores?)
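
For reference, that scenario is straightforward to reproduce; a rough, self-contained sketch (not hyper-specific; the payload size and iteration counts are arbitrary):

```python
import socket
import threading
import time

def echo_server(listener):
    conn, _ = listener.accept()
    while True:
        data = conn.recv(65536)
        if not data:
            break
        conn.sendall(data)

def cpu_bound(n):
    while n:                              # pure-Python busy loop; fights for the GIL
        n -= 1

def timed_echo(port, payload, chunk=8192):
    client = socket.create_connection(('127.0.0.1', port))
    start = time.time()
    for i in range(0, len(payload), chunk):
        piece = payload[i:i + chunk]
        client.sendall(piece)
        got = 0
        while got < len(piece):           # read the full echo back before sending more
            got += len(client.recv(len(piece) - got))
    client.close()
    return time.time() - start

if __name__ == '__main__':
    listener = socket.socket()
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)

    threading.Thread(target=echo_server, args=(listener,)).start()
    for _ in range(2):                    # two CPU-bound competitors
        t = threading.Thread(target=cpu_bound, args=(10 ** 8,))
        t.daemon = True
        t.start()

    elapsed = timed_echo(listener.getsockname()[1], b'x' * 10 * 1024 * 1024)
    print('echoed 10 MB in %.2fs' % elapsed)
```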

@dimaqq

dimaqq commented Jul 15, 2014

@Lukasa, Dave's presentations are five and four years old respectively; a lot has been improved since then.

Please run your own benchmarks.

Wrt. actual use cases, I suppose you have a point that hyper targets at least two markedly different environments:

  • sibling VMs: 100 µs RTT, no TLS
  • WAN: 100 ms RTT, always with TLS

IMO as long as these two are fine, any approach is valid.

@Lukasa
Member Author

Lukasa commented Jul 15, 2014

A lot has been improved, but hyper supports Python 2.7, whose GIL has not changed, and I'm not aware of any further GIL improvements in 3.x either. Without changes to the GIL, I see no reason to assume the performance characteristics have changed.

Benchmarking hyper in this regard is difficult because hyper is not thread-safe: I'd have to spend a few days adding thread safety before any benchmark could be run. It's generally better to take advantage of benchmarking that already exists, and I've not yet seen a compelling reason to disagree with Dave's original assessment.

@dimaqq

dimaqq commented Jul 15, 2014

@Lukasa, please note that

  • Dave's original workload was CPU-bound in pure Python; hyper's is not.
  • The I/O convoy issue affects... correction: all Python 3.x versions are affected, with roughly 3x–4x degradation.

@Lukasa
Member Author

Lukasa commented Jul 15, 2014

hyper is substantially more compute-heavy than you would naively expect. HTTP/2 adds a large amount of compute overhead to what were previously highly IO-bound workloads. For example, uploading a 50 MB file now results in the emission of at least 3200 DATA frames (each of which must be built and written) and the reception of at least 800 WINDOW_UPDATE frames (each requiring a read, deserialization and processing). Multiplexing makes this worse, because we need to do this for every outstanding HTTP request to the same host.
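
The arithmetic behind those numbers, assuming roughly 16 KB DATA frames and a 64 KB flow-control window (in the ballpark of the protocol defaults):

```python
# Back-of-the-envelope check, assuming ~16 KB DATA frames and a ~64 KB
# flow-control window (roughly the protocol defaults).
file_size = 50 * 1024 * 1024                 # 50 MB upload

data_frames = file_size // (16 * 1024)       # 3200 DATA frames to build and write
window_updates = file_size // (64 * 1024)    # 800 WINDOW_UPDATE frames to read and parse

print(data_frames, window_updates)           # 3200 800
```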

Note also that hyper is still pure-Python.

As for the IO convoy issue only affecting 3.2, that's not my understanding. Python Issue 7964 tracks this issue and is still open, at the very least against Python 3.3.

I agree that it may not be as bad as Dave's example, but I suspect it has the potential to be problematic. I'll see if I can find time to do this benchmarking though.

@dimaqq

dimaqq commented Jul 15, 2014

Correct re: the 3.x regression; updated.

Personally I don't see that as a valid reason to discard threading altogether; perhaps in the end a benchmark of hyper specifically will resolve this concern once and for all. Anyway, you are the maintainer; it's your decision.

@sigmavirus24
Contributor

@dimaqq as the only person arguing for threading, why don't you do benchmarks to convince @Lukasa otherwise?

@shazow

shazow commented Jul 15, 2014

From my experience dealing with threads (largely with workerpool and urllib3's dummyserver), it's nothing short of a disaster. It's almost impossible to make a Python library spawn its own thread and not somehow interfere with the program trying to use it. [0]

If you want to be really generous, you could provide an interface that people can use within their own managed side thread. Maybe even have some helpers for launching a thread for them. But I'd be very very reluctant to try to launch threads and manage them in the background without the user's awareness.

Personally, I'd just target Py34's new asyncio stuff and worry about backporting it later.

[0] Dealing with threads sanely in Python usually involves exit hooks, signal capturing, and all kinds of unfun things.
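
For what it's worth, the explicit version might look something like this (all names are hypothetical; this is just the shape): the library exposes the read loop and, at most, a helper that launches a thread the caller still owns.

```python
import threading

def serve_connection(conn):
    """Drive the connection's read loop; call this from a thread YOU manage.

    Assumes a hypothetical connection object exposing `open` and a blocking
    process_next_event() that reads and dispatches one event.
    """
    while conn.open:
        conn.process_next_event()

def launch_reader_thread(conn):
    """Optional helper: starts the loop above on a new thread, but hands the
    thread back so the caller stays responsible for its lifetime."""
    t = threading.Thread(target=serve_connection, args=(conn,))
    t.daemon = True
    t.start()
    return t
```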

@Lukasa
Member Author

Lukasa commented Mar 10, 2015

Why not both?

Seriously, I don't know why it took me so long to come up with this idea, but it should be entirely possible to write a Connection class that expects to run in a 'concurrent' world. In fact, it should be possible to have it be 'Go-like' in its philosophy: no shared memory, all communication via queues. In that model, I believe (though I cannot yet prove) that I can do enough by saying 'give me a callable I can treat as a thread-launcher, and give me another callable I can treat as a queue-factory'.

Such a class would have one 'thread' (or greenlet, or coroutine, or whatever) that owns the socket for reading/writing; one that manages connection state, receiving requests on a queue and sending responses on another queue (or satisfying a future, or whatever); and one that provides a 'thread-safe UI' (basically just building objects and putting them on queues). That might allow a relatively high-performance (certainly more concurrent) programming model without preventing us from offering a 'requests-like' interface.
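
Very roughly, and purely as a sketch of the shape (none of these names exist in hyper), the 'thread-launcher plus queue-factory' idea might look like this. The state loop below is a stub that just answers requests; the real thing would also spawn a socket-owning loop and exchange frames with it over another queue:

```python
import queue
import threading

class ConcurrentConnection(object):
    """Sketch: a connection that owns no concurrency primitives of its own.

    `spawn` is any callable that runs a function on a thread/greenlet/task;
    `make_queue` is any callable returning an object with put()/get().
    """

    def __init__(self, spawn, make_queue):
        self._make_queue = make_queue
        self._requests = make_queue()   # thread-safe UI -> state machine
        spawn(self._state_loop)

    def request(self, method, path, body=None):
        # The 'thread-safe UI': build a plain description of the request,
        # put it on a queue, and hand back a queue to wait on for the reply.
        reply_q = self._make_queue()
        self._requests.put((method, path, body, reply_q))
        return reply_q

    def _state_loop(self):
        # Owns all connection state; no locks, because only this 'thread'
        # ever touches it.
        while True:
            method, path, body, reply_q = self._requests.get()
            reply_q.put('stub response for %s %s' % (method, path))


# Plugging in ordinary threads and queues; gevent, asyncio, etc. would just
# pass different callables.
conn = ConcurrentConnection(
    spawn=lambda fn: threading.Thread(target=fn, daemon=True).start(),
    make_queue=queue.Queue,
)
print(conn.request('GET', '/').get())
```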
