Better 'concurrency'. #65
Proposal 1d: In a typical setting, a single separate thread handles background tasks for all connections. The default API automagically creates a background singleton object and its thread. This background entry point is exposed, allowing hard-core users to change the 1-N mapping to MxN or to handle background tasks in their own background thread. In any case, the user can supply some form of callback to handle server push within the context of whatever background thread. Databases do this, e.g. MySQL/InnoDB has dedicated file I/O thread(s). For a while a single thread was enough; later, multiple threads were implemented as performance expectations grew.
@dimaqq That's a good proposal as well. The server-push concern is moot: the current development branch handles server push just fine, and we won't need any separate logic (though a callback might be nice). The issue it suffers from is the same as proposal 1: threads are painful on CPython. The GIL on Pythons earlier than 3.2 does not play nicely with multiple cores, leading to GIL thrashing. This causes a performance penalty from adding threads to the code. This is Bad. Even worse, the new GIL from Python 3.2 also plays poorly with I/O, leading to huge performance problems under heavy I/O load (like in HTTP/2!). I am concerned that adding threads here might send performance through the floor, and that's before we even begin to consider how rude it is for a library to start casually launching its own threads.
@Lukasa please validate your statement with benchmarks. From my experience, threads are quite fine for [essentially] blocking I/O. Yes, there is lag in sync primitives when using timeouts in the 2.x series, but does that mean threads are banned? W.r.t. the magically created thread, I reckon it's quite fine as long as it's exported via some sort of
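As a rough illustration of the claim that threads are fine for essentially-blocking I/O, here is a toy measurement using `time.sleep` as a stand-in for a blocking socket read (sleeps release the GIL just as a blocking `recv()` does). Note this deliberately does not model the GIL convoy effect under mixed CPU/I/O load, which is what the benchmarks being requested would need to measure:

```python
import threading
import time


def blocking_io():
    # Stand-in for a blocking read: releases the GIL while waiting.
    time.sleep(0.1)


def run_sequential(n):
    """Perform n blocking operations one after another."""
    start = time.time()
    for _ in range(n):
        blocking_io()
    return time.time() - start


def run_threaded(n):
    """Perform n blocking operations on n concurrent threads."""
    start = time.time()
    threads = [threading.Thread(target=blocking_io) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```

With n=10, the sequential version takes roughly 1 second while the threaded version takes roughly 0.1 seconds, because the waits overlap. Whether that advantage survives when CPU-bound frame parsing is mixed in is exactly the open question here.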
@dimaqq Dave Beazley's got the best evaluation I've seen, here, with even more data here. Highlights:
Regarding the Python 3.2 GIL, see slides 50 through 54, with the headline numbers being:
@Lukasa, Dave's presentations are 5 and 4 years old respectively; a lot has been improved since. Please run your own benchmarks. W.r.t. actual use cases, I suppose you have a point that
IMO as long as these two are fine, any approach is valid.
A lot has been improved, but Benchmarking
@Lukasa, please note that
Note also that hyper is still pure Python. As for the I/O convoy issue only affecting 3.2, that's not my understanding: Python Issue 7964 tracks this issue and is still open, at the very least against Python 3.3. I agree that it may not be as bad as Dave's example, but I suspect it has the potential to be problematic. I'll see if I can find time to do this benchmarking, though.
Correct re: the 3.x regression; updated. Personally I don't see that as a valid reason to discard threading altogether; perhaps in the end a benchmark of hyper specifically will resolve this concern once and for all. Anyway, you are the maintainer; it's your decision.
From my experience dealing with threads (largely with workerpool and urllib3's dummyserver), it's nothing short of a disaster. It's almost impossible to make a Python library spawn its own thread and not somehow interfere with the program trying to use it. [0] If you want to be really generous, you could provide an interface that people can use within their own managed side thread. Maybe even have some helpers for launching a thread for them. But I'd be very, very reluctant to try to launch threads and manage them in the background without the user's awareness. Personally, I'd just target Py34's new asyncio stuff and worry about backporting it later.

[0] Dealing with threads sanely in Python usually involves exit hooks, signal capturing, and all kinds of unfun things.
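The "interface for the user's own managed side thread, plus an optional launch helper" idea could look something like this. Everything here (`ConnectionRunner`, `poll`) is hypothetical, not an existing hyper API; the point is that the caller, not the library, owns the thread's lifecycle:

```python
import threading


class ConnectionRunner:
    """Hypothetical helper: the library supplies the loop body and a
    clean stop; the user decides whether and how to run it in a thread."""

    def __init__(self, poll):
        # 'poll' is whatever per-iteration work the library needs done
        # (e.g. reading one frame); supplied here as a plain callable.
        self._poll = poll
        self._stop = threading.Event()
        self._thread = None

    def run_forever(self):
        # Suitable as the target of a Thread the *user* creates.
        while not self._stop.is_set():
            self._poll()

    def start_thread(self):
        # Optional convenience: launch a thread, but hand it back so the
        # caller stays responsible for joining it -- no hidden threads,
        # no exit hooks or signal capturing inside the library.
        self._thread = threading.Thread(target=self.run_forever)
        self._thread.start()
        return self._thread

    def stop(self, timeout=None):
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout)
```

Because shutdown is an explicit `stop()` the user calls, the library never has to guess when the program is exiting.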
Why not both? Seriously, I don't know why it took me so long to come up with this idea, but it should be entirely possible to write a class that combines both. Such a class would have one 'thread' (or greenlet or coroutine or whatever) that owns the socket for reading/writing, one that manages connection state, receives requests on a queue, and sends responses on a queue (or satisfies a future or whatever), and then one that provides a 'thread-safe UI' (basically just builds objects and chucks them on queues). That might allow a relatively high-performance (certainly more concurrent) programming model without preventing us from satisfying a 'requests-like' interface.
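A minimal sketch of that hybrid design, purely illustrative: `ConcurrentConnection` is not real hyper code, the socket and frame work is stubbed out, and for brevity the socket-owning and state-managing roles are collapsed into a single worker. The key property is that the public API touches no shared state, it only builds a request, puts it on a queue, and hands back a future:

```python
import queue
import threading
from concurrent.futures import Future


class ConcurrentConnection:
    """Hypothetical hybrid: one worker owns all connection state;
    the public API is the 'thread-safe UI' layer."""

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def request(self, method, path):
        # Thread-safe UI: just build an object, chuck it on a queue,
        # and return a Future for the caller to wait on.
        fut = Future()
        self._requests.put((method, path, fut))
        return fut

    def _loop(self):
        # Single owner of the (imaginary) socket and HTTP/2 state
        # machine, so no locks are needed around frame parsing or
        # flow-control window accounting.
        while True:
            method, path, fut = self._requests.get()
            # Placeholder for: serialize frames, write to the socket,
            # read response frames, build a real response object.
            fut.set_result((200, "response for %s %s" % (method, path)))
```

A requests-like facade then falls out for free: `status, body = conn.request('GET', '/').result()` blocks exactly like a synchronous call, while concurrent callers share the same worker.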
Right now `hyper` doesn't read frames until the user attempts to read data or our connection window closes. This is obviously a problem. We need some solutions to this. Proposals:

`send`-ing operations.

Any other ideas? /cc @shazow, @sigmavirus24, @alekstorm, I need your wealth of experience. If you know anyone else with good ideas in this area I'd love to hear it.
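To make the flow-control problem concrete, here is a toy model of receive-window accounting (the 65,535-byte figure is HTTP/2's default initial window; the class itself is illustrative, not hyper's implementation). If frames are never consumed, the window is never returned to the sender, and a peer that respects flow control must stop sending:

```python
DEFAULT_WINDOW = 65535  # HTTP/2 default initial flow-control window


class ToyReceiver:
    """Toy receive-side flow-control accounting."""

    def __init__(self):
        self.window = DEFAULT_WINDOW
        self.buffered = []

    def on_data_frame(self, payload):
        # A compliant sender may not exceed our advertised window.
        if len(payload) > self.window:
            raise RuntimeError("sender must stall: window exhausted")
        self.window -= len(payload)
        self.buffered.append(payload)

    def consume(self):
        # Only when the *user* reads do we return window to the sender
        # (conceptually: emit WINDOW_UPDATE for the freed bytes).
        freed = sum(len(p) for p in self.buffered)
        self.buffered = []
        self.window += freed
        return freed
```

A client that never reads therefore wedges the connection once ~64 KB of data is in flight, which is why frames need to be read eagerly by *something*, whatever concurrency model that something uses.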