-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further step_or_park
implementations [WIP]
#268
Conversation
Heads up to @comnik @ryzhyk @benesch The current version seems to work with the The downside is that it slows things down in this benchmark, from about 6s using two workers to about 9s. Something like this is to be expected, as threads may now be parked where they would otherwise be just about to receive some more work to do, and so get unparked, and .. it wasn't going to go faster for sure. So, latency has the potential to increase, significantly in tight loops, which may mean that users need to be smart (only park after so many microseconds of no work). We may need to communicate more back to the user (I don't think they can tell if there was no work). If anyone wants to take this out for a spin and see how it looks, that would be great. There are still certain operators that will keep a worker live and spinning (e.g. |
Oops. Except some tests fail now that I actually started calling the code correctly. Await further instructions! |
I believe this is now in a plausibly functional state. I'll look in to adding the other implementations soon; it was surprisingly easy for the MPSC case (modulo bugs we haven't found yet)! |
So David (I can't seem to tag him here) reports that the original single-threaded step_or_park implementation is working for us. He did run into the missing implementation for the generic allocator, but that shouldn't be a problem on this branch. We haven't tried this one yet, but the basic usage model seems to fit our needs. Thanks a lot for this! |
Yes, that missing implementation meant that for a while default timely wasn't even using any of the new code. That got added and things should be actually parking now. :) |
Ok, we swapped Declarative around to make use of this. We settled on this: https://github.com/comnik/declarative-dataflow/blob/7c344c5496d7b7a1040c3552a661312e1e2f7e96/server/src/main.rs#L712, where we perform a fixed number of non-parking steps, then advance traces, and only after that The reasoning was that doing step_or_park() in the loop seemed a bit wasteful, and potentially dangerous in terms of blocking network inputs further up the loop. And advancing before the final step_or_park gives us a chance to do some useful work (merging) before going to sleep. Unfortunately the dependency mess foiled my attempt at running this with this PR included, I'll try that again later. (edit: but David did confirm, with |
My guess is that the right idiom here will probably be to have any thread calling If the worker threads wants to regularly perform network IO itself, then it should probably call either Any pushback on the appropriateness of the threading model is welcome, of course, but everyone wants to be the one to own blocking and unblocking of threads, so it is hard... :) |
No I think the trade-off is good that way :) Our only risk now is a bit of delay in accepting client inputs (which are not latency critical anyways), I think, and we can always move to a dedicated networking thread. |
Tried again on this branch. With a single process, everything seems to work fine, with significantly reduced cpu usage. However in intra-process configurations, weird things happen.
(edit: apart from that, CPU usage is also significantly reduced in cluster configurations, so a partial victory!) |
I've clearly screwed something up then. :) I'll take a peek on Thursday at the earliest, I'm afraid! |
Peek taken, and at least one bug fixed. Small examples seem to work at the moment, in multi-process mode. |
Things seem to work correctly now :) Tested all configurations. |
This PR is a work in progress of
await_events
methods for communicators which enable thestep_or_park
functionality. The first commit adds preliminary support for the intra-process allocator (not the serializing one). It has not been tested beyond the standard test suite (which has a few multi-worker tests, but not many).