Improve efficiency of asynchronous futures #1840
Conversation
(force-pushed from 5e17ad8 to 1159c30)
Codecov Report
@@            Coverage Diff             @@
##           master    #1840      +/-   ##
==========================================
+ Coverage   79.91%   79.91%   +<.01%
==========================================
  Files         196      196
  Lines       35267    35263       -4
==========================================
- Hits        28185    28182       -3
+ Misses       7082     7081       -1
Ok, I've pushed some updates.
Here's a little test that indicates this PR has a positive impact on scaling of concurrent RPCs, versus current master (results are wall clock, based on one sample, run on my single-core Ubuntu VM, no flux-security):
(each run was in a fresh instance, so KVS content was not cumulative)
My vote is to put this in. It might be good to get one more set of eyes on it first, though - @chu11?
(force-pushed from a63fe70 to 873cc7f)
(force-pushed from 873cc7f to d457dc9)
Took a look and everything LGTM.
restarted a builder that hit
@grondo if you're happy with it I can hit the button.
History might look cleaner if #1850 goes in first, so we don't have a future sandwich between two kvs improvements. ;-)
Mmm, sandwich. One builder hit this valgrind error. I'll go ahead and restart it.
Problem: several places in libflux/future.c test whether a future is ready by checking both f->result_valid *and* f->fatal_errnum_valid. This requirement could too easily lead to a future maintainer (hah) forgetting one of these checks, so abstract this simple test into a convenience function and use it throughout the code. This change also cleans up `flux_future_is_ready()` to use the new function. Though it handily used `flux_future_wait_for (f, 0.)` to test for readiness, that amounted to the same check implemented in the new `future_is_ready`, and calling the new function directly is clearer.
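For reference, a minimal sketch of the convenience function described above, assuming the names given in the commit message (`future_is_ready`, `f->result_valid`, `f->fatal_errnum_valid`); the actual definition in libflux/future.c may differ:

```c
/* Sketch only: a future is "ready" once either a result or a fatal
 * error has been posted to it. */
static bool future_is_ready (flux_future_t *f)
{
    return (f->result_valid || f->fatal_errnum_valid);
}
```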
Problem: futures run in asynchronous mode have their prepare and check watchers started immediately when `flux_future_then(3)` is called. This means that `prepare_cb` and `check_cb` are run for every unfulfilled future on every reactor loop iteration. In a process with many futures (e.g. thousands of outstanding RPCs) this can result in a large slowdown. Instead of starting the prepare and check watchers at the time `flux_future_then` is called, start the watchers only after the future has been fulfilled (with a result or fatal error) by calling `then_context_start` from `post_fulfill`.

Fixes flux-framework#1839
The flux_future_t prepare watcher callback is currently used only to start the idle watcher. Eliminate the middle man and start the idle watcher directly in `then_context_start`.
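A rough sketch of the flow described in the two commit messages above. `then_context_start` and `post_fulfill` are named in the commit messages; the then-context struct, its members, and the surrounding details are assumptions rather than the actual libflux/future.c code:

```c
/* Sketch: arm the "then" context only once the future is fulfilled.
 * With the prepare watcher gone, then_context_start() starts the check
 * and idle watchers itself (struct layout here is assumed). */
static void then_context_start (struct then_context *then)
{
    flux_watcher_start (then->w_check);
    flux_watcher_start (then->w_idle);   /* keep the loop from blocking
                                          * until check_cb can run */
}

int flux_future_then (flux_future_t *f, double timeout,
                      flux_continuation_f cb, void *arg)
{
    /* ... create then context, register continuation, arm timeout ... */

    /* Previously the watchers were started unconditionally here, costing
     * a prepare_cb/check_cb call per unfulfilled future per iteration. */
    if (future_is_ready (f))
        then_context_start (f->then);
    return 0;
}

static void post_fulfill (flux_future_t *f)
{
    /* New: the fulfillment path (result or fatal error) starts the
     * watchers, so pending futures cost nothing until they complete. */
    if (f->then)
        then_context_start (f->then);
}
```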
Add unit tests to ensure fatal errors in flux_future_t are handled in asynchronous mode (then context) both before and after a synchronous get of the error.
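For illustration, one flavor of that coverage might look like the sketch below, loosely in the style of the libtap-based future unit tests. The helper names, the tap.h include path, and the errstr argument to `flux_future_fatal_error()` are assumptions:

```c
#include <errno.h>
#include <flux/core.h>
#include "src/common/libtap/tap.h"

static void fatal_cb (flux_future_t *f, void *arg)
{
    int *called = arg;
    (*called)++;
    ok (flux_future_get (f, NULL) < 0 && errno == EPERM,
        "continuation sees the fatal error");
}

static void test_fatal_then (void)
{
    flux_reactor_t *r = flux_reactor_create (0);
    flux_future_t *f = flux_future_create (NULL, NULL);
    int called = 0;

    flux_future_set_reactor (f, r);
    flux_future_fatal_error (f, EPERM, NULL);       /* fulfill with an error */
    ok (flux_future_then (f, -1., fatal_cb, &called) == 0,
        "flux_future_then works on a fatally fulfilled future");
    ok (flux_reactor_run (r, 0) >= 0 && called == 1,
        "reactor ran and continuation fired exactly once");

    flux_future_destroy (f);
    flux_reactor_destroy (r);
}
```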
Clean up leaked flux_reactor_t in libflux/test/future.c: test_simple().
Ensure a case where a multiple-result future is used first synchronously, then asynchronously, is covered in the unit tests.
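A sketch of that sync-then-async multiple-result case, with the same assumptions and includes as the previous sketch; `flux_future_reset()` is used here to move to the next result, and the actual unit test may differ:

```c
static void next_cb (flux_future_t *f, void *arg)
{
    int *called = arg;
    (*called)++;
    ok (flux_future_get (f, NULL) == 0,
        "asynchronous get of second result works");
}

static void test_multiple_results (flux_reactor_t *r)
{
    flux_future_t *f = flux_future_create (NULL, NULL);
    int called = 0;

    flux_future_set_reactor (f, r);

    /* First result consumed synchronously. */
    flux_future_fulfill (f, "first", NULL);
    ok (flux_future_get (f, NULL) == 0, "synchronous get of first result");

    /* Reset, register a continuation, then fulfill a second result
     * and consume it asynchronously. */
    flux_future_reset (f);
    ok (flux_future_then (f, -1., next_cb, &called) == 0,
        "flux_future_then registered after reset");
    flux_future_fulfill (f, "second", NULL);
    ok (flux_reactor_run (r, 0) >= 0 && called == 1,
        "continuation ran for the second result");

    flux_future_destroy (f);
}
```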
(force-pushed from d457dc9 to 21cf90a)
Hit another "no output received" timeout after
Man, another hang, restarted.
Finally it all passed!
As described in #1839, this PR improves efficiency of asynchronous use of `flux_future_t` by eliminating the `prepare` watcher and only starting the `check` and `idle` watchers at the time of fulfillment, instead of immediately when `flux_future_then(3)` is called. This reduces the number of active watchers significantly when there are many unfulfilled futures associated with the reactor loop.

This PR should be carefully examined and tested to ensure I haven't missed some subtle use case that is not covered in our testsuite. During development, I did find one case that was missed by the unit tests and luckily caught by another test in `make check`. I'll see if I can figure out what that particular use case was, and codify it in the future_t unit tests.
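For context, a hedged example of the usage pattern that benefits: many RPCs in flight at once, each with a continuation registered via `flux_future_then(3)`. The topic string, count, and error handling below are placeholders, not part of this PR:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <flux/core.h>

static void rpc_cb (flux_future_t *f, void *arg)
{
    int *remaining = arg;
    if (flux_rpc_get (f, NULL) < 0)
        fprintf (stderr, "rpc failed: %s\n", strerror (errno));
    flux_future_destroy (f);
    (*remaining)--;
}

int main (void)
{
    flux_t *h = flux_open (NULL, 0);
    int remaining = 0;

    /* Launch many concurrent RPCs.  Before this PR, every unfulfilled
     * future kept prepare/check watchers active on each reactor loop
     * iteration; with this PR, a future's check/idle watchers start only
     * when its response (or a fatal error) arrives. */
    for (int i = 0; i < 1000; i++) {
        flux_future_t *f = flux_rpc (h, "some.service.method", NULL,
                                     FLUX_NODEID_ANY, 0);
        if (!f || flux_future_then (f, -1., rpc_cb, &remaining) < 0)
            break;
        remaining++;
    }
    while (remaining > 0)
        if (flux_reactor_run (flux_get_reactor (h), FLUX_REACTOR_ONCE) < 0)
            break;
    flux_close (h);
    return 0;
}
```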