
Improve efficiency of asynchronous futures #1840

Merged: 6 commits merged into flux-framework:master from the future-efficiency branch on Nov 21, 2018

Conversation

@grondo (Contributor) commented Nov 16, 2018

As described in #1839, this PR improves efficiency of asynchronous use of flux_future_t by eliminating the prepare watcher and only starting the check and idle watchers at the time of fulfillment instead of immediately when flux_future_then(3) is called. This reduces the number of active watchers significantly when there are many unfulfilled futures associated with the reactor loop.

This PR should be carefully examined and tested to ensure I haven't missed some subtle use case that is not covered in our testsuite. During development, I did find one case that was missed by the unit tests and luckily caught by another test in make check. I'll see if I can figure out what that particular use case was, and codify it in the future_t unit tests.
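
To make the asynchronous pattern under discussion concrete, here is a minimal, hedged example of `flux_future_then(3)` driving a continuation from the reactor (not code from this PR; the topic string is a placeholder and error handling is abbreviated):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <flux/core.h>

static void continuation (flux_future_t *f, void *arg)
{
    const char *s;
    if (flux_rpc_get (f, &s) < 0)
        fprintf (stderr, "rpc failed: %s\n", strerror (errno));
    flux_future_destroy (f);
}

int main (void)
{
    flux_t *h = flux_open (NULL, 0);

    /* "some.topic" is a placeholder service.method name */
    flux_future_t *f = flux_rpc (h, "some.topic", NULL, FLUX_NODEID_ANY, 0);

    /* Register the continuation.  With this PR, the check/idle watchers
     * that dispatch it are started only once the future is fulfilled,
     * instead of immediately here. */
    flux_future_then (f, -1., continuation, NULL);

    flux_reactor_run (flux_get_reactor (h), 0);
    flux_close (h);
    return 0;
}
```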

@codecov-io commented Nov 16, 2018

Codecov Report

Merging #1840 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1840      +/-   ##
==========================================
+ Coverage   79.91%   79.91%   +<.01%     
==========================================
  Files         196      196              
  Lines       35267    35263       -4     
==========================================
- Hits        28185    28182       -3     
+ Misses       7082     7081       -1
| Impacted Files | Coverage Δ |
| -------------- | ---------- |
| src/common/libflux/future.c | 87.29% <100%> (-0.17%) ⬇️ |
| src/common/libflux/response.c | 79.62% <0%> (-1.24%) ⬇️ |
| src/common/libflux/message.c | 81.51% <0%> (-0.13%) ⬇️ |
| src/broker/module.c | 83.83% <0%> (+0.27%) ⬆️ |
| src/common/libflux/mrpc.c | 87.89% <0%> (+1.17%) ⬆️ |

@grondo (Contributor, Author) commented Nov 16, 2018

Ok, I've pushed some updates to libflux/test/future.c that I think exercise the case I hit during development of this PR. The main case, IIRC, was a multiple-result future where the first result is obtained synchronously. In one version of this PR the subsequent async continuation was never called because the watchers were not started (I don't remember the exact reason why, sorry). This case was luckily exercised by t/kvs/commit_order.c.
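
A hedged sketch of that scenario, not the actual test added to libflux/test/future.c: a multiple-result future whose first result is consumed synchronously, after which an asynchronous continuation must still fire for the next result. The helper names are illustrative.

```c
#include <flux/core.h>

static void cont (flux_future_t *f, void *arg)
{
    int *called = arg;
    *called = 1;                            /* continuation did run */
}

static void check_sync_then_async (flux_reactor_t *r)
{
    flux_future_t *f = flux_future_create (NULL, NULL);
    flux_future_set_reactor (f, r);

    flux_future_fulfill (f, NULL, NULL);    /* first result */
    flux_future_get (f, NULL);              /* consumed synchronously */
    flux_future_reset (f);                  /* expect more results */

    int called = 0;
    flux_future_then (f, -1., cont, &called);
    flux_future_fulfill (f, NULL, NULL);    /* second result arrives */
    flux_reactor_run (r, 0);                /* cont should now be called */

    flux_future_destroy (f);
}
```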

@garlick (Member) commented Nov 17, 2018

Here's a little test that indicates this PR has a positive impact on scaling of concurrent RPCs, versus current master (results are wall clock, based on one sample, run on my single-core Ubuntu VM, no flux-security):

$ time flux job submitbench --fanout=FANOUT --repeat=4096 basic.yaml
| fanout | master 8c23603 (sec) | future-efficiency (sec) |
| ------ | -------------------- | ----------------------- |
| 256    | 14.156               | 13.101                  |
| 512    | 14.844               | 12.116                  |
| 1024   | 15.781               | 12.772                  |
| 2048   | 16.235               | 11.336                  |
| 4096   | 16.813               | 10.101                  |

(each run was in a fresh instance, so KVS content was not cumulative)

@garlick (Member) commented Nov 17, 2018

My vote is to put this in. It might be good to get one more set of eyes on it though first - @chu11?

@grondo (Contributor, Author) commented Nov 17, 2018

Thanks for taking an extra careful look @garlick, @chu11!

@chu11 (Member) commented Nov 20, 2018

took a look and everything LGTM

@chu11 (Member) commented Nov 20, 2018

restarted a builder that hit

  python/t0009-security.py:  PASS: N=2   PASS=2   FAIL=0 SKIP=0 XPASS=0 XFAIL=0
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

@grondo if you're happy with it I can hit the button

@grondo (Contributor, Author) commented Nov 20, 2018

History might look cleaner if #1850 goes in first, so we don't have a future sandwich between two kvs improvements. ;-)

@garlick (Member) commented Nov 20, 2018

Mmm, sandwich. One builder hit this valgrind error. I'll go ahead and restart it.

==1624== HEAP SUMMARY:
==1624==     in use at exit: 6,346,975 bytes in 182 blocks
==1624==   total heap usage: 952,580 allocs, 952,398 frees, 223,263,756 bytes allocated
==1624== 
==1624== 1,048,593 bytes in 1 blocks are possibly lost in loss record 99 of 102
==1624==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1624==    by 0x4E74898: cbuf_create (cbuf.c:233)
==1624==    by 0x4E639FC: flux_buffer_create (buffer.c:80)
==1624==    by 0x4E6F0AC: remote_channel_setup (remote.c:354)
==1624==    by 0x4E6F69B: remote_setup_stdio (remote.c:443)
==1624==    by 0x4E6F69B: subprocess_remote_setup (remote.c:493)
==1624==    by 0x4E72C1A: flux_rexec (subprocess.c:677)
==1624==    by 0xB7CD6CB: spawn_exec_handler (job.c:694)
==1624==    by 0xB7CD6CB: runevent_continuation (job.c:757)
==1624==    by 0x4E88E12: ev_invoke_pending (ev.c:3314)
==1624==    by 0x4E8C3D8: ev_run (ev.c:3717)
==1624==    by 0x4E589E2: flux_reactor_run (reactor.c:140)
==1624==    by 0xB7CE10F: mod_main (job.c:938)
==1624==    by 0x1144EB: module_thread (module.c:157)
==1624==    by 0x55BC6DA: start_thread (pthread_create.c:463)
==1624==    by 0x636388E: clone (clone.S:95)

@chu11 (Member) commented Nov 20, 2018

@garlick hmmm, appears to be new. Don't know if it's a new variant of #1641

Problem: several places in libflux/future.c test if a future
is ready or not ready by checking both f->result_valid *and*
f->fatal_errnum_valid. This requirement could too easily lead to a
future maintainer (hah) forgetting one of these checks, so abstract
this simple test into a convenience function and use it throughout
the code.

This change also cleans up `flux_future_is_ready()` to use the new
function. Although it handily used `flux_future_wait_for (f, 0.)`
to test for readiness, that amounted to the same check now
implemented in `future_is_ready`, and calling the new function
directly is clearer.
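
A minimal sketch of the helper described in this commit message; the real definition lives inside future.c next to its file-private struct, and only the two field names are taken from the text above.

```c
#include <stdbool.h>

/* Sketch only: struct flux_future is private to future.c; the field
 * names below come from the commit message above. */
static bool future_is_ready (flux_future_t *f)
{
    return (f->result_valid || f->fatal_errnum_valid);
}

/* flux_future_is_ready(3) can then delegate to the helper instead of
 * calling flux_future_wait_for (f, 0.). */
bool flux_future_is_ready (flux_future_t *f)
{
    return f ? future_is_ready (f) : false;
}
```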
Problem: futures run in asynchronous mode have their prepare and
check watchers started immediately when `flux_future_then(3)`
is called. This means that the `prepare_cb` and `check_cb` are
run for every unfulfilled future on every reactor loop iteration.
In a process with many futures (e.g. thousands of outstanding
RPCs) this can result in a large slowdown.

Instead of starting the prepare and check watchers at the time
`flux_future_then` is called, start the watchers only after the
future has been fulfilled (with result or fatal error) by
calling `then_context_start` from `post_fulfill`.

Fixes flux-framework#1839

The flux_future_t prepare watcher callback is currently used only
to start the idle watcher. Eliminate the middle man and start
the idle watcher directly in `then_context_start`.
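
A hedged sketch of the change described in the two commit messages above. `then_context_start` and `post_fulfill` are internal to future.c, so the struct layout and watcher field names here are assumptions for illustration only.

```c
/* Illustrative only: the `then` context layout is an assumption. */
struct then_context {
    flux_watcher_t *check;
    flux_watcher_t *idle;
    /* ... continuation, timeout timer, etc. ... */
};

static void then_context_start (flux_future_t *f)
{
    /* The prepare watcher is gone: start the idle watcher directly so
     * the reactor loop stays awake until the check watcher runs the
     * continuation. */
    flux_watcher_start (f->then->idle);
    flux_watcher_start (f->then->check);
}

static void post_fulfill (flux_future_t *f)
{
    /* Watchers are started here, at fulfillment (result or fatal
     * error), rather than in flux_future_then(). */
    if (f->then)
        then_context_start (f);
}
```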
Add unit tests to ensure fatal errors in flux_future_t are
handled in asynchronous mode (then context) both before and after
a synchronous get of the error.
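
A hedged sketch (not the actual test code) of the scenario these new unit tests cover: a fatal error posted on a future must still drive the `then` continuation, whether or not the error was first observed via a synchronous get. The function names are illustrative.

```c
#include <errno.h>
#include <stdbool.h>
#include <flux/core.h>

static void error_cont (flux_future_t *f, void *arg)
{
    int *errnum = arg;
    if (flux_future_get (f, NULL) < 0)
        *errnum = errno;                /* expect the fatal errno here */
}

static void check_fatal_error_async (flux_reactor_t *r, bool sync_get_first)
{
    flux_future_t *f = flux_future_create (NULL, NULL);
    flux_future_set_reactor (f, r);
    flux_future_fatal_error (f, EPERM, "fatal error");

    if (sync_get_first)                 /* observe the error synchronously */
        (void)flux_future_get (f, NULL);

    int errnum = 0;
    flux_future_then (f, -1., error_cont, &errnum);
    flux_reactor_run (r, 0);            /* error_cont should see EPERM */
    flux_future_destroy (f);
}
```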
Clean up leaked flux_reactor_t in libflux/test/future.c: test_simple().

Ensure that the case where a multiple-result future is used first
synchronously and then asynchronously is covered in the unit tests.

@grondo (Contributor, Author) commented Nov 21, 2018

Hit another "no output received" timeout after python/t0009-security.py and restarted

@chu11 (Member) commented Nov 21, 2018

man, another hang, restarted

@chu11 (Member) commented Nov 21, 2018

finally it all passed!

@chu11 chu11 merged commit bbe885e into flux-framework:master Nov 21, 2018
@grondo grondo deleted the future-efficiency branch February 8, 2019 00:43