-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
broker: eliminate some message copies #5559
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started going through this before I noticed that it was still a WIP. Found a couple commit message typos that I'll leave here in case it's helpful.
@@ -78,9 +78,15 @@ static void broker_request_sendmsg (broker_ctx_t *ctx, const flux_msg_t *msg); | |||
static int broker_request_sendmsg_internal (broker_ctx_t *ctx, | |||
const flux_msg_t *msg); | |||
|
|||
static void h_internal_watcher (flux_reactor_t *r, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message typo: shoudl
static void overlay_recv_cb (const flux_msg_t *msg, | ||
overlay_where_t where, | ||
void *arg); | ||
static int overlay_recv_cb (flux_msg_t **msg, overlay_where_t where, void *arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message typo: overlay_fecv_f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I've taken a look at this one a couple times and it looks great! (only noticed the commit message typos so far). I was going to do some profiling again with this + #5585 just to see if anything pops out, but marking this approved even though it is still a WIP in case that helps move things along.
Thanks! I was a little disappointed by the test results, but I guess might as well get it moving. |
Here's a series of 10 throughput tests (each run in a freshly started instance) on this branch (avg 540 job/s) $ for i in {1..10}; do src/cmd/flux start -s4 ./src/test/throughput.py -n 2048; done | grep ^throughput
throughput: 515.5 job/s (script: 487.8 job/s)
throughput: 546.1 job/s (script: 515.4 job/s)
throughput: 537.4 job/s (script: 508.3 job/s)
throughput: 528.4 job/s (script: 500.0 job/s)
throughput: 529.0 job/s (script: 499.9 job/s)
throughput: 548.9 job/s (script: 517.1 job/s)
throughput: 539.0 job/s (script: 508.8 job/s)
throughput: 543.2 job/s (script: 513.0 job/s)
throughput: 577.2 job/s (script: 543.4 job/s)
throughput: 539.3 job/s (script: 509.9 job/s) vs current master (avg 516 job/s) $ for i in {1..10}; do src/cmd/flux start -s4 ./src/test/throughput.py -n 2048; done | grep ^throughput
throughput: 472.0 job/s (script: 449.1 job/s)
throughput: 504.0 job/s (script: 477.8 job/s)
throughput: 525.8 job/s (script: 497.4 job/s)
throughput: 494.5 job/s (script: 469.1 job/s)
throughput: 519.7 job/s (script: 491.9 job/s)
throughput: 495.0 job/s (script: 469.5 job/s)
throughput: 531.8 job/s (script: 502.6 job/s)
throughput: 511.7 job/s (script: 485.2 job/s)
throughput: 554.3 job/s (script: 522.9 job/s)
throughput: 559.9 job/s (script: 526.9 job/s) |
Averaging the jobs/s (first number), I get 540 for this PR vs 516 for master which is like a 4% improvement. 🤷 better than a poke in the eye I guess. |
Yeah, I think what we're hitting here is other things are slower for running jobs that moving messages (probably kvs). With profiling I did notice a drop in the contribution of the module_sendmsg/module_sendmsg_new so at least we're not heating up the universe as much as before. |
Dropping the WIP. I think this is OK but I want to take one more pass through the code to make sure. |
Problem: some older broker code does not conform to modern project norms. Break long parameter lists to one per line. Break long conditionals to one per line with one indent. Begin conditional blocks at the same indent as above.
Problem: the broker has a strange flux_t handle configuration dating back to early days of Flux before we had nice things. It is an exposed handle implementation with only a send() operation. Instead of queuing messages, send() directly passes messages to routing logic. When routing logic wants to send a message back to the handle, it uses flux_requeue (RQ_TAIL). Replace that with back to back interthread flux_t handles. This more conventional setup should be less fragile and easier to reason about. One material change is that messages sent on the handle (e.g. when some part of the broker makes an RPC) are now queued and later dequeued by a reactor handler and routed, rather than immediately routed. As a consequence, synchronous RPCs on the broker handle cannot work at all.
Problem: module_sendmsg() and service_send() prototypes encourage message copying. Add module_sendmsg_new() and service_send_new(), variants that steal their message argument.
Problem: the overlay_recv_f callback prototype encourages message copying. Change it so the callback steals the msg argument.
Problem: overlay_sendmsg() prototype encourages message copying. Add overlay_sendmsg_new() which steals the msg reference. Update internal "mcast" code to avoid copying the message.
Problem: broker_response_sendmsg(), the internal function used to route response messages, uses interfaces that copy messages. Convert to broker_response_sendmsg_new() which steals its message reference. Internally, use the _new() functions to route messages and avoid copying.
Problem: broker_event_sendmsg(), the internal function used to route event messages, uses interfaces that copy messages. Convert to broker_event_sendmsg_new() which steals its message reference. Internally, use the _new() functions to route messages and avoid copying.
Problem: broker_request_sendmsg(), the internal function used to route request messages, uses interfaces that copy messages. Convert to broker_request_sendmsg_new() which steals its message reference. Internally, use the _new() functions to route messages and avoid copying. Also convert broker_request_sendmsg_internal() to broker_request_sendmsg_new_internal().
Problem: modhash_response_sendmsg() encourages message copying. Replace it with modhash_response_sendmsg_new() which steals the 'msg' argument.
Problem: service_send() has no users. Drop the function. Update unit tests to use service_send_new() exclusively.
Problem: module_sendmsg() has no users. Drop the function.
OK, I didn't spot any problems so I'll set MWP. |
497eecf
to
0c336eb
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #5559 +/- ##
==========================================
- Coverage 83.46% 83.42% -0.04%
==========================================
Files 486 486
Lines 82458 82450 -8
==========================================
- Hits 68822 68783 -39
- Misses 13636 13667 +31
|
Problem: the broker copies messages when it doesn't have to.
This PR eliminates some places in the broker where messages are needlessly copied. I had hoped that by making the local paths between ingest -> kvs, ingest -> job manager, kvs -> content cache effectively zero copy, there would be a measurable effect on the job throughput test. Well, not really.
I'll mark this WIP for now and revisit tomorrow to make sure I'm not missing something.
The other thing here is I replaced the strange half-constructed connector with back-to-back interthread handles. It should make the broker code a little easier to understand.