Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broker: eliminate some message copies #5559

Merged
merged 11 commits into from
Nov 22, 2023

Conversation

garlick
Copy link
Member

@garlick garlick commented Nov 15, 2023

Problem: the broker copies messages when it doesn't have to.

This PR eliminates some places in the broker where messages are needlessly copied. I had hoped that by making the local paths between ingest -> kvs, ingest -> job manager, kvs -> content cache effectively zero copy, there would be a measurable effect on the job throughput test. Well, not really.

I'll mark this WIP for now and revisit tomorrow to make sure I'm not missing something.

The other thing here is I replaced the strange half-constructed connector with back-to-back interthread handles. It should make the broker code a little easier to understand.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started going through this before I noticed that it was still a WIP. Found a couple commit message typos that I'll leave here in case it's helpful.

@@ -78,9 +78,15 @@ static void broker_request_sendmsg (broker_ctx_t *ctx, const flux_msg_t *msg);
static int broker_request_sendmsg_internal (broker_ctx_t *ctx,
const flux_msg_t *msg);

static void h_internal_watcher (flux_reactor_t *r,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message typo: shoudl

static void overlay_recv_cb (const flux_msg_t *msg,
overlay_where_t where,
void *arg);
static int overlay_recv_cb (flux_msg_t **msg, overlay_where_t where, void *arg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message typo: overlay_fecv_f

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I've taken a look at this one a couple times and it looks great! (only noticed the commit message typos so far). I was going to do some profiling again with this + #5585 just to see if anything pops out, but marking this approved even though it is still a WIP in case that helps move things along.

@garlick
Copy link
Member Author

garlick commented Nov 21, 2023

Thanks! I was a little disappointed by the test results, but I guess might as well get it moving.

@grondo
Copy link
Contributor

grondo commented Nov 21, 2023

Here's a series of 10 throughput tests (each run in a freshly started instance) on this branch (avg 540 job/s)

$  for i in {1..10}; do src/cmd/flux start -s4 ./src/test/throughput.py -n 2048; done | grep ^throughput
throughput:     515.5 job/s (script: 487.8 job/s)
throughput:     546.1 job/s (script: 515.4 job/s)
throughput:     537.4 job/s (script: 508.3 job/s)
throughput:     528.4 job/s (script: 500.0 job/s)
throughput:     529.0 job/s (script: 499.9 job/s)
throughput:     548.9 job/s (script: 517.1 job/s)
throughput:     539.0 job/s (script: 508.8 job/s)
throughput:     543.2 job/s (script: 513.0 job/s)
throughput:     577.2 job/s (script: 543.4 job/s)
throughput:     539.3 job/s (script: 509.9 job/s)

vs current master (avg 516 job/s)

$  for i in {1..10}; do src/cmd/flux start -s4 ./src/test/throughput.py -n 2048; done | grep ^throughput
throughput:     472.0 job/s (script: 449.1 job/s)
throughput:     504.0 job/s (script: 477.8 job/s)
throughput:     525.8 job/s (script: 497.4 job/s)
throughput:     494.5 job/s (script: 469.1 job/s)
throughput:     519.7 job/s (script: 491.9 job/s)
throughput:     495.0 job/s (script: 469.5 job/s)
throughput:     531.8 job/s (script: 502.6 job/s)
throughput:     511.7 job/s (script: 485.2 job/s)
throughput:     554.3 job/s (script: 522.9 job/s)
throughput:     559.9 job/s (script: 526.9 job/s)

@garlick
Copy link
Member Author

garlick commented Nov 21, 2023

Averaging the jobs/s (first number), I get 540 for this PR vs 516 for master which is like a 4% improvement. 🤷 better than a poke in the eye I guess.

@grondo
Copy link
Contributor

grondo commented Nov 21, 2023

Averaging the jobs/s (first number), I get 540 for this PR vs 516 for master which is like a 4% improvement

Yeah, I think what we're hitting here is other things are slower for running jobs that moving messages (probably kvs). With profiling I did notice a drop in the contribution of the module_sendmsg/module_sendmsg_new so at least we're not heating up the universe as much as before.

@garlick garlick changed the title WIP broker: eliminate some message copies broker: eliminate some message copies Nov 21, 2023
@garlick
Copy link
Member Author

garlick commented Nov 21, 2023

Dropping the WIP. I think this is OK but I want to take one more pass through the code to make sure.

Problem: some older broker code does not conform to modern
project norms.

Break long parameter lists to one per line.
Break long conditionals to one per line with one indent.
Begin conditional blocks at the same indent as above.
Problem: the broker has a strange flux_t handle configuration
dating back to early days of Flux before we had nice things.

It is an exposed handle implementation with only a send() operation.
Instead of queuing messages, send() directly passes messages to routing
logic.  When routing logic wants to send a message back to the handle, it
uses flux_requeue (RQ_TAIL).

Replace that with back to back interthread flux_t handles. This more
conventional setup should be less fragile and easier to reason about.

One material change is that messages sent on the handle (e.g. when some
part of the broker makes an RPC) are now queued and later dequeued by
a reactor handler and routed, rather than immediately routed.  As a
consequence, synchronous RPCs on the broker handle cannot work at all.
Problem: module_sendmsg() and service_send() prototypes
encourage message copying.

Add module_sendmsg_new() and service_send_new(), variants
that steal their message argument.
Problem: the overlay_recv_f callback prototype encourages
message copying.

Change it so the callback steals the msg argument.
Problem: overlay_sendmsg() prototype encourages message copying.

Add overlay_sendmsg_new() which steals the msg reference.

Update internal "mcast" code to avoid copying the message.
Problem: broker_response_sendmsg(), the internal function
used to route response messages, uses interfaces that copy
messages.

Convert to broker_response_sendmsg_new() which steals its message
reference. Internally, use the _new() functions to route messages
and avoid copying.
Problem: broker_event_sendmsg(), the internal function
used to route event messages, uses interfaces that copy
messages.

Convert to broker_event_sendmsg_new() which steals its message
reference. Internally, use the _new() functions to route messages
and avoid copying.
Problem: broker_request_sendmsg(), the internal function
used to route request messages, uses interfaces that copy
messages.

Convert to broker_request_sendmsg_new() which steals its message
reference. Internally, use the _new() functions to route messages
and avoid copying.

Also convert broker_request_sendmsg_internal() to
broker_request_sendmsg_new_internal().
Problem: modhash_response_sendmsg() encourages message copying.

Replace it with modhash_response_sendmsg_new() which steals the
'msg' argument.
Problem: service_send() has no users.

Drop the function.

Update unit tests to use service_send_new() exclusively.
Problem: module_sendmsg() has no users.

Drop the function.
@garlick
Copy link
Member Author

garlick commented Nov 22, 2023

OK, I didn't spot any problems so I'll set MWP.

Copy link

codecov bot commented Nov 22, 2023

Codecov Report

Merging #5559 (0c336eb) into master (fcee970) will decrease coverage by 0.04%.
The diff coverage is 83.33%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5559      +/-   ##
==========================================
- Coverage   83.46%   83.42%   -0.04%     
==========================================
  Files         486      486              
  Lines       82458    82450       -8     
==========================================
- Hits        68822    68783      -39     
- Misses      13636    13667      +31     
Files Coverage Δ
src/broker/modhash.c 85.00% <100.00%> (ø)
src/broker/service.c 84.48% <100.00%> (ø)
src/broker/module.c 78.23% <88.88%> (-0.12%) ⬇️
src/broker/overlay.c 84.32% <85.10%> (-0.07%) ⬇️
src/broker/broker.c 78.78% <80.95%> (-0.18%) ⬇️

... and 16 files with indirect coverage changes

@mergify mergify bot merged commit 420e8e7 into flux-framework:master Nov 22, 2023
32 of 33 checks passed
@garlick garlick deleted the broker_nocopy branch November 26, 2023 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants