
[CHANGED] refactored message delivery for future SubscribePullAsync #772

Closed
levb wants to merge 33 commits

Conversation

levb (Collaborator) commented Jul 19, 2024

  • Consolidated the deliverMsg logic into a single function (see dispatch.c).
  • Added nats_OpenWithConfig for thread-pool control, and in preparation for JSON-ifying the default options (a separate request from @jnmoyne); see the sketch after this list.
  • [EXPERIMENTAL, not used/tested] Allow using a thread pool for reply delivery.
  • Code organization: broke up nats.c into ./glib/*.c.
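
For readers not following the branch, here is a rough usage sketch of the new open-with-config entry point. The natsClientConfig field names below are placeholders, not the final API, which was still being settled in this PR:

// Hypothetical usage sketch only; the field names are placeholders, not
// the final nats.c API.
#include <stdbool.h>
#include <nats/nats.h>

int main(void)
{
    natsClientConfig cfg = {
        .ThreadPoolMax       = 4,    // assumed field: cap on shared delivery threads
        .DefaultToThreadPool = true, // assumed field: run subscription callbacks on the pool
    };

    // Initialize the library with the config instead of the default nats_Open().
    if (nats_OpenWithConfig(&cfg) != NATS_OK)
        return 1;

    // ... connect, subscribe, publish as usual ...

    nats_Close();
    return 0;
}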

src/conn.c (review thread resolved)

codecov bot commented Jul 19, 2024

Codecov Report

Attention: Patch coverage is 74.89025% with 286 lines in your changes missing coverage. Please review.

Please upload report for BASE (lev-bench2@1e675d6). Learn more about missing BASE report.

Files | Patch % | Lines
src/glib/glib.c | 69.53% | 21 Missing and 57 partials ⚠️
src/glib/glib_last_error.c | 64.82% | 18 Missing and 33 partials ⚠️
src/nats.c | 56.97% | 28 Missing and 9 partials ⚠️
src/sub.c | 84.37% | 10 Missing and 25 partials ⚠️
src/glib/glib_timer.c | 81.50% | 3 Missing and 24 partials ⚠️
src/glib/glib_dispatch_pool.c | 70.12% | 7 Missing and 16 partials ⚠️
src/glib/glib_async_cb.c | 82.81% | 5 Missing and 6 partials ⚠️
src/glib/glib_gc.c | 78.57% | 2 Missing and 4 partials ⚠️
src/js.c | 81.48% | 1 Missing and 4 partials ⚠️
src/dispatch.c | 80.00% | 0 Missing and 3 partials ⚠️
... and 6 more
Additional details and impacted files
@@              Coverage Diff              @@
##             lev-bench2     #772   +/-   ##
=============================================
  Coverage              ?   68.88%           
=============================================
  Files                 ?       49           
  Lines                 ?    15210           
  Branches              ?     3136           
=============================================
  Hits                  ?    10478           
  Misses                ?     1688           
  Partials              ?     3044           


src/conn.c (outdated; review thread resolved)
src/glib/glib.c (review thread resolved)
@@ -0,0 +1,491 @@
// Copyright 2015-2024 The NATS Authors
levb (PR author):

NS: consider removing glib_ prefix from most filenames, and using more intention-revealing names (e.g. glib_async_cb -> async_error_cb?)

@@ -840,19 +842,10 @@ _timeoutPubAsync(natsTimer *t, void *closure)
m->sub = js->rsub;
natsMsg_setTimeout(m);

// Best attempt, ignore NATS_SLOW_CONSUMER errors which may be returned
levb (PR author):

I did not test the changes in this path beyond passing the existing tests.

Review threads resolved on: src/js.c, src/nats.h, src/natstime.c, test/test.c (outdated).
levb changed the title from "WIP: [CHANGED] refactored message delivery in preparation for SubscribePullAsync" to "[CHANGED] refactored message delivery for future SubscribePullAsync" on Jul 19, 2024
@levb levb requested a review from kozlovic July 19, 2024 13:50
@levb levb requested a review from mtmk July 19, 2024 13:50
@levb levb marked this pull request as ready for review July 19, 2024 15:02
kozlovic (Member) left a comment:

That's a big one! I will likely have to take another look, but this is a first pass. I think we need to run some perf tests to make sure that we are not degrading performance too much.

src/conn.c (outdated; review thread resolved)
src/conn.c (outdated), excerpt:
}
else
if (s == NATS_OK)
s = natsSub_enqueueMsg(sub, msg);
kozlovic (Member):

Have you tested the performance implication of invoking a function here, as opposed to adding to the list and notifying inline as it used to be? It does not have to be super elaborate, but you could run nats-pub and nats-sub and see. If there is a major difference, you should see it right away.
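
(For reference, the change being asked about moves the per-message "append to the list and notify the delivery thread" work behind a helper call. A minimal self-contained model of such a helper, with illustrative names that are not the nats.c code, looks like this:)

// Minimal model of an "enqueue and notify" helper, for illustration only;
// this is not the nats.c implementation, just the shape of the per-message
// work that now sits behind a function call.
#include <pthread.h>
#include <stddef.h>

typedef struct msgNode { struct msgNode *next; } msgNode;

typedef struct msgQueue
{
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    msgNode         *head;
    msgNode         *tail;
    int             count;
} msgQueue;

static void enqueueMsg(msgQueue *q, msgNode *m)
{
    pthread_mutex_lock(&q->lock);
    m->next = NULL;
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    // Wake the delivery thread only on the empty -> non-empty transition.
    if (q->count++ == 0)
        pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

In this model the per-message cost is dominated by the lock and the occasional signal, so the extra function call itself should be in the noise; the measurements later in the thread bear on exactly that question.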

levb (PR author):

I did some cursory checks, but I was planning to focus on it this weekend, on the final 'PullSubscribeAsync' branch, once I finish my struggle with the missing heartbeats. Any other ideas beyond [js-]pub->sub'ing a million messages? I will add the comparison here. Also on this: did you have any tooling for CPU/mem profiling, and is there a built-in thread counter that I missed?
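
(A minimal single-process version of the pub->sub million-message check mentioned above, using the public nats.c API against a locally running nats-server; the subject name and message count are arbitrary, and in this naive setup the default pending limits can drop messages, so the wait is bounded:)

// Rough throughput check: publish N small messages and time how long it
// takes the async subscription callback to see them. This is a sketch of
// the measurement idea, not the benchmark used in CI.
#include <stdio.h>
#include <stdatomic.h>
#include <nats/nats.h>

#define TOTAL 1000000   // bump this up for a more representative run

static atomic_long received = 0;

static void
onMsg(natsConnection *nc, natsSubscription *sub, natsMsg *msg, void *closure)
{
    (void) nc; (void) sub; (void) closure;
    atomic_fetch_add(&received, 1);
    natsMsg_Destroy(msg);
}

int main(void)
{
    natsConnection   *nc    = NULL;
    natsSubscription *sub   = NULL;
    natsStatus       s;
    int64_t          start;

    s = natsConnection_ConnectTo(&nc, NATS_DEFAULT_URL);
    if (s == NATS_OK)
        s = natsConnection_Subscribe(&sub, nc, "bench", onMsg, NULL);

    start = nats_Now();
    for (int i = 0; (s == NATS_OK) && (i < TOTAL); i++)
        s = natsConnection_PublishString(nc, "bench", "hello");
    if (s == NATS_OK)
        s = natsConnection_Flush(nc);

    // Bounded wait: with default pending limits some messages may be
    // dropped as slow-consumer, so do not spin forever.
    for (int waited = 0; (s == NATS_OK) && (atomic_load(&received) < TOTAL) && (waited < 30000); waited += 10)
        nats_Sleep(10);

    printf("received %ld/%d in %lld ms\n",
           (long) atomic_load(&received), TOTAL, (long long) (nats_Now() - start));

    natsSubscription_Destroy(sub);
    natsConnection_Destroy(nc);
    nats_Close();
    return (s == NATS_OK) ? 0 : 1;
}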

kozlovic (Member):

From my experience, you need more than 1M messages, because otherwise the run may be a bit too fast to be representative. In early dev, I was using callgrind to do profiling.

Re: "is there a built-in thread counter" — not sure what you are asking.

levb (PR author) commented Jul 22, 2024:

[benchmark chart comparing the two branches; DISPATCH is shown in blue]

So there is a slowdown, if a minimal one. Strangely, I see it more on my machine (M2), with a particularly large difference at 5 threads (500 subs). Increasing the number of messages 10x did not make a meaningful difference. Investigating.

levb (PR author):

@kozlovic With the new benchmark changes in #774, I will be posting updates soon; please stand by.

levb (PR author):

@kozlovic after some struggles I had to roll back significantly, going back to the original 2 deliverMsg functions.

  • My locking changes generally outperformed with a large number of threads/subs, but underperformed with a small number, which is the dominant use case.
  • The bool -> unsigned :1 change was, surprisingly to me, slower by 1% to 3% on x86, while barely noticeable on my M2. It also didn't affect the direct-injection test much, but it showed a consistent slowdown on the Large and especially the Small (no flush) tests (see the sketch below).
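
(A hypothetical illustration of the layout change in the second bullet; the struct and field names are placeholders, not the actual nats.c structures:)

#include <stdbool.h>

typedef struct flagsAsBools
{
    bool closed;            // one byte per flag; a store is a plain byte write
    bool draining;
    bool connected;
} flagsAsBools;

typedef struct flagsAsBitfields
{
    unsigned closed    : 1; // packed into one word; a store becomes a
    unsigned draining  : 1; // read-modify-write (load, mask, store), which
    unsigned connected : 1; // is one plausible cause of the small x86 regression
} flagsAsBitfields;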

I left the two deliverMsg functions in their places for now to keep the diffs comparable; I will move them to dispatch.c in the next PR.

My trickery with changing the PR base to get the numbers compared here did not work, but here is the summary from my fork:

== BenchSubscribeAsync_Small ==
Best: -3.88%
Average: -4.58%
Worst: -6.61%
...
== BenchSubscribeAsync_Inject ==
Best: -7.36%
Average: -3.22%
Worst: -0.43%
...
== BenchSubscribeAsync_InjectSlow ==
Best: -0.35%
Average: -0.02%
Worst: 0.15%
...
== BenchSubscribeAsync_Large ==
Best: -0.03%
Average: -0.49%
Worst: 1.90%

kozlovic (Member):

Can you say whether negative numbers here mean better or worse? That is, does -3.88% mean the new code does 3.88% fewer messages per second, i.e. that it is slower than main?

levb (PR author):

Negative is better; these are "sums of averages of all runs", not directly proportionate to the TPM. I pasted from here: https://github.com/levb/nats.c/actions/runs/10243884983/job/28336496607?pr=102; there you can see both runs' results and the comparison details (the comparison does not display records with less than a 5% difference).

(Unfortunately, rebasing this PR did not change GitHub's internal "base" reference from main; I'd need to cut a new PR to run the comparison here.)

Outdated review threads resolved on: src/dispatch.c (×3), src/nats.h, src/opts.c, src/sub.c (×3).
Base automatically changed from lev-ci-logging-threadsanitizer to main July 22, 2024 22:00
@levb levb changed the base branch from main to lev-bench2 August 5, 2024 07:02

kozlovic commented Aug 5, 2024

@levb I have to say that your PRs are becoming hard to follow for review. You make heavy use of merges and separate PRs (and I understand why), but specifically for this PR I am not sure what to review, and I may have to do a full review again.

When branches don't live for too long, I think a rebase makes more sense. Say you are submitting a PR and, during the review process, we agree that something else needs to be done first. Maybe open a PR that has the new change; once it is committed, update (with a rebase) the original PR so that the new code "sits on top" of main + the intermediate change.

I think this one is different because you had to roll back some changes, but still.

I will wait on your comment to know what I have to review, and it is ok if the answer is "everything" :-)


levb commented Aug 5, 2024

@kozlovic Yes, the entire PR, please. If it makes it easier for you, I'd be very happy to replace this with a new (single-commit) PR now; it would also have the proper base and run the benchmark properly.

Apologies for so many changes; the performance benchmarking/investigation went down too many rabbit holes.

@levb levb requested a review from kozlovic August 5, 2024 17:12

kozlovic commented Aug 5, 2024

Yes the entire PR, please. If it makes it easier for you I'd be very happy to replace this with a new (single commit) PR now

If it is the entire PR, it does not really matter whether all those commits are there, because I need to review everything anyway.

it would also have the proper base and run the benchmark properly.

Then maybe it is better to do so, if you don't mind. I am busy with a NATS Server issue investigation at the moment, so I may not be able to review right away anyway; that gives you time to "clean it up". Ok?


kozlovic commented Aug 5, 2024

@levb Just to be clear: you are going to post a new PR instead of this one, correct?

@levb levb closed this Aug 6, 2024
@levb levb deleted the lev-dispatchers branch October 1, 2024 15:37