[FIXED] LeafNode's queue group load balancing and Sublist.NumInterest #5982

Merged
merged 2 commits into from
Oct 10, 2024


kozlovic
Member

While writing the test, I needed to make sure that each server in
the hub had registered interest for 2 queue subscribers from the
same group. I noticed that Sublist.NumInterest() (which I was
invoking from Account.Interest()) was returning 1, even after
I knew that the propagation should have happened. It turns out
that NumInterest() was returning the number of queue groups, not
the total number of queue subs across those queue groups.
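To illustrate the counting difference, here is a minimal sketch. The `subscription` and `result` types and the `countGroups`/`countSubs` helpers are hypothetical stand-ins, not the server's actual code; the only detail taken from this conversation is that the old count was `len(res.psubs) + len(res.qsubs)`, and the sketch assumes `qsubs` holds one slice per queue group.

```go
package main

import "fmt"

// subscription is a simplified, hypothetical stand-in for a server subscription.
type subscription struct {
	subject string
	queue   string
}

// result is a hypothetical match result: plain subs plus one slice per queue group.
type result struct {
	psubs []*subscription
	qsubs [][]*subscription
}

// countGroups mirrors the old behavior: each queue group counts once,
// no matter how many members it has.
func countGroups(r *result) int {
	return len(r.psubs) + len(r.qsubs)
}

// countSubs counts every queue subscriber in every group.
func countSubs(r *result) int {
	n := len(r.psubs)
	for _, group := range r.qsubs {
		n += len(group)
	}
	return n
}

func main() {
	// Two queue subscribers in the same group, no plain subscribers.
	r := &result{
		qsubs: [][]*subscription{{
			&subscription{subject: "foo", queue: "workers"},
			&subscription{subject: "foo", queue: "workers"},
		}},
	}
	fmt.Printf("groups=%d subs=%d\n", countGroups(r), countSubs(r))
}
```

With two members in a single group, the group-based count yields 1 while the per-subscriber count yields 2, which matches the symptom described above.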

For the leafnode queue balancing issue, the code was favoring
local/routed queue subscriptions, so in the described issue,
the message would always go from HUB1->HUB2->LEAF2->QSub instead
of HUB1->LEAF1->QSub.

We already had another test with a somewhat reversed setup: a HUB
with LEAF1<->LEAF2 connecting to it, a qsub on both HUB and LEAF1,
and requests originating from LEAF2. That test expects all
responses to come from LEAF1 (instead of the responder on HUB).
To keep both behaviors, I went with the following approach:

If the message originates from a client that connects to a server
that has a connection from a remote LEAF, then we pick that LEAF the
same way as if it were a local client or routed server.
However, if the client connects to a server that has a leaf
connection to another server, then we keep track of the sub
but do not send to it if we have local or routed qsubs.
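The rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `connKind`, `qsub` and `pickQueueSub` are hypothetical names, and the distinction between a leaf connection accepted from a remote LEAF and one this server made to another server is modeled as two separate kinds.

```go
package main

import "fmt"

// connKind is a hypothetical classification of where a queue sub lives.
type connKind int

const (
	client        connKind = iota // local client connection
	router                        // routed server in the same cluster
	leafAccepted                  // leaf connection accepted from a remote LEAF
	leafSolicited                 // leaf connection this server made to another server
)

// qsub is a hypothetical queue group member.
type qsub struct {
	name string
	kind connKind
}

// pickQueueSub scans the group starting at index start (assumed non-negative).
// Members on local, routed or accepted-leaf connections are picked right away.
// A member on a solicited leaf connection is only remembered as a fallback.
func pickQueueSub(group []qsub, start int) (qsub, bool) {
	var fallback *qsub
	for i := 0; i < len(group); i++ {
		s := &group[(start+i)%len(group)]
		if s.kind == leafSolicited {
			if fallback == nil {
				fallback = s
			}
			continue
		}
		return *s, true
	}
	if fallback != nil {
		return *fallback, true
	}
	return qsub{}, false
}

func main() {
	group := []qsub{
		{name: "via-leaf-to-hub", kind: leafSolicited},
		{name: "local", kind: client},
	}
	if s, ok := pickQueueSub(group, 0); ok {
		fmt.Println("picked:", s.name)
	}
}
```

In the sketch, the start index stands in for whatever randomization the caller uses to spread load across eligible members; the fallback is only used when no local, routed or accepted-leaf member exists.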

This makes the 2 tests pass, solving the new test and maintaining
the behavior for the old test.

Resolves #5972

Signed-off-by: Ivan Kozlovic [email protected]

@kozlovic
Member Author

@neilalexander I believe there was an issue with Sublist.NumInterest for queue subs since it looked like it was simply counting the number of groups, not the total number of queue subscriptions. Let me know if I misunderstood the intent.

@derekcollison Please review the PR description and see if the choice I made is ok.

@kozlovic
Member Author

You can review the first commit for the leafnode/sublist issues. The second simply adds a bunch of missing "defer nc.Close()" calls and the like.

Member

@neilalexander neilalexander left a comment


LGTM.

I notice looking back at #5918 that even before NumInterest() was added, the Account.Interest() function was still returning len(res.psubs) + len(res.qsubs), so I think the bug is not new and I've just ported it over to the new code as-is.

That said, I think what you're proposing here makes sense, particularly if we're relying on the number of subscriptions to balance in this way.

@neilalexander
Member

Something else that's just occurred to me is that NumInterest() was never backported into 2.10.x, so if there's a problem on those versions too (as opposed to just on main), it's probably because Account.Interest() is doing len(res.psubs) + len(res.qsubs).

@derekcollison Don't know whether we want to cherry-pick NumInterest() into 2.10.x and apply this on top, or if we want to raise a separate PR against the release/v2.10.22 branch to just fix Account.Interest()?

@derekcollison
Member

@neilalexander let's pull those into 2.10.22 from main once this lands.

Member

@derekcollison derekcollison left a comment


LGTM - Thanks @kozlovic

@derekcollison derekcollison merged commit 7e9c93f into main Oct 10, 2024
5 checks passed
@derekcollison derekcollison deleted the fix_5972 branch October 10, 2024 14:23
neilalexander added a commit that referenced this pull request Oct 10, 2024
Includes the following:

- #5918
- #5982
- #5983 (although only the 1.22.8 upgrade, since 1.21.x is no longer
receiving updates)

Signed-off-by: Neil Twigg <[email protected]>
Successfully merging this pull request may close these issues.

Queue Groups on leaf clusters not balancing correctly when messages are routed in from hub cluster [v2.10.21]
3 participants