Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jetstream: ordered consumer: handle closed subscriptions gracefully #1666

Closed
wants to merge 1 commit into from

Conversation

zonque
Copy link
Contributor

@zonque zonque commented Jun 25, 2024

In orderedConsumer.reset(), retryWithBackoff() is returned from with false, nil in case the subscription is closed. In this case, var cons Consumer is still nil, and the code crashes in the attempt to cast it to a pullConsumer later on.

Fix this by catching the nil case explicitly.

Signed-off-by: Daniel Mack [email protected]

@zonque
Copy link
Contributor Author

zonque commented Jun 25, 2024

Closes #1656

@zonque zonque changed the title jetstream: ordered consumer: handled closed subscriptions gracefully jetstream: ordered consumer: handle closed subscriptions gracefully Jun 25, 2024
@zonque zonque force-pushed the daniel/ordered-consumer-closed branch from 32fb4da to f28e14a Compare June 25, 2024 13:01
In `orderedConsumer.reset()`, `retryWithBackoff()` is returned from with
`false, nil` in case the subscription is closed. In this case,
`var cons Consumer` is still `nil`, and the code crashes in the attempt to
cast it to a `pullConsumer` later on.

Fix this by catching the `nil` case explicitly.

Signed-off-by: Daniel Mack <[email protected]>
@zonque zonque force-pushed the daniel/ordered-consumer-closed branch from f28e14a to 1de5b00 Compare June 25, 2024 15:33
@zonque
Copy link
Contributor Author

zonque commented Jun 25, 2024

This is a regression from 66b92a9 it appears.

@zonque
Copy link
Contributor Author

zonque commented Jul 3, 2024

@piotrpio ping

@Jarema
Copy link
Member

Jarema commented Jul 3, 2024

@zonque sorry for the delay.
I wonder - are you able to reproduce it?
If yes - can we have a test for it?

@zonque
Copy link
Contributor Author

zonque commented Jul 3, 2024

It's tricky to test because the issue is triggered by a race condition between the creation of a consumer and the teardown of a NATS server. Not sure how this could be tested, but this is very clearly a bug, as you can easily identify the code path that leads to a nil pointer dereference 🤷

@piotrpio
Copy link
Collaborator

Hello @zonque, thanks for finding this and sorry it tool so long to review.

Your solution solves the nil pointer, but I have a few suggestions on how it can be improved:

  1. Don't use nats.ErrSubscriptionClosed - this is a core NATS error implicating that a NATS subscription was closed - in the context of JetStream consumers it does not make much sense.
  2. I don't think just returning an error is enough - the way it works after your change it will trigger an error callback for Consume(), which I don't think should be done since the consuming was stopped by the user (either with Stop() or Drain()). Similarly, for Messages(), you'll get an ambiguous nats.ErrSubscriptionClosed when calling Next().

My suggestion would be to:

  1. introduce a new error variable, e.g. errOrderedConsumerStopped (similarly to errOrderedSequenceMismatch) which can be internal, we don't have to expose it to the user. We can return this error if cons == nil.
  2. Handle the error properly in Consume() and Next(). For Consume(), I think we can simply return (here), for Next(), we can return ErrMsgIteratorClosed since we're sure there will be no more messages (here and here)

Again, sorry it took that long. Let me know if you'll be working on that or I should take it off your hands.

@zonque
Copy link
Contributor Author

zonque commented Jul 17, 2024

Again, sorry it took that long. Let me know if you'll be working on that or I should take it off your hands.

Thanks for the feedback. As you have a better view on the bigger picture, I think it might in fact be easier if you roll your own version of the fix?

@piotrpio
Copy link
Collaborator

@zonque, sure, I'll take care of this and make sure the fix is included in the next release.

@piotrpio
Copy link
Collaborator

I created a PR with the fix: #1686

So I'll be closing this one - thanks for your contribution!

@piotrpio piotrpio closed this Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants