MaxMsgs=1 not respected: second message ALWAYS gets redelivered, as if its AckWait started before it was yielded #608
This is reproducible even if I bump it to 45 seconds.
Here's a trace
There's no +ACK between the delivery of ahh1 and ahh2 (okay, I used a different word than "ahh" above).
For the avoidance of doubt, this also happens with `await i.AckAsync(new AckOpts() { DoubleAck = true });` too.
I've tried it in bun.js (it probably works in Node too, if you want to lose hair over configuring ESM modules, etc.). Exact same setup, I believe. The second message is not seen twice; i.e., Bun/Node work as expected against the same server/stream/consumer.
The Bun consumer correctly flushes the ack and waits 32.5 seconds before calling JS.API.CONSUMER.MSG.NEXT. The dotnet version seems to have a separate loop calling JS.API.CONSUMER.MSG.NEXT that is ignorant of the consuming loop.
Temporarily using a hack around it now so I don't have to change all my handlers. I get that it's quite damaging performance-wise, but hopefully we get back to near zero unexpected redeliveries.
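The poster's actual hack isn't shown; one possible shape of such a workaround (a sketch only, with assumed stream/consumer names "EMAIL" and "worker") is to pull a single message at a time with `NextAsync` so nothing is ever pre-fetched:

```csharp
// Hedged sketch of a possible workaround (not the poster's actual hack):
// request one message per round trip so no second message can sit in a
// buffer with its AckWait already ticking. Costs one
// JS.API.CONSUMER.MSG.NEXT round trip per message.
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);
var consumer = await js.GetConsumerAsync("EMAIL", "worker"); // assumed names

while (true)
{
    // NextAsync returns null on timeout with no message available
    var msg = await consumer.NextAsync<string>();
    if (msg is { } m)
    {
        // handle the message, then ack; nothing else was requested,
        // so no other message's AckWait could have started early
        await m.AckAsync();
    }
}
```

This trades throughput for determinism, which matches the "quite damaging performance-wise" remark above.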
Eek I think this bug is pretty serious.
This should leave 1 outstanding ack on a new/fresh stream, but we get `Outstanding Acks: 100 out of maximum 1,000`.
Maybe related to #508.
First test passes: no ahh2 twice. Second test passes: 99 messages, Last Delivered Message: Consumer sequence: 1,458, Stream sequence: 1,046, Last delivery: 6.26s ago. Looking good!! Thank you. But that's only two use cases so far. It probably needs more real-world testing, but then again it's strange no one else seems to have noticed this: it would immediately drain the whole queue as fast as it could loop doing JS.API.CONSUMER.MSG.NEXT, up to max ack pending, irrespective of whether you had even finished processing the first message with MaxMsgs=1. Maybe people aren't using ConsumeAsync yet? I'm comfortable enough to try it in our prod environment (our use case at the moment isn't business-critical). I'll wire it up tomorrow and let it run over the weekend.
There was a discussion on Slack: https://natsio.slack.com/archives/C1V81FKU6/p1717497381540349. Unfortunately, Slack clears history after a certain amount of time; here is a copy of the initial message:
Thanks. This is concerning, though. Can you confirm the latest branch is absent of this additional pre-fetching (other than up to MaxMsgs), or that it can be disabled entirely? Pre-fetching beyond MaxMsgs would be problematic for me and make everything non-deterministic. I want to establish a sensible AckWait, accepting the performance penalty of MaxMsgs=1, but I wouldn't know how if I have no idea when my AckWait timer starts for each message; it would need to be the desired AckWait * max prefetching, which would be far too loose. I guess these problems become apparent when processing a message is rather slow (like talking to a crappy SaaS like Mandrill that can spike to 12 seconds to respond) coupled with unprocessed messages (i.e. a storm of emails to be sent); I'm going to see a lot of redelivery if there's some unknown prefetching too (with, say, an AckWait of 30s). Update: If I'm reading this correctly, https://github.com/nats-io/nats.net/pull/508/files, nothing in there is doing any sneaky prefetching beyond MaxMsgs? That would correlate with the 99 unprocessed on test 2 I did... so maybe it's all good :)
Yes, pre-fetching should not exceed the max-msgs limit. The idea is to fetch additional messages (up to the max-msgs limit) when the number of consumed messages reaches a threshold, which is set by default to half of max-msgs. This approach ensures a smooth flow of messages without overwhelming the application with too many at once. Our issue was that we were deciding whether to issue a new pull request based on when messages were written to the channel that the application code yielded from. PR #508 addresses this issue by adjusting the calculation to occur after each message is yielded, thereby avoiding erroneous internal buffering.
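As an illustration only (this is not NATS.Net's actual code, and all names here are assumptions), the threshold-based refetch rule described above could be sketched as:

```csharp
// Illustration of the threshold-based refetch rule (assumed names,
// not library internals).
const int MaxMsgs = 30;
const int Threshold = MaxMsgs / 2;   // default threshold: half of MaxMsgs

int pending = MaxMsgs;               // initial pull requests MaxMsgs messages

// ...each time a message is yielded to the application code:
pending--;
if (pending <= Threshold)
{
    int batch = MaxMsgs - pending;   // top up; in-flight never exceeds MaxMsgs
    // issue JS.API.CONSUMER.MSG.NEXT with Batch = batch
    pending += batch;
}
```

The key point from the PR discussion is that `pending--` must happen when the message is yielded to application code, not when it is written to the internal channel.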
So let's say you have MaxMsgs=30: it pulls 30 first, then after consuming 15 it would fetch up to another 15 (15x JS.API.CONSUMER.MSG.NEXT, presuming messages are available), and the internal buffer would never exceed 30. I would probably still appreciate a way to disable this, though, if that's possible. I have a particular use case in one worker where the API partner lets us batch requests to save money, so I have an abstraction over ConsumeAsync that basically does Task.WhenAny on MoveNextAsync and a Task.Delay (as we don't want to wait too long before wrapping up the batch). With this I have to manage AckWait timing very carefully; I worry this would throw it off, although I haven't drawn it all out yet. Maybe a NoOverlapFetching : boolean option?
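The batching abstraction described above (Task.WhenAny racing MoveNextAsync against Task.Delay) might look roughly like this; names, sizes, and timeouts are assumptions, not the poster's actual code:

```csharp
// Hypothetical sketch of the batching pattern described above.
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);
var consumer = await js.GetConsumerAsync("EMAIL", "batch-worker"); // assumed names

var batch = new List<NatsJSMsg<string>>();
await using var e = consumer.ConsumeAsync<string>(
    opts: new NatsJSConsumeOpts { MaxMsgs = 10 }).GetAsyncEnumerator();

while (batch.Count < 10)
{
    var moveNext = e.MoveNextAsync().AsTask();
    // Race the next message against a short delay so we don't wait
    // too long before wrapping up the batch.
    if (await Task.WhenAny(moveNext, Task.Delay(TimeSpan.FromSeconds(2))) != moveNext)
        break;                       // timeout: send what we have
    if (!await moveNext)
        break;                       // consumer stopped
    batch.Add(e.Current);
}

// send the batch to the API partner in one request, then ack everything
foreach (var msg in batch)
    await msg.AckAsync();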
We could potentially do that, but I wonder if you can get what you need by adjusting the threshold, or instead by using
Yeah, that's probably a good idea. Thanks.
MaxMsgs=1 not respected: the second message ALWAYS gets redelivered even when ack'ed within time, as if its AckWait started before it was yielded (as if MaxMsgs=1 wasn't originally respected and it was already in the buffer before the loop got back to the foreach). Theory only, but reproducible consistently. I get the point that one should handle redeliveries via idempotency or some KV lookup to check, due to anything from network partitions, server restarts, client crashes, internal buffers being too strained, and so on, but this is consistently reproducible locally.
Observed behavior (local setup: VM and client app next to each other, no network chatter)
consumer
Once set up, run this:
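The original app snippet isn't preserved here; a minimal sketch of what the repro loop would look like, assuming a stream "EMAIL", a durable consumer "worker" with AckWait=36s, and the 32.5-second wait described in the thread:

```csharp
// Minimal repro sketch (assumed stream/consumer names).
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);
var consumer = await js.GetConsumerAsync("EMAIL", "worker"); // AckWait = 36s

await foreach (var msg in consumer.ConsumeAsync<string>(
                   opts: new NatsJSConsumeOpts { MaxMsgs = 1 }))
{
    Console.WriteLine($"received: {msg.Data}");
    await Task.Delay(TimeSpan.FromSeconds(32.5)); // simulate slow work, still < 36s AckWait
    await msg.AckAsync(); // the bug: ahh2 is redelivered despite this ack landing in time
}
```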
Expected behavior
Shouldn't see the second message two times, as it is acked within 32.5 seconds when AckWait is 36s.
Server and client version
Server v2.10.17 and "NATS.Net" Version="2.3.3". All local on Fedora 40, .NET 8. Plenty of resources. No Docker.
Host environment
Linux, localhost, all on the same machine.
Steps to reproduce
1. Set up the stream and consumer via the CLI
2. Build and start the simple dotnet app
3. `nats publish email.queue 'ahhh1'`
4. `nats publish email.queue 'ahhh2'`
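The exact CLI setup isn't preserved in this copy; under stated assumptions (stream "EMAIL" on subject "email.queue", durable pull consumer "worker" with explicit ack and a 36s AckWait), the steps above might look like:

```shell
# hypothetical names; adjust flags to your setup
nats stream add EMAIL --subjects 'email.queue' --defaults
nats consumer add EMAIL worker --pull --ack explicit --wait 36s --defaults

# then, with the dotnet app running:
nats publish email.queue 'ahhh1'
nats publish email.queue 'ahhh2'
```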