Provide a pause() helper to eachMessage/eachBatch #1364
What's the intended use case that makes increasing the API surface here worth it? The example from the documentation, just for comparison:

```javascript
await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    try {
      await sendToDependency(message)
    } catch (e) {
      if (e instanceof TooManyRequestsError) {
        consumer.pause([{ topic, partitions: [partition] }])
        setTimeout(() => {
          consumer.resume([{ topic, partitions: [partition] }])
        }, e.retryAfter * 1000)
      }
      throw e
    }
  },
})
```

And this proposal:

```javascript
await consumer.run({
  eachMessage: async ({ topic, message, pause }) => {
    try {
      await sendToDependency(message)
    } catch (e) {
      if (e instanceof TooManyRequestsError) {
        pause(e.retryAfter * 1000) // returns control to KafkaJS until timeout has expired
      }
      throw e
    }
  },
})
```

To me they look very similar. The new proposal is a bit terser, but I'm not sure it really eliminates much complexity. You still need, as a user, to understand that it's pausing a specific topic-partition, even if that parameter is now hidden internally.

The one benefit that I find interesting is that it provides a way to communicate back to the calling code that we actually want to exit the message loop without having to throw an error and everything that entails. Instead, the consumer knows that it should continue operating as usual, just without processing any further fetched messages on that topic for as long as it's paused. I'm not so keen on the way it's communicated, by throwing a special error, but the same thing can be achieved in other ways.

Then there's the matter of keeping the timer for resuming. Any time I see stateful code that includes timers, I get a bit nervous, because it's so hard to predict whether things are in a valid state to proceed with what you're doing when the timer fires. For example, does the consumer still exist? Is it still running? Is it still subscribed to the same topics? Is it still assigned the relevant partition? What if the user explicitly pauses the topic afterwards? And so on. Some of these might not be relevant in this case; I'm just pointing out that whenever you're doing something in the future based on nothing but a timer, things tend to get complicated and lead to complex bugs down the line.
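One way to narrow the timer concerns above is to make every scheduled resume owned by something that is torn down together with the consumer. The sketch below is hypothetical (`createResumeScheduler`, `pauseFor` and `stop` are illustrative names, not KafkaJS API; the consumer is assumed to be any object with KafkaJS-style `pause(topics)`/`resume(topics)` methods) and only answers the "is it still running?" question: pending timers are cancelled on stop, so none fires against a stopped consumer. It does not handle rebalances, partition reassignment, or a user explicitly pausing the same partition in the meantime.

```javascript
// Hypothetical sketch, not KafkaJS code: owns the resume timers for a
// KafkaJS-style consumer (any object with pause(topics) / resume(topics)),
// so they can all be cancelled when the consumer stops.
function createResumeScheduler(consumer) {
  const timers = new Set();
  let stopped = false;

  function pauseFor(topic, partition, ms) {
    if (stopped) return false; // refuse to schedule work for a stopped consumer
    const topicPartitions = [{ topic, partitions: [partition] }];
    consumer.pause(topicPartitions);
    const timer = setTimeout(() => {
      timers.delete(timer);
      consumer.resume(topicPartitions);
    }, ms);
    timers.add(timer);
    return true;
  }

  function stop() {
    stopped = true;
    // Cancel every pending resume; none of them can fire after this point.
    for (const timer of timers) clearTimeout(timer);
    timers.clear();
  }

  return { pauseFor, stop, pending: () => timers.size };
}
```

The design choice here is that cancellation happens eagerly on stop rather than by re-checking state inside the timer callback; the remaining questions from the comment (assignment, explicit user pauses) would still need their own checks.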
This function will take care of pausing (and optionally, resuming) message consumption on the current topic/partition when processing messages either within the `eachMessage` or `eachBatch` handler functions.
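The description above could be realized roughly as follows. This is a minimal sketch under stated assumptions, not the code from this PR: `makePauseHelper` and `runEachMessage` are hypothetical names, the consumer is any object with KafkaJS-style `pause(topics)`/`resume(topics)` methods, and having `pause()` return a resume callback is an assumed reading of the "optionally, resuming" part, with any timer left to the caller.

```javascript
// Hypothetical sketch, not the PR's implementation.
// `consumer` is assumed to expose KafkaJS-style pause(topics) / resume(topics),
// where topics is an array of { topic, partitions } objects.
function makePauseHelper(consumer, topic, partition) {
  const topicPartitions = [{ topic, partitions: [partition] }];

  return function pause() {
    consumer.pause(topicPartitions);
    // Assumption: the helper hands back a resume callback; when (or whether)
    // to call it, e.g. from a timer, is left to the caller.
    return () => consumer.resume(topicPartitions);
  };
}

// Hypothetical stand-in for the consumer loop: builds the helper for the
// message's own topic-partition and passes it to the user's handler.
async function runEachMessage(consumer, payload, eachMessage) {
  const pause = makePauseHelper(consumer, payload.topic, payload.partition);
  await eachMessage({ ...payload, pause });
}
```

With this shape, the timer from the documentation example would shrink to `setTimeout(pause(), e.retryAfter * 1000)`: the partition is paused immediately, and the returned callback resumes it later.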
d2d48a9 to 96ad17c
@Nevon, thank you for your feedback. I've tweaked the implementation and interface a bit based on your suggestions (timers are left to the user, and no special exceptions are used for flow control).
I do think the primary benefit of this approach to pausing/resuming is being able to stop processing messages partway through a batch without having to keep track of whether the current message is from a topic/partition that was paused. The other thing this helps with is the ability to pause processing without passing around a reference to the
Agreed. Would an implementation that simply checks whether the current topic/partition is paused after each
💯 This is a very good point and was short-sighted on my part. I've left one vestigial convenience in here for resuming (the There are a few paths forward, as I see them:
Let me know what you think, and thanks so much for your time reviewing and giving quality feedback.
If the provided
However, we need to ensure that this functionality works the same whether you are pausing using
Funnily enough, someone else was doing similar work in #1382. What they found was that if you paused a topic-partition within the `eachMessage` or `eachBatch` function and then threw an error, the error would bubble up to the retrier, and the retrier would retry processing the same batch from the now-paused partition. There, the solution is to not invoke
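The retry problem found in #1382 can be illustrated with a small, self-contained model. This is not the KafkaJS retrier or the #1382 patch: `processWithRetry` and the `isPaused(topic, partition)` callback are hypothetical, there is no backoff, and resolving quietly once the partition turns out to be paused is an assumption made for the sketch.

```javascript
// Hypothetical sketch (no backoff, not the KafkaJS retrier): retries a failing
// handler, unless the handler paused its own topic-partition before throwing.
async function processWithRetry(handler, { topic, partition, isPaused, retries = 3 }) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await handler();
    } catch (err) {
      // The partition was paused from inside the handler: retrying would
      // reprocess the same batch from a paused partition, so stop here.
      if (isPaused(topic, partition)) return undefined;
      if (attempt >= retries) throw err;
    }
  }
}
```

A handler that pauses and then throws runs exactly once; a handler that throws without pausing is tried once plus `retries` more times before the error propagates.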
Just to set expectations, I will be going on vacation tomorrow and will be back on June 25th. So if you haven't heard from me before then, I'm not ghosting you, I'm just enjoying a coconut drink on a beach. 😄
If a topic/partition is paused within the `eachMessage` or `eachBatch` callback, we want to stop processing messages and avoid retrying the batch or processing additional messages in the current batch.
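Stopping partway through a batch once its topic-partition has been paused can be sketched as a plain loop. `processBatch` and `isPaused` are hypothetical names chosen for illustration; the function returns the offsets it actually handled so the caller knows where processing stopped.

```javascript
// Hypothetical sketch: handle a batch in order, but stop before the next
// message once the batch's topic-partition has been paused (for example by
// the handler itself). Returns the offsets that were actually handled.
async function processBatch(batch, handleMessage, isPaused) {
  const handled = [];
  for (const message of batch.messages) {
    if (isPaused(batch.topic, batch.partition)) break;
    await handleMessage(message);
    handled.push(message.offset);
  }
  return handled;
}
```

Checking at the top of each iteration covers both cases: a partition paused by the previous message, and one that was already paused before the batch started.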
Changed the title from "Provide a pause(optionalTimeout) helper to eachMessage/eachBatch" to "Provide a pause() helper to eachMessage/eachBatch".
Alright, I am back from vacation and have gone through this another round. Looks good to me overall, just have some notes on the comms as well as one possibly important note on handling errors that require rejoining.
Merged master and resolved the conflicts. With #1382, the handling of paused partitions during error handling is already taken care of.