-
-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the default partitioner topic-aware when in round-robin mode #1112
Make the default partitioner topic-aware when in round-robin mode #1112
Conversation
Changing the behavior of the default partitioner is a tricky proposition. It essentially means that when you upgrade KafkaJS, we can no longer guarantee ordering, since messages produced with the previous version may end up on a different partition from the new version. Currently the tests are passing, but I'm guessing that's because they are only running the partitioner with a single topic. Looking at the default partitioner from the Java client, it doesn't look like they are topic aware either, so I think this is a case where you having a Custom Partitioner for your use-case makes more sense than shipping it like this by default. Certainly we wouldn't be able to modify the default partitioner in a way that breaks backwards compatibility, so the only other option would be to ship an alternate partitioner out of the box, but I don't think this is a common enough case that it makes sense to do. |
I would disagree with this statement. Under the existing default partitioning scheme, an app may reproduce messages to the same partition in the same order on a subsequent run in this scenario, but only if:
Edit: This also doesn't address the current highly unbalanced distribution if your messages are produced between topics in any sort of pattern. Making the default partitioner topic-aware actually allows us to guarantee consistent message assignment on subsequent runs much more strongly - extra messages produced to topic A won't impact the partition assignments of messages produced to topic B.
Hmm... the use case here is any app which produces to multiple topics using the default partitioner without explicit partition assignment or a key. To me, that actually seems fairly broad - almost the 'default' way of producing messages.
That's what we did after discovering this behaviour in KafkaJS - we just took the round robin logic from the default partitioner and made it topic-aware. However, it took a while to figure out, because we took the documentation's statement that messages are assigned to partitions in round-robin fashion at face value. That statement isn't currently true in a multi-topic scenario without keys - at a minimum, the documentation should be updated to reflect this. Another potential workaround would by just assigning some sort of key, such as an autoincrementing integer. However, that would require extra tracking and require calculating the murmur2 hash for each message unnecessarily. |
You are indeed correct. I was incorrectly thinking that this would also affect messages produced with a key, but this is of course only changing the behavior when not providing a key. In that scenario, we can't really guarantee any ordering anyway, so this change should be fine. That's what I get for multi-tasking I suppose 😅 Do you think you could add a test to the partitioner tests to verify the behavior that your change intends to make, so that we avoid breaking it in the future? |
I was working on an application in which we produced to two topics - a
logs
topic and amytopic
topic, each having 6 partitions.After writing a message to the
mytopic
topic, we would immediately write a logs message to thelogs
topic.Because the default partitioner is not topic-aware, it would round-robin the partition selection between the two topics, meaning that three partitions of both the
logs
andmytopic
topic would get heavily used, while the other three partitions within each topic would be almost untouched.This PR adds separate counters per-topic to the default partitioner. This means that each topic gets it's position in the round-robin calculated independently.