Kafka input: consumer group not behaving as expected when restricting the number of topic partitions #1806
Hey @elukey, I've recently been looking into this a bit. The Kafka version might be a clue, as I've been struggling to reproduce the issue even though others have reported similar stuff (#1058, #1802). There's more info here overall (thanks for the write-up), so I'm going to use this as the conglomerate of "dodgy kafka consumer" issues. Unfortunately, a common theme of our Kafka woes is running some niche situation that might not be a popular configuration alongside our baseline `kafka` input. We therefore have two options: we can either push forward with diagnosing these issues within Sarama (if that's where they are) and then fork for fixes (or maybe they'll correct the breaking changes and we can upgrade), or alternatively we can try to get the `kafka_franz` input up to feature parity. I'm leaning towards adding the bells and whistles to our `kafka_franz` input.
@Jeffail makes sense! We checked.
This can be done with `kgo.ConsumePartitions`. But also -- specifying partitions is unrelated to this issue, right?
@elukey not sure what you mean w.r.t. batching -- I might be missing some Benthos context here. The franz-go client consumes through batches only (always -- all clients do). Edit: re-reading the above, this looks like the Benthos concept. A few batching tuning knobs can be controlled with client options such as `kgo.FetchMaxBytes` and `kgo.FetchMaxWait` (this comment is about what can be added to Benthos, for Benthos authors :D)
Hi!
In theory no: without specifying partitions the kafka client behaves as expected, but when we apply a range we start seeing the issues described above.
I didn't see it specified in the kafka_franz documentation, whereas there is a specific batching option in the kafka one; this is why I was asking :)
Yeah, this is a limitation of the features Benthos exposes rather than the underlying franz-go library. @twmb we have everything we need, I just need to add a bit of a fork in our plugin to use explicit partitions rather than balancing.
@Jeffail hi! Checking in again to see if there is any news :)
hey @elukey I've just added partition consumption to the `kafka_franz` input.
Batching added now: 463169c. For anyone finding this issue after experiencing similar problems, I urge you to try out the `kafka_franz` input instead.
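For readers landing here later, a minimal sketch of what a `kafka_franz` input with explicit partitions and batching might look like; the broker address, topic name, partition range, and batching values below are placeholders, and field names should be checked against the Benthos docs for your version:

```yaml
# Hypothetical example -- broker address, topic, partition range, and
# batching values are placeholders, not taken from this thread.
input:
  kafka_franz:
    seed_brokers: [ "kafka-broker-1:9092" ]
    # Explicit partitions (topic:start-end). As discussed above, this
    # is used instead of consumer-group balancing.
    topics: [ "webrequest_text:0-5" ]
    batching:
      count: 1024
      period: 1s
```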
In redpanda-data/connect#1806 the upstream devs added support for batching and selecting topic partitions to kafka_franz, a new component that seems preferred over the kafka one. I tested the new Benthos version 4.15.0 on stat1004 and it worked nicely. The idea is to replace the kafka component/client with kafka_franz, see how it goes, and then reduce the topic partitions (and adjust sampling) later on.
Bug: T331801
Change-Id: Ib2c05e3dcd0632ff512ca382e269e8b6a720b591
@Jeffail thanks a lot! We tested the new kafka input/output and it works really well. One caveat came up when we migrated from `kafka` to `kafka_franz`.
We expected something similar, but maybe a reference in the docs could be good for people attempting the same migration. We also discovered that limiting the topic partitions to consume from doesn't work with a consumer group setting in `kafka_franz`.
Is that worth adding to the docs? And does the Sarama (default `kafka`) input support limiting partitions alongside a consumer group?
Yep, I was suggesting to add this gotcha to the documentation since it is not super easy to know.
In theory yes, but we had a lot of problems when we tried, and then we moved to kafka_franz. Not a big deal if we can't; it would be nice to reduce the amount of data pulled from huge topics :)
Hi folks!
At Wikimedia we are happy users of Benthos for a stream processing pipeline. We pull what we call "webrequests" (basically the JSON dump of every HTTP request hitting our front-end caches) from Kafka, sample and enrich them with GeoIP and additional data, and then re-insert them into Kafka (so other tools can consume this stream).
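As a rough illustration (not the actual Wikimedia config, which isn't shown in this thread), such a sample/enrich/re-publish pipeline in Benthos might be shaped like this; the broker address, topic names, consumer group, and sampling ratio are all assumptions, and the enrichment step is shown as a no-op placeholder:

```yaml
# Hypothetical sketch of a sample/enrich/re-publish pipeline.
input:
  kafka:
    addresses: [ "kafka-broker-1:9092" ]
    topics: [ "webrequest_text" ]
    consumer_group: "benthos-webrequest-sampler"
pipeline:
  processors:
    # Keep roughly 1% of events (the ratio here is a placeholder).
    - sample:
        retain: 1.0
    # GeoIP and other enrichment would happen here; shown as a
    # no-op mapping for illustration only.
    - mapping: |
        root = this
output:
  kafka:
    addresses: [ "kafka-broker-1:9092" ]
    topic: "webrequest_sampled"
```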
Since the total volume of events to pull from Kafka is huge, we decided to test the kafka input's feature for selecting which topic partitions to pull from. This seemed to work nicely up until a few days ago, when a consumer group rebalance was needed (a host was decommissioned and a new one was added in its place), leading to https://phabricator.wikimedia.org/T331801. We can't really explain what the issue was, but it seemed that Benthos started pulling from fewer partitions than expected, leading to inconsistent results later on in our pipeline (we have some tools that visualize the data we save in Kafka, and the traffic flows were totally off).
We tried changing the consumer group name, resetting its state in Kafka, etc. Nothing fixed the issue except changing the input kafka config to pull from all the topic partitions instead of selecting a range of them. We have no idea why it was working before; maybe it was by chance/luck and the rebalance made the issue clearer?
This is a simplified version of the config that we use:
And these are the env variables:
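The config and variable values themselves didn't survive in this copy of the thread. The relevant part is the `topics` field, which Benthos can interpolate from the environment; a hypothetical reconstruction of that shape (not the original config, and the field values are placeholders) might look like:

```yaml
# Hypothetical reconstruction: the topics list comes from an env var,
# so KAFKA_TOPICS can be either "webrequest_text" (all partitions)
# or "webrequest_text:1-6" (an explicit partition range).
input:
  kafka:
    addresses: [ "${KAFKA_BROKERS}" ]
    topics: [ "${KAFKA_TOPICS}" ]
    consumer_group: "${KAFKA_CONSUMER_GROUP}"
```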
If I use KAFKA_TOPICS=webrequest_text, I see the following in Kafka:
Otherwise, if I use KAFKA_TOPICS=webrequest_text:1-6, I see the following:
The Kafka version that we use is 1.1.0, so we are wondering if there is any gotcha we are still missing about how to consume a range of partitions instead of the full topic volume. Any help would be really appreciated!