-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metricbeat - Kafka consumer lag per partition stats #3608
Comments
regarding metricbeats and the kafka module...Kafka lag per topic/partition would be the most useful metric for monitoring processing |
I was equally interested in having the consumer lag per partition_id/topic and i labored to make it work in my repository --> https://github.com/antonking/GOLANG |
that's awesome @antonking, could you open a PR with your changes applied to metricbeat? |
@exekias how do you do that? I've no idea what PR means to be honest. |
Sorry @antonking PR == Pull Request, I can assist in the process if needed :) |
@exekias please do assist and if you need anything from me in the process, kindly let me know. Thx lots |
@antonking @exekias I would like this to be a feature of kafka metricbeat, for us its the metric most useful. Did you have any joy with the PR? |
@Mrclark05 please feel free to make the feature a PR because i've never done that before. Thx |
A kafka dashboard has been merged to master (#8457) it includes a visualization for consumer lag. This value is still not available as a single value, but it is calculated using offset values from consumergroup and partition metricsets. It'd be great if you could give a try to it, we plan to release this first dashboard on 6.5.0. |
Let us know if the dashboards are not enough. |
I've been working with the new Kafka dashboard, and if I'm understanding it correctly, it's not quite right. The current lag visualization takes the highest offset value from the partition stats and the highest offset value from the consumer group stats. The lag value is generated by taking the difference between the two (or returning zero if the consumergroup offset is higher). We can then group the results by another field, for example, the Kafka topic name. There are still a few issues with this approach, though.
The root of the issue is that we have to compare across data types to get the lag value (the consumergroup metricset vs. the partition metricset). The easiest solution looks to be including the lag value in the consumergroup metricset. Lag can then be associated with any combination of client ID, consumergroup ID, topic, and partition. |
@tkinkead thanks a lot for your feedback, I am going to reopen the issue.
Yes, you are right about that, this can be a not so bad overview value if all partitions grow at the same rate, but I am not sure if this is always the case. If you select topic and partition, is the value shown correct?
Yes, you are right, I am afraid this case won't work. We should probably aggregate per consumer group, not only per topic.
Metricbeat aims to collect mainly metrics of local services. It can also be configured to monitor services on other hosts or in cloud providers, but the intended premise is that metricbeat don't have to connect to other hosts unless otherwise needed. Said that, maybe it worths in this case to (at least optionally) connect to the partition leader from the consumergroup metricset and directly calculate the lag to gain in simplicity for this important value. |
No, because we have consumers in multiple consumer groups connected to the same topic and partition. If I select topic and partition in the dashboard, I will still only see the consumer group with the highest lag value for that topic and partition, not the lag values for all consumer groups.
That's understandable. The best way to get the lag value is to pull it from the zookeeper API directly, rather than connecting to partition leaders and collecting offsets and then doing the calculations. It's possible that this simply isn't the correct way to do this. Perhaps lag and consumer group data should be built into a kafka metricset on the zookeeper module instead? |
Hi, can I just ask how does Metricbeat gets its offsets since from Kafka 2.1.x onwards offsets are stored in the kafka brokers? Is it zookeeper or the individual brokers? |
Hi @Lumotheninja , consumer offsets are being retrieved from brokers. |
Right, just wanted to raise that question since @tkinhead suggested zookeeper. +1 on PR |
Closing for now. Please reopen if the issue is not solved. |
Feature:
For Metricbeat Kafka module, it would be nice to:
The issue should probably be a part of #3005
The text was updated successfully, but these errors were encountered: