
[Metricbeat] Kafka module refactoring #14852

Closed
3 tasks
ChrsMark opened this issue Nov 29, 2019 · 8 comments
Labels
enhancement, Metricbeat, module, Stalled, Team:Services (Deprecated)

Comments

@ChrsMark
Member

ChrsMark commented Nov 29, 2019

Current situation

Currently the Kafka module connects to the local broker and retrieves information only from there. Given the distributed nature of Kafka, this can be problematic in cases where we need "cluster-wide" metrics. In addition, problems have been reported for topics with many partitions.

Goal of this issue

In #14822 we introduced a "cluster-wide" client which connects to a given broker but can retrieve information for the whole cluster through it.
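For illustration only (not the code from #14822): a minimal sketch of how a single sarama client, bootstrapped from one broker address, can look up topics, partitions and partition leaders across the whole cluster. The broker address is a placeholder.

```go
package main

import (
	"fmt"
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	// One bootstrap address is enough: the client pulls cluster metadata
	// through it and can then talk to any broker in the cluster.
	client, err := sarama.NewClient([]string{"localhost:9092"}, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	topics, err := client.Topics()
	if err != nil {
		log.Fatal(err)
	}
	for _, topic := range topics {
		partitions, err := client.Partitions(topic)
		if err != nil {
			log.Fatal(err)
		}
		for _, partition := range partitions {
			// The leader may be a different broker than the one we
			// bootstrapped from; that is the "cluster-wide" part.
			leader, err := client.Leader(topic, partition)
			if err != nil {
				log.Fatal(err)
			}
			fmt.Printf("%s/%d -> leader %d (%s)\n", topic, partition, leader.ID(), leader.Addr())
		}
	}
}
```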

This issue aims to refactor the module so as to:

@jsoriano @exekias @mtojek feel free to add/comment anything!

@ChrsMark ChrsMark added the enhancement, module, Metricbeat and Team:Integrations labels Nov 29, 2019
@mtojek mtojek self-assigned this Feb 21, 2020
@mtojek mtojek added the [zube]: In Review and Team:Services (Deprecated) labels and removed the [zube]: Backlog and Team:Integrations labels Feb 21, 2020
@mtojek
Contributor

mtojek commented Feb 21, 2020

@ChrsMark @jsoriano

The source code looks complex in terms of error handling, retries and waiting for results, so let me rephrase this description to fully understand what should be done/achieved:

  1. Metricsets partition and consumergroup (belonging to the kafka module) should ALWAYS use sarama.Client while performing Fetch(), to get a reference to the leading broker for a given topic partition.
  2. The same broker can be the leader for different topic partitions. It's a fair point to optimize here and combine offset requests for several topic partitions (see the sketch at the end of this comment).
  3. Speaking of consumergroup: first fetch partition offsets from the leader, then compare them with the consumer group status to calculate lag. Can the consumer group status be requested from the leading broker for the topic partition?

It looks like a bigger refactoring (incl. tests), hence I'm wondering whether it wouldn't be easier to rewrite the code.
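To make points 1 and 2 concrete, here is a rough sketch (not the module's actual code; fetchNewestOffsets is a hypothetical helper): resolve partition leaders through a shared sarama.Client and batch the offset requests per leading broker.

```go
package kafkasketch

import "github.com/Shopify/sarama"

// fetchNewestOffsets resolves the leader of every partition through the
// shared client (point 1) and sends one OffsetRequest per leading broker
// instead of one per partition (point 2).
func fetchNewestOffsets(client sarama.Client, topic string) (map[int32]int64, error) {
	partitions, err := client.Partitions(topic)
	if err != nil {
		return nil, err
	}

	// Group partitions by the ID of their leading broker.
	leaders := map[int32]*sarama.Broker{}
	partsByLeader := map[int32][]int32{}
	for _, partition := range partitions {
		leader, err := client.Leader(topic, partition)
		if err != nil {
			return nil, err
		}
		leaders[leader.ID()] = leader
		partsByLeader[leader.ID()] = append(partsByLeader[leader.ID()], partition)
	}

	offsets := map[int32]int64{}
	for id, parts := range partsByLeader {
		req := &sarama.OffsetRequest{}
		for _, partition := range parts {
			req.AddBlock(topic, partition, sarama.OffsetNewest, 1)
		}
		resp, err := leaders[id].GetAvailableOffsets(req)
		if err != nil {
			return nil, err
		}
		for _, partition := range parts {
			block := resp.GetBlock(topic, partition)
			// Version-0 offset responses carry a list of offsets.
			if block == nil || block.Err != sarama.ErrNoError || len(block.Offsets) == 0 {
				continue
			}
			offsets[partition] = block.Offsets[0]
		}
	}
	return offsets, nil
}
```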

@ChrsMark
Member Author

> @ChrsMark @jsoriano
>
> The source code looks complex in terms of error handling, retries and waiting for results, so let me rephrase this description to fully understand what should be done/achieved:
>
> 1. Metricsets partition and consumergroup (belonging to the kafka module) should ALWAYS use sarama.Client while performing Fetch(), to get a reference to the leading broker for a given topic partition.

I would say yes, this is the ideal situation: make use of the "cluster-wide" client wherever possible. This will mostly cover points 1 and 2 from the initial description.

> 2. The same broker can be the leader for different topic partitions. It's a fair point to optimize here and combine offset requests for several topic partitions.

I guess that any kind of optimisation is more than welcome :).

> 3. Speaking of consumergroup: first fetch partition offsets from the leader, then compare them with the consumer group status to calculate lag. Can the consumer group status be requested from the leading broker for the topic partition?

Maybe https://github.com/Shopify/sarama/blob/master/admin.go#L85 ?
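For reference, a rough sketch (assumed names, not the module's code) of how the sarama ClusterAdmin API could be combined with the cluster-wide client to compute the lag mentioned in point 3: committed offsets come from ListConsumerGroupOffsets, newest offsets from the client.

```go
package kafkasketch

import "github.com/Shopify/sarama"

// groupLag compares the group's committed offsets with the newest offsets
// of the topic's partitions. The group and topic names are placeholders.
func groupLag(client sarama.Client, admin sarama.ClusterAdmin, group, topic string) (map[int32]int64, error) {
	partitions, err := client.Partitions(topic)
	if err != nil {
		return nil, err
	}

	committed, err := admin.ListConsumerGroupOffsets(group, map[string][]int32{topic: partitions})
	if err != nil {
		return nil, err
	}

	lag := map[int32]int64{}
	for _, partition := range partitions {
		newest, err := client.GetOffset(topic, partition, sarama.OffsetNewest)
		if err != nil {
			return nil, err
		}
		block := committed.GetBlock(topic, partition)
		if block == nil || block.Offset < 0 { // -1 means nothing committed yet
			continue
		}
		lag[partition] = newest - block.Offset
	}
	return lag, nil
}
```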

> It looks like a bigger refactoring (incl. tests), hence I'm wondering whether it wouldn't be easier to rewrite the code.

I would agree with a rewrite. If I'm not mistaken, the key question to answer first is whether we can replace the old client with the "cluster-wide" one. If yes, then all of the functionality could be rewritten on top of this concept.

@mtojek
Contributor

mtojek commented Feb 24, 2020

@ChrsMark Thank you for the answers!

@jsoriano
Member

+1 to rewrite these metricsets if we can simplify the code by using more features of the sarama client and less custom code.

@mtojek
Contributor

mtojek commented Mar 4, 2020

Lessons learnt here:

Ok, so I tried to solve the first and second issues (use the cluster-wide client, "broker is not leader"), but grouping requests made the codebase relatively complex, and the original issue ("broker is not leader" due to stale metadata) might still occur.

After a discussion with Jaime, the way to go might be to put some retries around fetching topic metadata, but this can also hurt the code's readability.
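A rough sketch of that retry idea (placeholder retry count and backoff, not an agreed implementation): refresh the topic metadata and retry the leader lookup when it fails, e.g. because the cached metadata is stale.

```go
package kafkasketch

import (
	"time"

	"github.com/Shopify/sarama"
)

// leaderWithRetry retries the leader lookup, refreshing the client's topic
// metadata between attempts in case the cached metadata is stale.
func leaderWithRetry(client sarama.Client, topic string, partition int32) (*sarama.Broker, error) {
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		leader, err := client.Leader(topic, partition)
		if err == nil {
			return leader, nil
		}
		lastErr = err
		if refreshErr := client.RefreshMetadata(topic); refreshErr != nil {
			lastErr = refreshErr
		}
		time.Sleep(500 * time.Millisecond)
	}
	return nil, lastErr
}
```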

I will leave this issue open unless it turns out there are no easy steps to solve it (compared to the total gain).

@botelastic

botelastic bot commented Feb 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic

botelastic bot commented Jan 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Jan 3, 2022
@jlind23
Collaborator

jlind23 commented Mar 31, 2022

Backlog grooming: Closing for now.

@jlind23 jlind23 closed this as completed Mar 31, 2022