Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kafka fields needed for dashboards #8504

Merged
merged 6 commits into from
Oct 18, 2018

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Oct 1, 2018

Kafka dashboards calculate cardinalities of composed
fields, this is expensive in query time, but can be optimized if the
composed fields are precalculated.

Continues with #7767, needed for #8457.

Kafka dashboards calculate cardinalities of composed
fields, this is expensive in query time, but can be optimized if the
composed fields are precalculated.

Continues with elastic#7767, needed for elastic#8457.
@jsoriano jsoriano added review needs_backport PR is waiting to be backported to other branches. v6.5.0 labels Oct 1, 2018
@@ -141,6 +141,7 @@ https://github.com/elastic/beats/compare/v6.4.0...master[Check the HEAD diff]
- Support for Kafka 2.0.0 {pull}8399[8399]
- Add container image for docker metricsets. {issue}8214[8214] {pull}8438[8438]
- Add support for `full` status page output for php-fpm module as a separate metricset called `process`. {pull}8394[8394]
- Precalculate composed id fields for kafka dashboards {pull}8504[8504]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Dot missing at the end.

@@ -111,6 +112,10 @@ func (m *MetricSet) Fetch(r mb.ReporterV2) {
}

emitEvent := func(event common.MapStr) {
// Helpful IDs for dashboards
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather describe it as helpful for queries.

For the future: I wonder if this is work that should be done in the processors to have it config based instead of in the code.

}
"name": "foo-1538389014-739473801"
},
"topic_broker_id": "0-foo-1538389014-739473801-0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not topic.id and topic.broker_id?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure about the naming and placing of these composed ids, in demo they are directly at the module level as kafka.topic_partition_broker_id, I moved them to partition as in some way they identify the partition. For the same reason I'd keep them as fields directly under the partition, and I wouldn't expect other fields under kafka.partition.topic. kafka.partition.topic.id may be seen also as if it should be the same as kafka.topic.id.

In any case I don't have a strong opinion on these names, so as you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now that topic.id shows up in different events. So kafka.topic.id would probably make most sense? Meaning that we should move out topic from partition to also have kafka.topic.name (would be a breaking change)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kafka.topic.name was added in #7767 to replace kafka.partition.topic.name and kafka.consumergroup.topic.name as a common name. These fields contain the topic name as seen by kafka, old ones are marked as deprecated.

The fields added in this PR are created by us to help on queries and don't exist in kafka, they identify the partition in the topic and in the broker (there can be two partitions with the same id, but of different topics, and two partitions with the same id and different topics and/or brokers), this is why I'd add them under kafka.partition, I see them more like identifiers of the partition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that that more then one topic.id shows up in a single event?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure of following you... a single event will have an only kafka.topic.id and an only kafka.partition.topic_id (or however we call it), and both ids will be different, being the second one composed by the first one and kafka.partition.id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That actually answers my questions. There are 2 topic.id in 1 event so there have to be different fields like you suggested.

@ruflin
Copy link
Contributor

ruflin commented Oct 16, 2018

@jsoriano Please move forward with your suggestion. Can you rebase this on master?

@jsoriano
Copy link
Member Author

jenkins, test this again please

@jsoriano
Copy link
Member Author

jsoriano commented Oct 17, 2018

I have done some last time changes. The id with the broker ID was using the leader ID, so it generated the same for all replicas (and the same for all events with the same topic and partition). Now I use the replica ID, so it is different and we can calculate things like number of replicas using this.
I am not adding this field to consumer groups, because there there is no information about replicas, so it doesn't make a lot of sense, and kafka.partition.topic_id provides enough info for cardinalities on partition + topic.

@jsoriano jsoriano merged commit 78ef120 into elastic:master Oct 18, 2018
@jsoriano jsoriano deleted the kafka-id-fields branch October 18, 2018 10:34
@jsoriano jsoriano removed the needs_backport PR is waiting to be backported to other branches. label Oct 18, 2018
jsoriano added a commit to jsoriano/beats that referenced this pull request Oct 18, 2018
Kafka dashboards calculate cardinalities of composed
fields, this is expensive in query time, but can be optimized if the
composed fields are precalculated.

Continues with elastic#7767, needed for elastic#8457.

(cherry picked from commit 78ef120)
jsoriano added a commit that referenced this pull request Oct 18, 2018
Kafka dashboards calculate cardinalities of composed
fields, this is expensive in query time, but can be optimized if the
composed fields are precalculated.

Continues with #7767, needed for #8457.

(cherry picked from commit 78ef120)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants