[filebeat] Add a Kafka input #12850
Conversation
		})
	}
	return array
}
Can we have duplicate headers? Would it make sense to combine headers and only have the return type map[string]string?
I thought about this, but as far as I can tell duplicate header keys are valid (if not in the spec, then at least in the implementation: when I post events with duplicate headers, Kafka passes them on unchanged to the consumer).
Checking the Kafka implementation, it is indeed just a list of headers, allowing for duplicates. This is not optimal for querying/filtering events via Kibana; a simple map[string]string would fit the Kibana UI better. I wonder if map[string][]string might be a better compromise. We should test how the difference between []map[string]string and map[string][]string affects usability.
For adding end-to-end ACK support you will need #12997. End-to-end ACK becomes possible via the ACKEvents callback, which will be called with the 'Private' field of all events recently ACKed. The pipeline ensures that ACKs are returned in the same order as the events were published. Note: I have made the sessions available in the callback. This is motivated by the fact that a pipeline.Client is not necessarily safe for multithreading (we only made it safe because of the logs input in Filebeat).
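A rough sketch of how an input could wire this up, assuming libbeat's beat.ClientConfig exposes an ACKEvents callback as described above; the helper function and the offset-commit handling are hypothetical, not the PR's actual code:

```go
import (
	"github.com/Shopify/sarama"
	"github.com/elastic/beats/libbeat/beat"
)

// connectWithACKs is a hypothetical helper: connect to the publishing
// pipeline with an ACKEvents callback that commits Kafka offsets.
func connectWithACKs(pipeline beat.Pipeline, session sarama.ConsumerGroupSession) (beat.Client, error) {
	return pipeline.ConnectWith(beat.ClientConfig{
		ACKEvents: func(private []interface{}) {
			// Entries are the 'Private' fields of recently ACKed events,
			// delivered in the order the events were published.
			for _, p := range private {
				if msg, ok := p.(*sarama.ConsumerMessage); ok {
					session.MarkMessage(msg, "") // commit this Kafka offset
				}
			}
		},
	})
}
```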
I've revised the core run loop to be more robust and fixed end-to-end ACK per @urso's suggestions. I also looked at the header issue in a live index, and currently it's definitely not what we want. I initially assumed a data blob was the safest choice since that's what's in the spec, but given this outcome, absent better suggestions, I'll make them strings and leave it to the receiver to re-extract a data blob if that's truly their intent.
The motivation for this update is to support the IsolationFlag configuration parameter in #12850, but this PR is just a version bump with no functional changes.
filebeat/input/kafka/input.go
select {
case <-input.context.Done:
	return
case <-time.After(input.config.ConnectBackoff):
We normally apply some exponential backoff with jitter. See the package libbeat/common/backoff.
The backoff package works only via its blocking Wait function, not a channel, so I had to wrap it in an auxiliary channel to handle shutdown signals. Let me know if you see a prettier approach.
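Roughly the shape of that workaround, assuming the backoff's Wait() blocks for the next interval; connectBackoff and the surrounding names are illustrative, reusing the identifiers from the diff above:

```go
// Wrap the blocking Wait() in a goroutine so the backoff delay can be
// combined with a shutdown signal in a select (a sketch, not the PR's
// exact code).
waitDone := make(chan struct{})
go func() {
	connectBackoff.Wait() // blocks for the next backoff interval
	close(waitDone)
}()
select {
case <-input.context.Done:
	return
case <-waitDone:
	// fall through and retry the connection
}
```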
Internally the implementation uses a channel. We could introduce a method like C() <-chan time.Time, backed by a timer. If the current timer is not nil, it should be stopped and reset when calling C.
Update on the headers issue, following several tests and an offline discussion: the suggestion above about writing them to […] I can create index templates for fields like this manually, but when the template is built by Filebeat the subfields are omitted. None of our Beats / modules / etc. currently use a […]
@dedemorton Can you have a look at the docs?
- name: partition
  type: long
  description: >
    Kafka partition number
@faec @urso I wonder if we should use the same fields Metricbeat uses for this, in case we want to correlate and/or monitor the Kafka topics used. Metricbeat uses kafka.topic.name for the topic and kafka.partition.id for the partition. And I wonder if the message-specific fields (offset, key, headers) should be under a specific namespace such as kafka.message, at least to differentiate this offset from other offsets.
Add a Kafka input to Filebeat (#7641).