
[filebeat] Add a Kafka input #12850

Merged
53 commits merged on Aug 15, 2019

Conversation


@faec faec commented Jul 10, 2019

Add a Kafka input to Filebeat (#7641).

@faec faec requested a review from a team as a code owner July 10, 2019 16:52
@faec faec requested a review from urso July 16, 2019 18:45
@faec faec changed the title [WIP] [filebeat] Add a Kafka input [filebeat] Add a Kafka input Jul 18, 2019
@faec faec added Filebeat Filebeat docs labels Jul 18, 2019
filebeat/input/kafka/input.go
        })
    }
    return array
}
can we have duplicate headers?

Would it make sense to combine headers and only have the return type map[string]string?

Contributor Author

I thought about this, but as far as I can tell duplicate header keys are valid (if not in the spec then at least in the implementation -- when I post events with duplicate headers, it passes them on unchanged to the consumer)

@urso urso Jul 22, 2019

Checking the kafka implementation it is indeed just a list of headers, allowing for duplicates.
This is not optimal for querying/filtering events via kibana. A simple map[string]string would fit kibana UI better. I wonder if map[string][]string might be a better compromise.

We should test how the difference between []map[string]string and map[string][]string affects usability.
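The map[string][]string compromise discussed above can be sketched as follows: collapsing the Kafka header list into a map of string slices preserves duplicate keys while giving Kibana a map-shaped document. The recordHeader type and collapseHeaders function are illustrative names, not the sarama or Filebeat types.

```go
package main

import "fmt"

// recordHeader mirrors the shape of a Kafka record header: a key plus a
// value, with duplicate keys allowed (an illustrative stand-in, not the
// sarama type).
type recordHeader struct {
	Key   string
	Value string
}

// collapseHeaders folds the header list into map[string][]string, keeping
// every value of a repeated key in arrival order.
func collapseHeaders(headers []recordHeader) map[string][]string {
	out := make(map[string][]string, len(headers))
	for _, h := range headers {
		out[h.Key] = append(out[h.Key], h.Value)
	}
	return out
}

func main() {
	headers := []recordHeader{
		{Key: "trace", Value: "a"},
		{Key: "trace", Value: "b"},
		{Key: "host", Value: "web-1"},
	}
	fmt.Println(collapseHeaders(headers))
}
```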

filebeat/input/kafka/input.go Outdated Show resolved Hide resolved

urso commented Jul 21, 2019

For adding end-to-end ACK support you will need this: #12997

end-to-end ACK becomes possible via:

    ...
    out, err := outlet.ConnectWith(cfg, beat.ClientConfig{
        Processing: beat.ProcessingConfig{
            DynamicFields: context.DynamicFields,
        },
        ACKEvents: func(privates []interface{}) {
            for _, priv := range privates {
                if cm, ok := priv.(*sarama.ConsumerMessage); ok {
                    sess.MarkMessage(cm, "")
                }
            }
        },
    })
    ...

    var msg *sarama.ConsumerMessage
    ...
    out.OnEvent(beat.Event{
        Timestamp: ...,
        Fields:    ...,
        Meta:      ...,
        Private:   msg,
    })

The ACKEvents callback will be called with the 'Private' field of all events recently ACKed. The pipeline ensures that ACKs are returned in the same order as you published your events.

Note: I have the session available in the callback via closure. This is motivated by the fact that a pipeline.Client is not necessarily thread-safe (we just made it safe because of the logs input in filebeat).


faec commented Aug 1, 2019

I've revised the core run loop to be more robust, fixed end-to-end ACK per @urso's suggestions about groupHandler, and fixed a shutdown bug that I came across in the process. The integration test now waits for the input to shut down and confirms it takes less than 30 seconds.

I also looked at the header issue in a live index, and currently it's definitely not what we want:

curl "http://localhost:9200/filebeat-8.0.0-2019.08.01-000001/_search?pretty"
...
            "headers" : [
              {
                "key" : [
                  107,
                  101,
                  121,
                  115,
                  32,
                  97,
                  110,
...

I initially assumed a data blob was the safest since that's what's in the spec, but given this outcome, absent better suggestions, I'll make them strings and leave it to the receiver to re-extract a data blob if that's truly their intent.
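The fix described here can be sketched as a conversion step from byte-slice headers to string fields, so they index as text rather than as arrays of byte values. The rawHeader, stringHeader, and toStringHeaders names are hypothetical, not the actual Filebeat code.

```go
package main

import "fmt"

// rawHeader stands in for a consumed Kafka header, whose key and value
// arrive as byte slices (illustrative type, not the sarama one).
type rawHeader struct {
	Key   []byte
	Value []byte
}

// stringHeader is the shape proposed above: both fields as strings. A
// receiver that truly wants a data blob can re-extract the bytes.
type stringHeader struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

func toStringHeaders(raw []rawHeader) []stringHeader {
	out := make([]stringHeader, 0, len(raw))
	for _, h := range raw {
		out = append(out, stringHeader{Key: string(h.Key), Value: string(h.Value)})
	}
	return out
}

func main() {
	raw := []rawHeader{{Key: []byte("format"), Value: []byte("json")}}
	fmt.Printf("%+v\n", toStringHeaders(raw))
}
```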

faec added a commit that referenced this pull request Aug 1, 2019
The motivation for this update is to support the IsolationFlag configuration parameter in #12850, but this PR is just a version bump with no functional changes.
select {
case <-input.context.Done:
    return
case <-time.After(input.config.ConnectBackoff):
We normally apply some exponential backoff with jitter. See package libbeat/common/backoff.

Contributor Author

The backoff package works only through its Wait function, not a channel, so I had to wrap it in an auxiliary channel to handle shutdown signals. Let me know if you see a prettier approach.


Internally the implementation uses a channel. We could introduce a method like C() <-chan time.Time, backed by a timer. If the current timer is not nil, it should be stopped and reset when C is called.


faec commented Aug 8, 2019

Update on the headers issue, following several tests and an offline discussion: the suggestion above of writing them as map[string][]string doesn't allow the kind of searching we'd like. The current container, []struct{ key, value string }, can work in theory as long as it's indexed as a nested type. I've updated filebeat/_meta/fields.common.yml with something near the correct config, but for some reason the properties inside headers (key, value) do not appear in the index template.

I can create index templates for fields like this manually, but when the template is built by filebeat the subfields are omitted. None of our beats / modules / etc currently use a nested type with defined subfields, so it's not clear to me yet whether the definition in fields.common.yml is somehow incorrect, or if the template generator doesn't handle subfields of a nested field.

@urso urso requested a review from dedemorton August 15, 2019 16:21

urso commented Aug 15, 2019

@dedemorton Can you have a look at the docs?

@faec faec merged commit be940a8 into elastic:master Aug 15, 2019
- name: partition
  type: long
  description: >
    Kafka partition number
@jsoriano jsoriano (Member) Aug 22, 2019

@faec @urso I wonder if we should use the same fields metricbeat uses for this, in case we want to correlate and/or monitor the kafka topics used. Metricbeat uses kafka.topic.name for the topic and kafka.partition.id for the partition.

And I wonder if message-specific fields (offset, key, headers) should live under a specific namespace such as kafka.message, at least to differentiate this offset from other offsets.
