I'm seeing a very interesting bug in awsbeats (filebeat 6.2.4 built with Go 1.10.2 for linux-amd64).
In a nutshell, most of your records end up with invalid data.
To reproduce this issue, send 5 distinct records to the stream:
```
$ for i in $(seq 5); do str="${str}${i}"; echo '{"mykey":"mykey4","myvalue":"'$i'"}' | tee -a logs/foo.log; done
{"mykey":"mykey4","myvalue":"1"}
{"mykey":"mykey4","myvalue":"2"}
{"mykey":"mykey4","myvalue":"3"}
{"mykey":"mykey4","myvalue":"4"}
{"mykey":"mykey4","myvalue":"5"}
```
Tail the stream:
```
$ kinesis-tail -stream kuokatest1
```
And then, surprisingly, you'll see that the last record ({"mykey":"mykey4","myvalue":"5"}) is repeated 5 times:
```
ApproximateArrivalTimestamp: 2018-05-22 09:45:51 +0000 UTC
Data: {"@timestamp":"2018-05-22T09:45:50.888Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.2.4"},"mykey":"mykey4","myvalue":"5","source":"/mnt/log/foo.log","beat":{"name":"bd98d11e2e3b","hostname":"bd98d11e2e3b","version":"6.2.4"},"offset":165}
SequenceNumber: 49584698524577875697015278055900189069985746083771318274
ApproximateArrivalTimestamp: 2018-05-22 09:45:51 +0000 UTC
Data: {"@timestamp":"2018-05-22T09:45:50.888Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.2.4"},"mykey":"mykey4","myvalue":"5","source":"/mnt/log/foo.log","beat":{"name":"bd98d11e2e3b","hostname":"bd98d11e2e3b","version":"6.2.4"},"offset":165}
SequenceNumber: 49584698524577875697015278055901397995805360712946024450
ApproximateArrivalTimestamp: 2018-05-22 09:45:51 +0000 UTC
Data: {"@timestamp":"2018-05-22T09:45:50.888Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.2.4"},"mykey":"mykey4","myvalue":"5","source":"/mnt/log/foo.log","beat":{"name":"bd98d11e2e3b","hostname":"bd98d11e2e3b","version":"6.2.4"},"offset":165}
SequenceNumber: 49584698524577875697015278055902606921624975342120730626
ApproximateArrivalTimestamp: 2018-05-22 09:45:51 +0000 UTC
Data: {"@timestamp":"2018-05-22T09:45:50.888Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.2.4"},"mykey":"mykey4","myvalue":"5","source":"/mnt/log/foo.log","beat":{"name":"bd98d11e2e3b","hostname":"bd98d11e2e3b","version":"6.2.4"},"offset":165}
SequenceNumber: 49584698524577875697015278055903815847444589971295436802
ApproximateArrivalTimestamp: 2018-05-22 09:45:51 +0000 UTC
Data: {"@timestamp":"2018-05-22T09:45:50.888Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.2.4"},"mykey":"mykey4","myvalue":"5","source":"/mnt/log/foo.log","beat":{"name":"bd98d11e2e3b","hostname":"bd98d11e2e3b","version":"6.2.4"},"offset":165}
SequenceNumber: 49584698524577875697015278055905024773264204600470142978
```
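This repetition is the textbook symptom of Go slice aliasing. A self-contained sketch, independent of awsbeats, that reproduces the same five-identical-records behavior with a reused bytes.Buffer (the same pattern the json codec uses internally):

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	var buf bytes.Buffer
	var batch [][]byte
	for i := 1; i <= 5; i++ {
		buf.Reset() // reuse one buffer per record, as the json codec does
		fmt.Fprintf(&buf, `{"mykey":"mykey4","myvalue":"%d"}`, i)
		// Each slice returned by Bytes() aliases buf's backing array,
		// so later writes overwrite what earlier slices point at.
		batch = append(batch, buf.Bytes())
	}
	for _, rec := range batch {
		fmt.Println(string(rec)) // prints {"mykey":"mykey4","myvalue":"5"} five times
	}
}
```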
This is due to the misuse of libbeat/outputs/codec/json here. You should never do this:
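A minimal sketch of the broken pattern, assuming libbeat 6.x's codec.Codec interface and aws-sdk-go's kinesis types; the `client.encoder`, `event`, and `partitionKey` names are illustrative rather than copied from the repo:

```go
// BROKEN: the slice returned by Encode aliases the json codec's
// internal bytes.Buffer, which is Reset() and reused on the next
// call, so every batched entry ends up holding the LAST event's bytes.
serialized, err := client.encoder.Encode(client.beatName, &event.Content)
if err != nil {
	return nil, err
}
entry := kinesis.PutRecordsRequestEntry{
	Data:         serialized,
	PartitionKey: aws.String(partitionKey),
}
```

Instead you should do:

```go
// FIXED: copy the encoded bytes before the codec reuses its buffer,
// so each entry in the batch owns its own backing array.
serialized, err := client.encoder.Encode(client.beatName, &event.Content)
if err != nil {
	return nil, err
}
data := make([]byte, len(serialized))
copy(data, serialized)
entry := kinesis.PutRecordsRequestEntry{
	Data:         data,
	PartitionKey: aws.String(partitionKey),
}
```

The json codec seems to be designed that way, as we can see in the implementation of the official kafka output plugin and in the json codec's usage of bytes.Buffer.Reset().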
The same thing applies to awsbeats' firehose plugin, as I pointed out in #10.
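The firehose fix follows the same shape; a sketch assuming aws-sdk-go's firehose.Record type (again, the surrounding names are illustrative):

```go
// Same aliasing hazard as above: copy the codec's output before
// queueing the record, because the codec will reuse its buffer.
data := make([]byte, len(serialized))
copy(data, serialized)
record := firehose.Record{Data: data}
```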
mumoshu changed the title from "bug(stream): Batched records goes fusion" to "bug(stream): Records sent to Kinesis Data Streams are incorrect/corrupted" on May 22, 2018.

mumoshu added a commit to mumoshu/awsbeats that referenced this issue on May 22, 2018:
* Add instructions and an example for running dockerized awsbeats with Kinesis Data Streams
This should be enhanced to cover firehose and other beats. This is just a starting point :)
* fix: Correct records sent to Kinesis Data Streams
Fixes #11

`glide up` wasn't strictly necessary, but I verified this works with the latest version of aws-sdk-go anyway.
* Fix travis build