streams: Panic on collecting failed records #41

Closed
mumoshu opened this issue Jul 11, 2018 · 0 comments


mumoshu commented Jul 11, 2018

I started to see persistent panics like the below in my production deployment:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7fda4c8ce9b1]

goroutine 83 [running]:
github.com/s12v/awsbeats/streams.collectFailedEvents(0xc420ad1dd0, 0xc420f4b300, 0x32, 0x3b4, 0xc420ad1dd0, 0x0, 0x0)
        /go/src/github.com/s12v/awsbeats/streams/client.go:170 +0xd1
github.com/s12v/awsbeats/streams.(*client).publishEvents(0xc4200da930, 0xc420f4b300, 0x32, 0x3b4, 0x1d14c30, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/s12v/awsbeats/streams/client.go:92 +0x2be
github.com/s12v/awsbeats/streams.(*client).Publish(0xc4200da930, 0x14b9860, 0xc422b27940, 0xc42008ce40, 0xc420f69f78)
        /go/src/github.com/s12v/awsbeats/streams/client.go:64 +0x45
github.com/elastic/beats/libbeat/outputs.(*backoffClient).Publish(0xc42000d8a0, 0x14b9860, 0xc422b27940, 0x0, 0x0)
        /go/src/github.com/elastic/beats/libbeat/outputs/backoff.go:43 +0x4b
github.com/elastic/beats/libbeat/publisher/pipeline.(*netClientWorker).run(0xc4200af280)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:90 +0x1a9
created by github.com/elastic/beats/libbeat/publisher/pipeline.makeClientWorker
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/output.go:31 +0xf0

This suggests that we're receiving records with unexpected structures from Kinesis Streams. I have no reference to a concrete specification of Kinesis records, so I'm unable to properly "fix" it.

In the meantime, I'd make awsbeats degrade gracefully: give up retrying failed records with unexpected structures, and just leave a log message to help further investigation.
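To illustrate the graceful degradation described above, here is a minimal sketch. It is not the actual awsbeats code: `resultEntry`, `event`, and this version of `collectFailedEvents` are hypothetical stand-ins for the Kinesis PutRecords result entries and libbeat events, and the assumption is that a nil entry or an unrecognized error code should be logged and skipped rather than dereferenced or retried.

```go
package main

import "log"

// resultEntry is a hypothetical stand-in for one per-record result in a
// Kinesis PutRecords response. ErrorCode is nil when the record succeeded.
type resultEntry struct {
	ErrorCode *string
}

// event is a hypothetical stand-in for a libbeat event.
type event struct {
	Message string
}

// collectFailedEvents returns the events whose results carry a known,
// retryable error code. Anything unexpected (length mismatch, nil entry,
// unknown code) is logged and dropped instead of causing a panic.
func collectFailedEvents(results []*resultEntry, events []event) []event {
	if len(results) != len(events) {
		log.Printf("unexpected response: %d results for %d events; not retrying",
			len(results), len(events))
		return nil
	}
	var failed []event
	for i, r := range results {
		if r == nil {
			log.Printf("result %d is nil; giving up on this record", i)
			continue
		}
		if r.ErrorCode == nil {
			continue // no error code: the record was accepted
		}
		switch *r.ErrorCode {
		case "ProvisionedThroughputExceededException", "InternalFailure":
			failed = append(failed, events[i])
		default:
			log.Printf("result %d has unexpected error code %q; not retrying",
				i, *r.ErrorCode)
		}
	}
	return failed
}
```

With this shape, a malformed response costs at most the affected records plus a log line, and the output worker keeps running.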

mumoshu added a commit to mumoshu/awsbeats that referenced this issue Jul 11, 2018
mumoshu added a commit that referenced this issue Jul 12, 2018
* fix(streams): Panic on collecting failed records

Fixes #41

* fix(ci): Fix travis build failures due to recent backward-incompatible changes in libbeat

Fixes https://travis-ci.org/s12v/awsbeats/builds/402503570

* test(streams): Add a test-case for the fix
mumoshu pushed a commit that referenced this issue Jul 25, 2018
These are similar fixes to what has been done in the streams plugin.

Please see commit d0db8d4: it looks like the same problem as #41, but I think this is a slightly neater way of handling it.

We've been running this branch for 3 weeks now and it's gone from crashing very often to not crashing at all.

Closes #39

Changelog:

* Properly format json for firehose

This was already done for streams in a086eea

* Fix panic on Firehose ack/retry

Not entirely sure what the problem is but we've seen panics from multiple places in the code. Mostly copying changes that were made to the streams client in ce91e04 and hoping it helps.

* Fix nil dereference in Firehose failed responses

The test fails with the old function and passes with the new one. I haven't seen the actual responses from the API but I suspect that when some records failed and others passed, only the ones that failed have an ErrorCode. So it needs to check if `r.ErrorCode != nil` before checking the value. It seems the `aws.StringValue` helper function does that and also removes the need for the other nil check.
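The nil-safe check described above can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `responseEntry` and `failedIndexes` are hypothetical names, and `stringValue` is a local copy of what `aws.StringValue` from aws-sdk-go does (return the empty string for a nil pointer), so the sketch compiles without the SDK.

```go
package main

// stringValue mirrors aws.StringValue from aws-sdk-go: it dereferences the
// pointer, or returns "" when the pointer is nil.
func stringValue(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

// responseEntry is a hypothetical stand-in for one entry of a Firehose
// batch response. Assumption: only failed records carry an ErrorCode.
type responseEntry struct {
	ErrorCode *string
}

// failedIndexes returns the positions of entries that report an error.
// Because stringValue handles nil, no separate ErrorCode nil check is needed.
func failedIndexes(entries []*responseEntry) []int {
	var idx []int
	for i, e := range entries {
		if e == nil {
			continue // defensive: skip missing entries instead of panicking
		}
		if stringValue(e.ErrorCode) != "" {
			idx = append(idx, i)
		}
	}
	return idx
}
```

Comparing `*r.ErrorCode` directly is what panics when a successful record has no error code; routing the read through the helper turns that case into an ordinary empty string.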