
kinesis_firehose Failed to send log records randomly #2876

Closed
ivanich opened this issue Dec 18, 2020 · 10 comments
@ivanich

ivanich commented Dec 18, 2020

Bug Report

Describe the bug
kinesis_firehose can't send log records

To Reproduce

  • configure kinesis_firehose output and check logs
    [OUTPUT]
        Name kinesis_firehose
        Match kube.*
        region ${aws_region}
        delivery_stream ${delivery_stream}
        Retry_Limit False
  • Steps to reproduce the problem:

Expected behavior
kinesis_firehose shouldn't drop data

Log

[2020/12/18 13:09:21] [debug] [output:kinesis_firehose:kinesis_firehose.1] Sending 1 records
[2020/12/18 13:09:21] [debug] [output:kinesis_firehose:kinesis_firehose.1] Sending log records to delivery stream logs-bucket
[2020/12/18 13:09:21] [debug] [http_client] not using http_proxy for header
[2020/12/18 13:09:21] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
[2020/12/18 13:09:21] [debug] [http_client] header=POST / HTTP/1.1
Host: firehose.us-east-1.amazonaws.com
Content-Length: 1449
User-Agent: aws-fluent-bit-plugin
Content-Type: application/x-amz-json-1.1
X-Amz-Target: Firehose_20150804.PutRecordBatch
x-amz-date: 20201218T130921Z
x-amz-security-token: *token=*
Authorization: AWS4-HMAC-SHA256 Credential=*creds*/20201218/us-east-1/firehose/aws4_request, SignedHeaders=content-length;content-type;host;user-agent;x-amz-date;x-amz-security-token;x-amz-target, Signature=*signature*

[2020/12/18 13:09:21] [debug] [http_client] server firehose.us-east-1.amazonaws.com:443 will close connection #106
[2020/12/18 13:09:21] [debug] [aws_client] firehose.us-east-1.amazonaws.com: http_do=0, HTTP Status: 400
[2020/12/18 13:09:21] [debug] [output:kinesis_firehose:kinesis_firehose.1] PutRecordBatch http status=400
[2020/12/18 13:09:21] [error] [output:kinesis_firehose:kinesis_firehose.1] PutRecordBatch API responded with error='InvalidSignatureException', message='The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.'
[2020/12/18 13:09:21] [error] [output:kinesis_firehose:kinesis_firehose.1] Failed to send log records to logs-bucket
[2020/12/18 13:09:21] [error] [output:kinesis_firehose:kinesis_firehose.1] Failed to send log records
[2020/12/18 13:09:21] [error] [output:kinesis_firehose:kinesis_firehose.1] Failed to send records
[2020/12/18 13:09:21] [debug] [retry] re-using retry for task_id=100 attempts=2

and successful attempts

[2020/12/18 13:08:10] [debug] [output:kinesis_firehose:kinesis_firehose.1] PutRecordBatch http status=200
[2020/12/18 13:08:10] [debug] [output:kinesis_firehose:kinesis_firehose.1] Sent events to logs-bucket
[2020/12/18 13:08:10] [ info] [output:kinesis_firehose:kinesis_firehose.1] Processed 192 records, sent 192 to logs-bucket

Your Environment

  • Version used: 1.6.8
  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?): EKS 1.15
  • Server type and version:
  • Operating System and version:
  • Filters and plugins:

Additional context
Actually, fluent-bit is sometimes able to send logs to Firehose successfully, but according to the log above it drops some data. I should also mention that the data rate is pretty low, ~0.2 records per second. I also tried changing Flush from 1s to 60s, with no luck.
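For context on the InvalidSignatureException in the log above: SigV4 signatures are an HMAC chain keyed on the secret access key, so if the credentials change between when the key ID is cached and when the request is signed (e.g. kube2iam rotating or proxying credentials), the computed signature no longer matches what AWS expects. A minimal Python sketch of the SigV4 key-derivation step, with a simplified string-to-sign and placeholder secrets (this is illustrative, not fluent-bit's actual C implementation):

```python
import hashlib
import hmac

def signing_key(secret, date, region, service):
    # SigV4 derives the signing key via chained HMAC-SHA256
    k = hmac.new(("AWS4" + secret).encode(), date.encode(), hashlib.sha256).digest()
    for part in (region, service, "aws4_request"):
        k = hmac.new(k, part.encode(), hashlib.sha256).digest()
    return k

def sign(secret, string_to_sign, date="20201218", region="us-east-1", service="firehose"):
    key = signing_key(secret, date, region, service)
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()

# Simplified stand-in for the real canonical string-to-sign
sts = "AWS4-HMAC-SHA256\n20201218T130921Z\n..."
print(sign("SECRET_A", sts) != sign("SECRET_B", sts))  # True: different secrets, different signatures
print(sign("SECRET_A", sts) == sign("SECRET_A", sts))  # True: same secret, stable signature
```

So a request signed with one secret but attributed (via the Authorization header's access key ID) to another gets rejected with exactly this error, which would fit an intermittent credential-proxy race.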

@ivanich
Author

ivanich commented Dec 18, 2020

This was due to kube2iam; after switching to kiam the issue disappeared.

@ivanich ivanich closed this as completed Dec 18, 2020
@ivanich ivanich reopened this Dec 21, 2020
@ivanich
Author

ivanich commented Dec 21, 2020

Update: tried with kiam, and it works fine for 10 minutes; after that:

[2020/12/21 17:04:41] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch API responded with error='ExpiredTokenException', message='The security token included in the request is expired'
[2020/12/21 17:04:41] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to logs-bucket
[2020/12/21 17:04:41] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2020/12/21 17:04:41] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records

Restarting the fluent-bit pod helps for another 10 minutes.

I also found this in the logs, which explains why it fails:

[2020/12/21 19:06:18] [ warn] [aws_credentials] Credential expiration '2020-12-21T19:11:57Z' is less than10 minutes in the future. Disabling auto-refresh.
[2020/12/21 19:06:18] [ warn] [aws_credentials] 'Expiration' was invalid or could not be parsed. Disabling auto-refresh of credentials.

I don't quite understand what's wrong with aws_credentials.
So I guess the issue comes from https://github.com/fluent/fluent-bit/blob/master/src/aws/flb_aws_credentials.c#L599
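For what it's worth, the two warnings read like this decision: if the reported 'Expiration' is unparseable, or less than 10 minutes in the future, auto-refresh is disabled and the cached token is kept until it expires. A rough Python sketch of that logic as I read the linked code (`should_auto_refresh` is a hypothetical helper, not the actual implementation):

```python
from datetime import datetime, timedelta, timezone

REFRESH_WINDOW = timedelta(minutes=10)

def should_auto_refresh(expiration_str, now=None):
    """Return True if the credential expiry is far enough in the
    future that a refresh can be scheduled before it lapses."""
    now = now or datetime.now(timezone.utc)
    try:
        # Expiration format from the credential provider, e.g. '2020-12-21T19:11:57Z'
        exp = datetime.strptime(expiration_str, "%Y-%m-%dT%H:%M:%SZ")
        exp = exp.replace(tzinfo=timezone.utc)
    except ValueError:
        return False  # 'Expiration' invalid or unparseable: disable auto-refresh
    return exp - now >= REFRESH_WINDOW

now = datetime(2020, 12, 21, 19, 6, 18, tzinfo=timezone.utc)
print(should_auto_refresh("2020-12-21T19:11:57Z", now))  # False: expiry ~5m39s away
print(should_auto_refresh("2020-12-21T20:30:00Z", now))  # True
print(should_auto_refresh("not-a-date", now))            # False
```

With kiam handing out short-lived tokens (expiring within the 10-minute window), refresh would never be scheduled, which matches the ExpiredTokenException showing up 10 minutes after each pod restart.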

@github-actions
Contributor

github-actions bot commented Mar 6, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Mar 6, 2021
@chadthompsonallscripts

chadthompsonallscripts commented Mar 10, 2021

I'm also having a similar issue with the kinesis_firehose plugin.
My configuration of the output plugin is identical to the above. Here are my logs.

[2021/03/10 23:23:45] [ info] [output:kinesis_firehose:kinesis_firehose.1] Processed 3 records, sent 3 to ve-dorn-sn-ue1-firehose-kong
[2021/03/10 23:23:58] [ info] [output:kinesis_firehose:kinesis_firehose.1] Processed 3 records, sent 3 to ve-dorn-sn-ue1-firehose-kong
[2021/03/10 23:24:06] [error] [output:kinesis_firehose:kinesis_firehose.1] Failed to send log records to ve-dorn-sn-ue1-firehose-kong
[2021/03/10 23:24:06] [error] [output:kinesis_firehose:kinesis_firehose.1] Failed to send log records
[2021/03/10 23:24:06] [error] [output:kinesis_firehose:kinesis_firehose.1] Failed to send records
[2021/03/10 23:24:06] [ warn] [engine] failed to flush chunk '1-1615418645.688339006.flb', retry in 8 seconds: task_id=0, input=tail.1 > output=kinesis_firehose.1 (out_id=1)
[2021/03/10 23:24:10] [ info] [output:kinesis_firehose:kinesis_firehose.1] Processed 2 records, sent 2 to ve-dorn-sn-ue1-firehose-kong

@agup006
Member

agup006 commented Mar 10, 2021

@PettitWesley have you seen this issue before? I'm wondering if it is something with AWS credentials. Does debug potentially have more messages after saying Failed?

@chadthompsonallscripts

chadthompsonallscripts commented Mar 10, 2021

It's not creds on my end. It does look a bit like it's getting to the end of the log file and erroring out, then restarting when new logs come in, but I can't be sure it's not dropping records.
It inotify-adds all the files, then processes a bunch of records, then fails, processes half a dozen records, then fails, then half a dozen and fails, etc.

How do I enable debug for output?

@PettitWesley
Contributor

@chadthompsonallscripts You can set the env var FLB_LOG_LEVEL=debug
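On Kubernetes that would be an env entry in the fluent-bit container spec; a sketch assuming a standard DaemonSet (setting `Log_Level debug` in the `[SERVICE]` section of the config should be equivalent):

```yaml
# fluent-bit DaemonSet, container spec fragment
env:
  - name: FLB_LOG_LEVEL
    value: debug
```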

@PettitWesley
Contributor

@ivanich Please upgrade to 1.7.2; there was a bug in the credential refresh behavior which affected kiam users. It's been fixed now.

@PettitWesley PettitWesley reopened this Mar 11, 2021
@github-actions github-actions bot removed the Stale label Mar 11, 2021
@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Apr 10, 2021
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.
