kinesis_firehose: Crashing, log loss/duplication #3917

Closed
pranavmarla opened this issue Aug 5, 2021 · 18 comments
Labels: AWS (issues with AWS plugins or experienced by users running on AWS), bug, Stale

pranavmarla commented Aug 5, 2021

Bug Report

Describe the bug
We are performance testing Fluent Bit by sending a burst of 25,000 logs to Kinesis Firehose via the core kinesis_firehose plugin. Fluent Bit consistently has trouble delivering this many logs, with symptoms ranging from dropped logs to outright crashes. Worryingly, the issues get worse with newer versions of Fluent Bit.
Specifically:

  • v1.8.0+: Crashes within 20 seconds (segmentation fault) and loses logs (only a fraction of the logs are sent before the crash)
  • v1.7.6 to v1.7.9: Doesn't crash, but log delivery is inconsistent -- sometimes it loses logs, sometimes it sends more logs than were generated (i.e. the same log is sent multiple times, presumably because of Fluent Bit's retry attempts)
  • v1.7.5: Doesn't crash and doesn't lose logs, but always seems to send extra (duplicate) logs
    (See below for more details.)

Note that, if we switch to Amazon's Fluent Bit image (and use Amazon's firehose plugin instead of the core kinesis_firehose plugin), all these issues go away. Specifically:

  • Fluent Bit doesn't crash
  • Fluent Bit doesn't lose logs
  • Fluent Bit doesn't send more/duplicate logs
    Instead, it always sends the exact number of logs that were generated.

So, the issue seems to be with the core kinesis_firehose plugin specifically.

To Reproduce
Our testing is being done on a large Ubuntu EC2 instance. Fluent Bit is present on that EC2, and sends logs to a Kinesis Firehose delivery stream in the same AWS account. To avoid proxy issues, we have created a VPC endpoint for Firehose, so that we can directly send logs from EC2 to Firehose.

  • Fluent Bit installation: For simplicity, to let us quickly try out different versions of Fluent Bit, we do not directly install Fluent Bit on the EC2 -- instead, we run the core Fluent Bit Docker image.
  • Fluent Bit config: Reads logs from a local file and sends them to Firehose.
[INPUT]
  Name              tail
  Path              /data/perf-test/logFolder-fb/*.log
  refresh_interval  2
  rotate_wait       5
  db                /data/perf-test/fluentbit-logs.db
  db.sync           normal
  db.locking        true
  buffer_chunk_size 128KB
  buffer_max_size   50MB
  skip_long_lines   on
  mem_buf_limit     199800KB
    
[FILTER]
  Name              nest
  Match             *
  Operation         nest
  Wildcard          *
  Nest_under        event

[OUTPUT]
  name              kinesis_firehose
  match             *
  region            us-east-1
  delivery_stream   Test-Firehose
  • Run Fluent Bit: We run the Fluent Bit Docker image, and mount the required files, using the following command:
docker run --rm \
    --mount type=bind,source=/data/perf-test/fluent-bit.conf,destination=/fluent-bit/etc/fluent-bit.conf,readonly \
    --mount type=bind,source=/data/perf-test/,destination=/data/perf-test/ \
    fluent/fluent-bit:1.8.3-debug
  • Test data:
    We have a file called /data/perf-test/fakeData.txt containing fake data/logs, where each log is ~1,000 bytes in size.
    E.g.:
Katherine Stone [email protected] 760-16-6504 2006-05-09T07:04:56 1974-01-19T07:04:25 Griffinborough Clinical biochemist wuPmcdIIIlPKQddacCTDMHedOKrxhgUOTyVDUjmExZqqwRmGKSwakHiKDMTlQzSvmnNmSgsJkJmtpHDBkICOrKNiNGJYftCIgNuQopxZZMXxGGXLUyCNyuWhzCKUCuuXKhxotmyulQExiufWfjQdiDwbRDUctByhAZcJPrlbGlInbpYcwCAQeJJBOZOEnKlsAOqYtNAueXfAeXFzEtssxZUTVIFTjjlspeJiBggwYuAtwlXzSScLcQNkkFUtCpGZhdPVrpiyNlmdcKkqpIjQIVjRmnnKBvzOSSPvXHLzhOeRzApvmaJtJIkYYLMhftbLioTbnWpXGIzkMzWRVimRrCJkRaqtttLcOiaOekPQgYdrByRPIZMMwfTKdftfZPHKIHJeYryUljCVolxZFdYDihpRHHFJlEwvfViRouHYZPcUihkbnVSkQLGlGzLPHLodovHJjrVqifkdNssyspCGpGHFNSeLguqpWIxMWhJMLkLDizCtqAOzDccveDvowyLrRlEECVGjqrkFTIIIOntAEXgnheqVqLnJaFBBTWcdKdlhzeixnfRZgmorTXKxeHaDDGZAWPhIpiRArxXQkArJdBjCtGNuOoDdNBMgGTLKbbEnsXWUiuEZXELXpbOfJuBSnuaAtUOlTafuSgmoPgMUjBYeaIOplfSzqRNKLnZBzZrzoAmxnfvSosZbDefkeabOAvtrXNFHYdORBJyuzjtpURVmTRzMKjwJhImztWkGcFYEqRpEjfeWfLelRaUNTiLPINZhEfcZagnLPcZrjPAATOVLOiSpwooU F Engineer 3 105000 500
Barbara Owens [email protected] 017-15-1913 2008-07-19T14:00:16 1971-11-12T07:39:36 Lake Katherine Pilot, airline rlKxQDLcXioeDMvYUaizSaSaooGlZqxofWVQaSmgambARvYwiRJGnkEvjzLUUYXJYdfXbfsAfzKYfalLwtwbqHXzzecamaoYaaUeREAZMwIVUropudeFzAgCVWSDozegeQLvWfHxFZkSLNuKGIDzdZhdsuPAGwKLdipbtAruGNtiEKrZAtFNvEvKblMMlhjxDHlqMmbkpycFwbzjILiTPtXJyuDwPbxgZJhREMcXIFzbefpGRDcXoKhxopuSpzvxYEsPwpPwATqchsIDCKoAwuasioVRoQGDtzhQGdpoepIVvLtFAefIGEbezLHwWCWVhMQqudeQuIFybUNKmpPsekTlBaomPhOPKjiYDtDICSYegUavSaWAseqQOrRsjSCEHeBiKcVnsbsncQBvlJLvyAJRWLltDJqDbNQnDmUSIeLKUnFJanNCeYGkrAvNGxixvMLFkbFUOyMgDsDOOFmmtCaWwaEUxJxYQxojGWcjbJVgAMoEdnwJIIJCXgOcrZUBMTglJblUEgHfeNRlAtSZbWiAWEzdJSqVdIcVpkPYeifNcVGDjzHlAXVjwmgjiNHIeZaYZZteUpMEaGVzwcUCSXfZwUFOhyvkCjkaRkMCFUJzlqBbdJcwwTDFSmeqduzaqzcBGZNrkuZVUQcNKFiMWRGaUlwZXVMSDLzzuXimTeolCKnRpujnKSDyfFkUOKVeXUeamySmKqnwLLTHIVetBWrqdyHhuhSbnbvNDCchfsmRMsdpSrSqPHdpCoRFPNTKrOJZCrvOyWNIXNlbFbSALQRatImfnAIt M VP 4 114000 4500
Victoria Rivas [email protected] 593-88-0695 1993-05-08T06:22:07 1979-09-14T03:01:22 Lake Lance Teaching laboratory technician sNaVptJsxvviMPJOOZrVcQffnxFqkSaQullfJWapxyMOOntudKgOHGuvCdJYtcmLbhxfMxUfXXtPdRshcnEyLuoAJdtVtTTCixHBskHCpypMvtkOcuUcIcjSRQBRmBsojeHIohvaVzWkdiSwpovDIFAlAUrvpNtHvzGhlTfbwenITMTuvloyWKcaaAWhAsCtDyKrAOSZOrIlrJjwTwKMIuGuNeQXyvSxiUnXfbGjhRQvBBfrgwjYgLJZZLdWgSjVnwllbXvsqtVSUZulzpmZnmETxnIUwoLOUAdeqdiNJmedjqGOnDNGIZwjezZaRbOXHtcjFPvRqXFMNdhrNnYRgabKzhgiooKhLCNBcGqWTdpxqhGqpSfCDJkADHDgcJGxLldqNDmZjngadcoCrASBVgZJnVaSUAExVguyKSbRzWYjJNpBoIkkPIVkBJxlOVciayQakEawwYJXNKHJZRKGzrYdDEMaHAoUDXTabEcvTSWlyoKFJVJanMvCpjLySPTevaHxSCIRRgnlAxixFZiHJEzWSWEhUISKdMUiEJtOTypDdebUnQQNcBJuIzaorYxkyeWypHpDXbwDzQfmIGpMthHZJtMuqNyogwFCMihpJIyAbubWyILAXUKSZmwxvJraZSgUQZhtJwJdJUuYBJEelBApIFOryFeftifowvfeOgEwNmOJFTQnMzTezlVrjFigZnoyrVofGGriUzwmaNhSrTUAmCEYfGygmZxarJlvnjUDNQEwkzVLsePHkBJzmdvtsGhQJBnCxMBJLJqFmtlTwiYTXdZcKqLW M Sales 17 90000 500
...
  • Test: To actually run the performance test, we run the following script (/data/perf-test/runtest.sh), which essentially reads a certain number of logs per second from /data/perf-test/fakeData.txt and writes them to /data/perf-test/logFolder-fb/test.log, from where Fluent Bit tails them and sends them to Firehose:
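#!/bin/bash
# Usage: runtest.sh LOOPCNT LOGCOUNT
# Prints the first LOGCOUNT lines of fakeData.txt once per second, LOOPCNT times.

LOOPCNT=$1
LOGCOUNT=$2
i=0

sleep 5   # delay before the burst starts
while [ "$i" -lt "$LOOPCNT" ]
do
        head -n "$LOGCOUNT" /data/perf-test/fakeData.txt
        i=$((i + 1))
        sleep 1
done
sleep 5   # delay after the burst ends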

Thus, to have the above script generate 25,000 logs (5,000 logs/second * 5 seconds) for Fluent Bit to read, we run the following command:

/data/perf-test/runtest.sh 5 5000 > /data/perf-test/logFolder-fb/test.log
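
Before comparing against what Firehose received, it can help to confirm that the input file really contains the expected number of lines. The following is a minimal sketch, assuming each log occupies exactly one line and that fakeData.txt holds at least LOGCOUNT lines. The function names are illustrative and not part of the original test setup.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original test harness.

# expected_log_count LOOPCNT LOGCOUNT -> prints LOOPCNT * LOGCOUNT
expected_log_count() {
    echo $(( $1 * $2 ))
}

# generated_log_count FILE -> prints the number of lines in FILE
generated_log_count() {
    wc -l < "$1" | tr -d ' '
}

# check_generated FILE LOOPCNT LOGCOUNT
# Prints both counts and returns non-zero if they differ.
check_generated() {
    _exp=$(expected_log_count "$2" "$3")
    _act=$(generated_log_count "$1")
    echo "expected=$_exp actual=$_act"
    [ "$_exp" -eq "$_act" ]
}
```

For the run above, `check_generated /data/perf-test/logFolder-fb/test.log 5 5000` would compare the file's line count against 25,000 and return non-zero on a mismatch.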

Expected behavior
Since we generated 25,000 logs to a file being tailed by Fluent Bit, we expect Fluent Bit to send exactly 25,000 logs to Firehose. Instead, as mentioned above, depending on which version of (core) Fluent Bit we use, it either crashes, loses logs or sends more/duplicate logs.

If we switch to Amazon's Fluent Bit image and Amazon's firehose plugin (i.e. replace name kinesis_firehose in the above Fluent Bit config with name firehose), then all the issues go away and Fluent Bit behaves as expected -- it sends exactly 25,000 logs to Firehose.

Error Logs
Here are the logs generated by Fluent Bit, for some of the versions we tested:

  • 1.8.3: Crashes
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/04 19:12:23] [ info] [engine] started (pid=1)
[2021/08/04 19:12:23] [ info] [storage] version=1.1.1, initializing...
[2021/08/04 19:12:23] [ info] [storage] in-memory
[2021/08/04 19:12:23] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/04 19:12:23] [ info] [cmetrics] version=0.1.6
[2021/08/04 19:12:23] [ info] [sp] stream processor started

[2021/08/04 19:13:52] [ info] [input:tail:tail.0] inotify_fs_add(): inode=112 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/08/04 19:13:59] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:13:59] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:00] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:07] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:07] [error] [src/flb_http_client.c:1170 errno=11] Resource temporarily unavailable
[2021/08/04 19:14:07] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/04 19:14:07] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/04 19:14:07] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/04 19:14:07] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:08] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:08] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:09] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:09] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:10] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/04 19:14:10] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/04 19:14:10] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/04 19:14:10] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/04 19:14:10] [ warn] [engine] failed to flush chunk '1-1628104441.528346298.flb', retry in 8 seconds: task_id=10, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/04 19:14:10] [error] [net] socket #38 could not connect to firehose.us-east-1.amazonaws.com:443
[2021/08/04 19:14:10] [engine] caught signal (SIGSEGV)
[2021/08/04 19:14:10] [  Error] epoll_ctl: Bad file descriptor, errno=9 at /tmp/fluent-bit/lib/monkey/mk_core/mk_event_epoll.c:136
#0  0x55c4d91db242      in  __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:88
#1  0x55c4d91db26d      in  mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2  0x55c4d91dbd1f      in  prepare_destroy_conn() at src/flb_upstream.c:390
#3  0x55c4d91dbd81      in  prepare_destroy_conn_safe() at src/flb_upstream.c:412
#4  0x55c4d91dc057      in  create_conn() at src/flb_upstream.c:501
#5  0x55c4d91dc4b9      in  flb_upstream_conn_get() at src/flb_upstream.c:640
#6  0x55c4d92d0cf2      in  request_do() at src/aws/flb_aws_util.c:285
#7  0x55c4d92d0905      in  flb_aws_client_request() at src/aws/flb_aws_util.c:161
#8  0x55c4d9291d3e      in  put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
#9  0x55c4d9290702      in  send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
#10 0x55c4d9290e31      in  process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:560
#11 0x55c4d928f396      in  cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
#12 0x55c4d91c60de      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#13 0x55c4d96b3066      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
  • 1.8.2: Crashes
...
[2021/07/30 18:50:33] [ info] [input:tail:tail.0] inotify_fs_add(): inode=112 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/07/30 18:50:34] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/07/30 18:50:42] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:43] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:43] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:44] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:44] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:44] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/07/30 18:50:52] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/07/30 18:50:52] [error] [aws_client] connection initialization error
[2021/07/30 18:50:52] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 18:50:52] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 18:50:52] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 18:50:52] [ warn] [engine] failed to flush chunk '1-1627671037.751893694.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 18:50:54] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/07/30 18:50:54] [error] [aws_client] connection initialization error
[2021/07/30 18:50:54] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 18:50:54] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 18:50:54] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 18:50:54] [ warn] [engine] failed to flush chunk '1-1627671038.789867966.flb', retry in 9 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 18:50:54] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/07/30 18:50:54] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/07/30 18:50:54] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/07/30 18:50:54] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/07/30 18:50:54] [engine] caught signal (SIGSEGV)
#0  0x5631e07a4070      in  __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:87
#1  0x5631e07a40a7      in  mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2  0x5631e07a4b59      in  prepare_destroy_conn() at src/flb_upstream.c:390
#3  0x5631e07a4bbb      in  prepare_destroy_conn_safe() at src/flb_upstream.c:412
#4  0x5631e07a4e91      in  create_conn() at src/flb_upstream.c:501
#5  0x5631e07a52f3      in  flb_upstream_conn_get() at src/flb_upstream.c:640
#6  0x5631e089255d      in  request_do() at src/aws/flb_aws_util.c:284
#7  0x5631e0892170      in  flb_aws_client_request() at src/aws/flb_aws_util.c:160
#8  0x5631e085804c      in  put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
#9  0x5631e0856a10      in  send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
#10 0x5631e0856d6c      in  add_event() at plugins/out_kinesis_firehose/firehose_api.c:451
#11 0x5631e08570e2      in  process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:551
#12 0x5631e08556a4      in  cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
#13 0x5631e078f00a      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#14 0x5631e0c5f546      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#15 0xffffffffffffffff  in  ???() at ???:0
  • 1.7.6: Does not crash, but log delivery is inconsistent: Sometimes loses logs, sometimes sends extra/duplicate logs
Fluent Bit v1.7.6
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/05 17:17:00] [ info] [engine] started (pid=1)
[2021/08/05 17:17:00] [ info] [storage] version=1.1.1, initializing...
[2021/08/05 17:17:00] [ info] [storage] in-memory
[2021/08/05 17:17:00] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/05 17:17:00] [ info] [sp] stream processor started

[2021/08/05 17:27:43] [ info] [input:tail:tail.0] inotify_fs_add(): inode=112 watch_fd=2 name=/data/perf-test/logFolder-fb/test.log
[2021/08/05 17:27:44] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response

[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184468.277267041.flb', retry in 8 seconds: task_id=2, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184469.283845807.flb', retry in 6 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184468.265588851.flb', retry in 10 seconds: task_id=1, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:50] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:50] [ warn] [engine] failed to flush chunk '1-1628184468.255456489.flb', retry in 11 seconds: task_id=0, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:50] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1455 records, sent 1455 to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response

[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184470.309479199.flb', retry in 7 seconds: task_id=5, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184471.324442944.flb', retry in 7 seconds: task_id=7, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response

[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184472.351715342.flb', retry in 10 seconds: task_id=10, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184471.333923382.flb', retry in 6 seconds: task_id=8, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [src/flb_http_client.c:1163 errno=11] Resource temporarily unavailable
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184472.342344688.flb', retry in 9 seconds: task_id=9, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:27:55] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:27:55] [ warn] [engine] failed to flush chunk '1-1628184470.298943708.flb', retry in 11 seconds: task_id=4, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/05 17:27:55] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2121 records, sent 2121 to Test-Firehose
[2021/08/05 17:27:56] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
[2021/08/05 17:27:58] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2111 records, sent 2111 to Test-Firehose
[2021/08/05 17:27:59] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/08/05 17:28:00] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:28:01] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:28:02] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/05 17:28:02] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/05 17:28:02] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/05 17:28:02] [ warn] [engine] chunk '1-1628184471.324442944.flb' cannot be retried: task_id=7, input=tail.0 > output=kinesis_firehose.0
[2021/08/05 17:28:02] [ warn] [engine] chunk '1-1628184470.309479199.flb' cannot be retried: task_id=5, input=tail.0 > output=kinesis_firehose.0
[2021/08/05 17:28:02] [ warn] [engine] chunk '1-1628184468.255456489.flb' cannot be retried: task_id=0, input=tail.0 > output=kinesis_firehose.0
[2021/08/05 17:28:02] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2191 records, sent 2191 to Test-Firehose
[2021/08/05 17:28:04] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
[2021/08/05 17:28:05] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2079 records, sent 2079 to Test-Firehose
[2021/08/05 17:28:07] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2151 records, sent 2151 to Test-Firehose
  • 1.7.5: Does not crash, does not lose logs, but does send extra/duplicate logs
...
[2021/07/30 21:08:23] [ info] [input:tail:tail.0] inotify_fs_add(): inode=106 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/07/30 21:08:25] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679309.426969060.flb', retry in 7 seconds: task_id=5, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 426 records, sent 426 to Test-Firehose
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679308.411100874.flb', retry in 7 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] could not pack/validate JSON API response

[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch response could not be parsed,
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [error] [aws_client] connection initialization error
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/07/30 21:08:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679310.447924521.flb', retry in 9 seconds: task_id=7, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679307.380111794.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ warn] [engine] failed to flush chunk '1-1627679310.455127141.flb', retry in 6 seconds: task_id=8, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2189 records, sent 2189 to Test-Firehose
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2181 records, sent 2181 to Test-Firehose
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2202 records, sent 2202 to Test-Firehose
[2021/07/30 21:08:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/07/30 21:08:36] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 696 records, sent 696 to Test-Firehose
[2021/07/30 21:08:36] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
[2021/07/30 21:08:36] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2151 records, sent 2151 to Test-Firehose
[2021/07/30 21:08:37] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2154 records, sent 2154 to Test-Firehose
[2021/07/30 21:08:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2238 records, sent 2238 to Test-Firehose
[2021/07/30 21:08:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2154 records, sent 2154 to Test-Firehose
[2021/07/30 21:08:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/07/30 21:08:40] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
[2021/07/30 21:08:40] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
  • Amazon Fluent Bit v2.19.0 (containing core Fluent Bit v1.8.3) with Amazon firehose plugin: Works as expected -- no crashing, no log loss, no extra/duplicate logs
AWS for Fluent Bit Container Image Version 2.19.0
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/04 19:35:46] [ info] [engine] started (pid=1)
[2021/08/04 19:35:46] [ info] [storage] version=1.1.1, initializing...
[2021/08/04 19:35:46] [ info] [storage] in-memory
[2021/08/04 19:35:46] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/04 19:35:46] [ info] [cmetrics] version=0.1.6
time="2021-08-04T19:35:46Z" level=info msg="A new higher performance Firehose plugin has been released; you are using the old plugin. Check out the new plugin's documentation and consider migrating.\nhttps://docs.fluentbit.io/manual/pipeline/outputs/firehose"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter delivery_stream = 'Test-Firehose'"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter region = 'us-east-1'"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter data_keys = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter role_arn = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter endpoint = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter sts_endpoint = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter time_key = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter time_key_format = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter log_key = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter replace_dots = ''"
time="2021-08-04T19:35:46Z" level=info msg="[firehose 0] plugin parameter simple_aggregation = 'false'"
[2021/08/04 19:35:46] [ info] [sp] stream processor started
[2021/08/04 19:36:03] [ info] [input:tail:tail.0] inotify_fs_add(): inode=106 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log

Your Environment

  • Versions:
    Core Fluent Bit Docker image versions tested:
      • 1.8.3-debug
      • 1.8.2-debug
      • 1.8.0-debug
      • 1.7.9-debug
      • 1.7.8-debug
      • 1.7.7-debug
      • 1.7.6-debug
      • 1.7.5-debug
    Amazon Fluent Bit Docker image versions tested:
      • 2.19.0 (contains Fluent Bit v1.8.3)
  • Configuration: See above
  • Operating System and version:
> uname -a
Linux ip-10-249-29-83 5.4.0-1051-aws #53~18.04.1-Ubuntu SMP Fri Jun 18 14:53:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
@pranavmarla
Author

FYI @PettitWesley

@pranavmarla changed the title from "kinesis_firehose: Log loss/duplication, crashing" to "kinesis_firehose: Crashing, log loss/duplication" on Aug 5, 2021
@PettitWesley
Contributor

@pranavmarla This is very surprising. The new kinesis_firehose should perform better and support higher throughput than the old kinesis plugin.

Can you try enabling workers with the kinesis_firehose plugin?

Can you also please open an issue at the AWS repo that references this one? That is more or less a requirement to get AWS engineers to look into an issue: https://github.com/aws/aws-for-fluent-bit
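
For reference, here is a minimal sketch of what enabling workers could look like in Fluent Bit's classic configuration format. The `region` and `delivery_stream` values are taken from the logs in this issue. The `Match *` pattern and the worker count of 2 are assumptions for illustration, not the reporter's actual configuration.

```
[OUTPUT]
    # Core (C) Firehose output plugin
    Name            kinesis_firehose
    # Assumed match pattern; adjust to the tag used by the tail input
    Match           *
    region          us-east-1
    delivery_stream Test-Firehose
    # Number of dedicated flush threads for this output (assumed value)
    workers         2
```

When workers are active, the startup log should show one `worker #N started` line per worker for the `kinesis_firehose` output.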

@nokute78
Collaborator

nokute78 commented Aug 6, 2021

Note:
Similar backtraces (v1.7.9): #3866, #3687

In v1.7.9, the network back end was updated:
https://fluentbit.io/announcements/v1.7.9/
network: make async dns query use TCP socket instead of UDP

v1.7.6+: Doesn't crash, but log delivery is inconsistent -- sometimes loses logs, sometimes sends more logs (i.e. sends the same log multiple times, presumably caused by Fluent Bit's retry attempts)

The DNS back end was changed in v1.7.6:
https://fluentbit.io/announcements/v1.7.6/
network: new asynchronous DNS support

@pranavmarla
Author

@pranavmarla This is very surprising. The new kinesis_firehose should perform better and support higher throughput than the old kinesis plugin.

Can you try enabling workers with the kinesis_firehose plugin?

Thanks for checking in, @PettitWesley! I re-ran the test with the core Fluent Bit Docker image (v1.8.3-debug) and the core kinesis_firehose plugin. As I increased the number of workers, Fluent Bit's stability got slightly better. Specifically:

  • No workers specified (original test case): Fluent Bit crashed after 16s (of tailing logs), lost logs
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/06 16:21:30] [ info] [engine] started (pid=1)
[2021/08/06 16:21:30] [ info] [storage] version=1.1.1, initializing...
[2021/08/06 16:21:30] [ info] [storage] in-memory
[2021/08/06 16:21:30] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/06 16:21:30] [ info] [cmetrics] version=0.1.6
[2021/08/06 16:21:30] [ info] [sp] stream processor started

[2021/08/06 16:22:15] [ info] [input:tail:tail.0] inotify_fs_add(): inode=112 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/08/06 16:22:21] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:21] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:22] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:29] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:29] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:30] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:30] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:31] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:22:31] [ warn] [http_client] malformed HTTP response from firehose.us-east-1.amazonaws.com:443 on connection #44
[2021/08/06 16:22:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:22:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:22:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:22:31] [error] [src/flb_http_client.c:1170 errno=11] Resource temporarily unavailable
[2021/08/06 16:22:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:22:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:22:31] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:22:31] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/06 16:22:31] [ warn] [engine] failed to flush chunk '1-1628266942.531765227.flb', retry in 6 seconds: task_id=8, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:22:31] [ warn] [engine] failed to flush chunk '1-1628266942.537837120.flb', retry in 10 seconds: task_id=9, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:22:31] [engine] caught signal (SIGSEGV)
#0  0x5622832fdc24      in  mk_event_add() at lib/monkey/mk_core/mk_event.c:96
#1  0x562282e1d060      in  net_connect_async() at src/flb_network.c:369
#2  0x562282e1dd30      in  flb_net_tcp_connect() at src/flb_network.c:832
#3  0x562282e43866      in  flb_io_net_connect() at src/flb_io.c:89
#4  0x562282e28fef      in  create_conn() at src/flb_upstream.c:497
#5  0x562282e294b9      in  flb_upstream_conn_get() at src/flb_upstream.c:640
#6  0x562282f1dcf2      in  request_do() at src/aws/flb_aws_util.c:285
#7  0x562282f1d905      in  flb_aws_client_request() at src/aws/flb_aws_util.c:161
#8  0x562282eded3e      in  put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
#9  0x562282edd702      in  send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
#10 0x562282edda5e      in  add_event() at plugins/out_kinesis_firehose/firehose_api.c:451
#11 0x562282edddd4      in  process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:551
#12 0x562282edc396      in  cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
#13 0x562282e130de      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#14 0x562283300066      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
  • 1 worker: Fluent Bit crashed after 21s, lost logs
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/06 16:31:15] [ info] [engine] started (pid=1)
[2021/08/06 16:31:15] [ info] [storage] version=1.1.1, initializing...
[2021/08/06 16:31:15] [ info] [storage] in-memory
[2021/08/06 16:31:15] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/06 16:31:15] [ info] [cmetrics] version=0.1.6
[2021/08/06 16:31:15] [ info] [sp] stream processor started
[2021/08/06 16:31:15] [ info] [output:kinesis_firehose:kinesis_firehose.0] worker #0 started

[2021/08/06 16:31:28] [ info] [input:tail:tail.0] inotify_fs_add(): inode=103 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/08/06 16:31:29] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/08/06 16:31:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:31:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:31:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:31:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:31:39] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:31:40] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:31:47] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/08/06 16:31:47] [error] [aws_client] connection initialization error
[2021/08/06 16:31:47] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:31:47] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:31:47] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:31:47] [ warn] [engine] failed to flush chunk '1-1628267493.534779063.flb', retry in 8 seconds: task_id=2, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:31:48] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/08/06 16:31:48] [error] [aws_client] connection initialization error
[2021/08/06 16:31:48] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:31:48] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:31:48] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:31:48] [ warn] [engine] failed to flush chunk '1-1628267494.546264099.flb', retry in 6 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:31:49] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/08/06 16:31:49] [error] [aws_client] connection initialization error
[2021/08/06 16:31:49] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:31:49] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:31:49] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:31:49] [ warn] [engine] failed to flush chunk '1-1628267493.523305205.flb', retry in 11 seconds: task_id=1, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:31:49] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/08/06 16:31:49] [error] [aws_client] connection initialization error
[2021/08/06 16:31:49] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
fluent-bit: /tmp/fluent-bit/lib/monkey/deps/flb_libco/amd64.c:121: crash: Assertion `0' failed.
[2021/08/06 16:31:49] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:31:49] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:31:49] [engine] caught signal (SIGSEGV)
#0  0x7f54f0b6c611      in  ???() at ???:0
#1  0x7f54f0b6c40e      in  ???() at ???:0
#2  0x7f54f0b7a101      in  ???() at ???:0
#3  0x56453e52d089      in  crash() at lib/monkey/deps/flb_libco/amd64.c:121
#4  0xffffffffffffffff  in  ???() at ???:0
  • 2 workers: Fluent Bit crashed after 26s, lost logs
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/06 16:33:07] [ info] [engine] started (pid=1)
[2021/08/06 16:33:07] [ info] [storage] version=1.1.1, initializing...
[2021/08/06 16:33:07] [ info] [storage] in-memory
[2021/08/06 16:33:07] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/06 16:33:07] [ info] [cmetrics] version=0.1.6
[2021/08/06 16:33:07] [ info] [sp] stream processor started
[2021/08/06 16:33:07] [ info] [output:kinesis_firehose:kinesis_firehose.0] worker #0 started
[2021/08/06 16:33:07] [ info] [output:kinesis_firehose:kinesis_firehose.0] worker #1 started

[2021/08/06 16:33:22] [ info] [input:tail:tail.0] inotify_fs_add(): inode=103 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/08/06 16:33:26] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
[2021/08/06 16:33:36] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:36] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:37] [ warn] [http_client] malformed HTTP response from firehose.us-east-1.amazonaws.com:443 on connection #57
[2021/08/06 16:33:37] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:33:37] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:33:37] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:33:37] [error] [src/flb_http_client.c:1170 errno=11] Resource temporarily unavailable
[2021/08/06 16:33:37] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:33:37] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:33:37] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:33:37] [ warn] [engine] failed to flush chunk '1-1628267608.108375944.flb', retry in 9 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:33:37] [ warn] [engine] failed to flush chunk '1-1628267609.129126356.flb', retry in 9 seconds: task_id=5, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:33:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:39] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:39] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:40] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:40] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:41] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/06 16:33:47] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/08/06 16:33:47] [error] [aws_client] connection initialization error
[2021/08/06 16:33:47] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
[2021/08/06 16:33:47] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
[2021/08/06 16:33:47] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
[2021/08/06 16:33:47] [ warn] [engine] failed to flush chunk '1-1628267607.103042907.flb', retry in 10 seconds: task_id=2, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
[2021/08/06 16:33:48] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/06 16:33:48] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
[2021/08/06 16:33:48] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
[2021/08/06 16:33:48] [engine] caught signal (SIGSEGV)
#0  0x55cf71489236      in  __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:87
#1  0x55cf7148926d      in  mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2  0x55cf71489d1f      in  prepare_destroy_conn() at src/flb_upstream.c:390
#3  0x55cf71489d81      in  prepare_destroy_conn_safe() at src/flb_upstream.c:412
#4  0x55cf7148a057      in  create_conn() at src/flb_upstream.c:501
#5  0x55cf7148a4b9      in  flb_upstream_conn_get() at src/flb_upstream.c:640
#6  0x55cf7157ecf2      in  request_do() at src/aws/flb_aws_util.c:285
#7  0x55cf7157e905      in  flb_aws_client_request() at src/aws/flb_aws_util.c:161
#8  0x55cf7153fd3e      in  put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
#9  0x55cf7153e702      in  send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
#10 0x55cf7153ea5e      in  add_event() at plugins/out_kinesis_firehose/firehose_api.c:451
#11 0x55cf7153edd4      in  process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:551
#12 0x55cf7153d396      in  cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
[2021/08/06 16:33:48] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
#13 0x55cf71476a75      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#14 0x55cf71961066      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#15 0xffffffffffffffff  in  ???() at ???:0
  • 3 workers: Inconsistent stability -- ran test twice, got different results.
    • Round 1: For the first time, Fluent Bit did not crash, did not lose logs -- instead, sent extra/duplicate logs
      Fluent Bit v1.8.3
      * Copyright (C) 2019-2021 The Fluent Bit Authors
      * Copyright (C) 2015-2018 Treasure Data
      * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
      * https://fluentbit.io
    
      [2021/08/06 16:34:56] [ info] [engine] started (pid=1)
      [2021/08/06 16:34:56] [ info] [storage] version=1.1.1, initializing...
      [2021/08/06 16:34:56] [ info] [storage] in-memory
      [2021/08/06 16:34:56] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
      [2021/08/06 16:34:56] [ info] [cmetrics] version=0.1.6
      [2021/08/06 16:34:56] [ info] [output:kinesis_firehose:kinesis_firehose.0] worker #0 started
      [2021/08/06 16:34:56] [ info] [sp] stream processor started
      [2021/08/06 16:34:56] [ info] [output:kinesis_firehose:kinesis_firehose.0] worker #1 started
      [2021/08/06 16:34:56] [ info] [output:kinesis_firehose:kinesis_firehose.0] worker #2 started
    
      [2021/08/06 16:35:05] [ info] [input:tail:tail.0] inotify_fs_add(): inode=103 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
      [2021/08/06 16:35:13] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:13] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:13] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:14] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:14] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:14] [error] [src/flb_http_client.c:1170 errno=32] Broken pipe
      [2021/08/06 16:35:14] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
      [2021/08/06 16:35:14] [ warn] [engine] failed to flush chunk '1-1628267708.941337308.flb', retry in 11 seconds: task_id=2, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
      [2021/08/06 16:35:14] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
      [2021/08/06 16:35:14] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
      [2021/08/06 16:35:15] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:15] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:16] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:16] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:20] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:20] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1349 records, sent 1349 to Test-Firehose
      [2021/08/06 16:35:20] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:20] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:20] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:21] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:21] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:22] [ warn] [http_client] malformed HTTP response from firehose.us-east-1.amazonaws.com:443 on connection #64
      [2021/08/06 16:35:22] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
      [2021/08/06 16:35:22] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
      [2021/08/06 16:35:22] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
      [2021/08/06 16:35:22] [ warn] [engine] failed to flush chunk '1-1628267705.723954790.flb', retry in 6 seconds: task_id=0, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
      [2021/08/06 16:35:22] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
      [2021/08/06 16:35:22] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:22] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:23] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:23] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:24] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:24] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:24] [error] [src/flb_http_client.c:1170 errno=11] Resource temporarily unavailable
      [2021/08/06 16:35:24] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
      [2021/08/06 16:35:24] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
      [2021/08/06 16:35:24] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
      [2021/08/06 16:35:24] [ warn] [engine] failed to flush chunk '1-1628267712.30102437.flb', retry in 7 seconds: task_id=9, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
      [2021/08/06 16:35:24] [ warn] [http_client] malformed HTTP response from firehose.us-east-1.amazonaws.com:443 on connection #68
      [2021/08/06 16:35:24] [ warn] [engine] failed to flush chunk '1-1628267709.948331479.flb', retry in 11 seconds: task_id=3, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
      [2021/08/06 16:35:24] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
      [2021/08/06 16:35:24] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
      [2021/08/06 16:35:24] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
      [2021/08/06 16:35:25] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:26] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:27] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
      [2021/08/06 16:35:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:28] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:28] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:28] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:30] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:31] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:32] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:32] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:32] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:33] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:33] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:34] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:35] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:35] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
      [2021/08/06 16:35:35] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:37] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2151 records, sent 2151 to Test-Firehose
      [2021/08/06 16:35:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:37] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2162 records, sent 2162 to Test-Firehose
      [2021/08/06 16:35:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2041 records, sent 2041 to Test-Firehose
      [2021/08/06 16:35:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:38] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2153 records, sent 2153 to Test-Firehose
      [2021/08/06 16:35:38] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:39] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:39] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:40] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:40] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:40] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:40] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2188 records, sent 2188 to Test-Firehose
      [2021/08/06 16:35:41] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:42] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:48] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:48] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2109 records, sent 2109 to Test-Firehose
      [2021/08/06 16:35:48] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:48] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:48] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:52] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:52] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2152 records, sent 2152 to Test-Firehose
      [2021/08/06 16:35:52] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:52] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2236 records, sent 2236 to Test-Firehose
      [2021/08/06 16:35:52] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:35:52] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 2154 records, sent 2154 to Test-Firehose
    
    • Round 2: Fluent Bit crashed after 26s, lost logs
        ...
      [2021/08/06 16:47:11] [ info] [input:tail:tail.0] inotify_fs_add(): inode=103 watch_fd=2 name=/data/perf-test/logFolder-fb/test.log
      [2021/08/06 16:47:15] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1 records, sent 1 to Test-Firehose
      [2021/08/06 16:47:22] [error] [src/flb_http_client.c:1170 errno=32] Broken pipe
      [2021/08/06 16:47:22] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to Test-Firehose
      [2021/08/06 16:47:22] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
      [2021/08/06 16:47:22] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
      [2021/08/06 16:47:23] [ warn] [engine] failed to flush chunk '1-1628268436.567145506.flb', retry in 7 seconds: task_id=2, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
      [2021/08/06 16:47:26] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:26] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:26] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:28] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:28] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:28] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:29] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:29] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:30] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:32] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:32] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:32] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:33] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
      [2021/08/06 16:47:37] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
      [2021/08/06 16:47:37] [ warn] [net] getaddrinfo(host='firehose.us-east-1.amazonaws.com', err=12): Timeout while contacting DNS servers
      [2021/08/06 16:47:37] [engine] caught signal (SIGSEGV)
      #0  0x55c6e96de236      in  __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:87
      #1  0x55c6e96de26d      in  mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
      #2  0x55c6e96ded1f      in  prepare_destroy_conn() at src/flb_upstream.c:390
      #3  0x55c6e96ded81      in  prepare_destroy_conn_safe() at src/flb_upstream.c:412
      #4  0x55c6e96df057      in  create_conn() at src/flb_upstream.c:501
      #5  0x55c6e96df4b9      in  flb_upstream_conn_get() at src/flb_upstream.c:640
      #6  0x55c6e97d3cf2      in  request_do() at src/aws/flb_aws_util.c:285
      #7  0x55c6e97d3905      in  flb_aws_client_request() at src/aws/flb_aws_util.c:161
      #8  0x55c6e9794d3e      in  put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
      #9  0x55c6e9793702      in  send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
      #10 0x55c6e9793a5e      in  add_event() at plugins/out_kinesis_firehose/firehose_api.c:451
      #11 0x55c6e9793dd4      in  process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:551
      #12 0x55c6e9792396      in  cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
      #13 0x55c6e96cba75      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
      #14 0x55c6e9bb6066      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
      #15 0xffffffffffffffff  in  ???() at ???:0
    

Can you also please open an issue referencing this one at the AWS repo? That is more or less a requirement to get AWS engineers to look into an issue: https://github.com/aws/aws-for-fluent-bit

Done: aws/aws-for-fluent-bit#219

@PettitWesley PettitWesley added AWS Issues with AWS plugins or experienced by users running on AWS bug labels Aug 9, 2021
@PettitWesley
Contributor

@edsiper These crashes look like they might be core networking issues... do any of these reports look like other issues you've seen in other plugins?
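
Several of the traces above end in `__mk_list_del()`. As a general illustration only (this is not a diagnosis of the bug, and the model below is not Fluent Bit code), one common way a list-unlink helper can fault is when the same node is unlinked twice, hypothetically because two code paths both try to release one connection. In C the second unlink would dereference stale or null pointers; in this Python model it surfaces as an `AttributeError`. All names here are hypothetical.

```python
class Node:
    """Intrusive doubly linked list node (illustrative model, not Fluent Bit code)."""

    def __init__(self, name):
        self.name = name
        # A lone node points at itself, as in a circular list head.
        self.prev = self
        self.next = self


def list_add(node, head):
    """Insert node at the tail of the circular list anchored at head."""
    last = head.prev
    last.next = node
    node.prev = last
    node.next = head
    head.prev = node


def list_del(node):
    """Unlink node and clear its links, so a second unlink fails loudly."""
    node.prev.next = node.next
    node.next.prev = node.prev
    node.prev = None
    node.next = None


head = Node("head")
conn = Node("conn")  # stands in for one upstream connection (assumption)
list_add(conn, head)

list_del(conn)  # first unlink succeeds
try:
    list_del(conn)  # second unlink touches the cleared links
    outcome = "no error"
except AttributeError:
    outcome = "second delete failed"
print(outcome)
```

Whether the crashes in this issue follow this pattern would need to be confirmed against the actual `flb_upstream.c` code paths shown in the traces.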

@xLitil

xLitil commented Aug 12, 2021

Hello,

A similar error occurred with the forward output:

[2021/08/12 14:49:33] [error] [upstream] connection #-1 to log.dns20.com:1520 timed out after 10 seconds
[2021/08/12 14:49:33] [engine] caught signal (SIGSEGV)
#0  0x4d46f2            in  __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:87
#1  0x4d4728            in  mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2  0x4d5a30            in  flb_upstream_conn_release() at src/flb_upstream.c:680
#3  0x547df5            in  cb_forward_flush() at plugins/out_forward/forward.c:1229
#4  0x4bfcdb            in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#5  0x8fffe5            in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#6  0xffffffffffffffff  in  ???() at ???:0

Configuration:

[OUTPUT]
    Name                 forward
    Alias                log_forward
    Match                *
    Host                 log.dns20.com
    Port                 1520
    Require_ack_response True
{
  "fluent-bit": {
    "version": "1.8.3",
    "edition": "Community",
    "flags": [
      "FLB_HAVE_PARSER",
      "FLB_HAVE_RECORD_ACCESSOR",
      "FLB_HAVE_STREAM_PROCESSOR",
      "FLB_HAVE_TLS",
      "FLB_HAVE_OPENSSL",
      "FLB_HAVE_AWS",
      "FLB_HAVE_AWS_CREDENTIAL_PROCESS",
      "FLB_HAVE_SIGNV4",
      "FLB_HAVE_SQLDB",
      "FLB_HAVE_METRICS",
      "FLB_HAVE_HTTP_SERVER",
      "FLB_HAVE_FORK",
      "FLB_HAVE_GMTOFF",
      "FLB_HAVE_UNIX_SOCKET",
      "FLB_HAVE_PROXY_GO",
      "FLB_HAVE_JEMALLOC",
      "FLB_HAVE_LIBBACKTRACE",
      "FLB_HAVE_REGEX",
      "FLB_HAVE_UTF8_ENCODER",
      "FLB_HAVE_LUAJIT",
      "FLB_HAVE_C_TLS",
      "FLB_HAVE_ACCEPT4",
      "FLB_HAVE_INOTIFY"
    ]
  }
}

@hossain-rayhan
Contributor

This problem is getting interesting. I tested with the exact same setup as described in this issue, and I also got a similar warning message: [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096.

However, Fluent Bit was able to send logs successfully and did not crash. I kept it running for around 20 minutes with different payloads, using the same image, fluent/fluent-bit:1.8.3-debug. Below are excerpts of my output.

Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/22 00:26:22] [ info] [engine] started (pid=1)
[2021/08/22 00:26:22] [ info] [storage] version=1.1.1, initializing...
[2021/08/22 00:26:22] [ info] [storage] in-memory
[2021/08/22 00:26:22] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/22 00:26:22] [ info] [cmetrics] version=0.1.6
[2021/08/22 00:26:22] [ info] [sp] stream processor started
[2021/08/22 00:26:22] [ info] [input:tail:tail.0] inotify_fs_add(): inode=12664761 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/08/22 00:26:32] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 12 records, sent 12 to fluentbit-firehose-plugin-test
[2021/08/22 00:26:37] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 3 records, sent 3 to fluentbit-firehose-plugin-test
[2021/08/22 00:26:47] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 9 records, sent 9 to fluentbit-firehose-plugin-test
[2021/08/22 00:26:52] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 6 records, sent 6 to fluentbit-firehose-plugin-test
[2021/08/22 00:29:27] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 8 records, sent 8 to fluentbit-firehose-plugin-test
[2021/08/22 00:29:32] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 12 records, sent 12 to fluentbit-firehose-plugin-test
[2021/08/22 00:30:12] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 12 records, sent 12 to fluentbit-firehose-plugin-test
[2021/08/22 00:30:17] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 8 records, sent 8 to fluentbit-firehose-plugin-test
[2021/08/22 00:31:12] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 12 records, sent 12 to fluentbit-firehose-plugin-test
[2021/08/22 00:31:17] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 00:31:17] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 20 records, sent 20 to fluentbit-firehose-plugin-test
.
.
.
[2021/08/22 00:56:51] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 110 records, sent 110 to fluentbit-firehose-plugin-test
[2021/08/22 00:56:56] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 00:56:56] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 55 records, sent 55 to fluentbit-firehose-plugin-test
.
.
[2021/08/22 01:07:31] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:07:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 55 records, sent 55 to fluentbit-firehose-plugin-test
[2021/08/22 01:07:36] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:07:36] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 55 records, sent 55 to fluentbit-firehose-plugin-test
.
.
[2021/08/22 01:22:25] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1295 records, sent 1295 to fluentbit-firehose-plugin-test
[2021/08/22 01:22:29] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:22:30] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:22:31] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:22:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1295 records, sent 1295 to fluentbit-firehose-plugin-test
[2021/08/22 01:22:34] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:22:35] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:22:39] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:22:39] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/22 01:22:39] [ info] [output:kinesis_firehose:kinesis_firehose.0] Processed 1295 records, sent 1295 to fluentbit-firehose-plugin-test
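
To compare what Fluent Bit reports against the number of logs generated, the `Processed N records, sent M` lines in output like the above can be tallied. This is a minimal sketch that assumes the log format quoted in this thread; the helper name `tally_firehose_log` is hypothetical and not part of Fluent Bit.

```python
import re

# Patterns match the log lines quoted in this thread; other Fluent Bit
# versions may format these messages differently (assumption).
SENT_RE = re.compile(r"Processed (\d+) records, sent (\d+) to (\S+)")
BUF_RE = re.compile(
    r"cannot increase buffer: current=(\d+) requested=(\d+) max=(\d+)"
)


def tally_firehose_log(lines):
    """Sum processed/sent counts and count buffer warnings (hypothetical helper)."""
    totals = {"processed": 0, "sent": 0, "buffer_warnings": 0}
    for line in lines:
        match = SENT_RE.search(line)
        if match:
            totals["processed"] += int(match.group(1))
            totals["sent"] += int(match.group(2))
        elif BUF_RE.search(line):
            totals["buffer_warnings"] += 1
    return totals


sample = [
    "[2021/08/22 01:22:25] [ info] [output:kinesis_firehose:kinesis_firehose.0] "
    "Processed 1295 records, sent 1295 to fluentbit-firehose-plugin-test",
    "[2021/08/22 01:22:29] [ warn] [http_client] cannot increase buffer: "
    "current=4096 requested=36864 max=4096",
    "[2021/08/22 01:22:31] [ info] [output:kinesis_firehose:kinesis_firehose.0] "
    "Processed 1295 records, sent 1295 to fluentbit-firehose-plugin-test",
]
result = tally_firehose_log(sample)
print(result)
```

Note that the "sent" figure is what the plugin reports, not what Firehose ultimately delivered, so a total above the generated count would only hint at duplication from retries and a lower total would only hint at loss.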

@hossain-rayhan
Contributor

Hi @edsiper, I tested this with multiple workload sizes. It can easily handle a load of 2 MB to 5 MB/second. However, it starts crashing randomly when the load is increased to ~10 MB/second. I used the exact same config and data generator from this issue. My environment was Amazon Linux 2.

I believe something is wrong with our core networking module.

Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/25 00:04:14] [ info] [engine] started (pid=1)
[2021/08/25 00:04:14] [ info] [storage] version=1.1.1, initializing...
[2021/08/25 00:04:14] [ info] [storage] in-memory
[2021/08/25 00:04:14] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/25 00:04:14] [ info] [cmetrics] version=0.1.6
[2021/08/25 00:04:14] [ info] [sp] stream processor started
[2021/08/25 00:04:14] [ info] [input:tail:tail.0] inotify_fs_add(): inode=12664761 watch_fd=1 name=/data/perf-test/logFolder-fb/test.log
[2021/08/25 00:05:25] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:25] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:26] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:26] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:27] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:28] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:29] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
[2021/08/25 00:05:29] [error] [upstream] connection #-1 to firehose.us-west-2.amazonaws.com:443 timed out after 10 seconds
[2021/08/25 00:05:29] [engine] caught signal (SIGSEGV)
[2021/08/25 00:05:29] [  Error] epoll_ctl: Bad file descriptor, errno=9 at /tmp/fluent-bit/lib/monkey/mk_core/mk_event_epoll.c:136
#0  0x555c367081ee      in  __mk_list_add() at lib/monkey/include/monkey/mk_core/mk_list.h:59
#1  0x555c3670821e      in  mk_list_add() at lib/monkey/include/monkey/mk_core/mk_list.h:64
#2  0x555c36708d3a      in  prepare_destroy_conn() at src/flb_upstream.c:393
#3  0x555c36708d81      in  prepare_destroy_conn_safe() at src/flb_upstream.c:412
#4  0x555c36709057      in  create_conn() at src/flb_upstream.c:501
#5  0x555c367094b9      in  flb_upstream_conn_get() at src/flb_upstream.c:640
#6  0x555c367fdcf2      in  request_do() at src/aws/flb_aws_util.c:285
#7  0x555c367fd905      in  flb_aws_client_request() at src/aws/flb_aws_util.c:161
#8  0x555c367bed3e      in  put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
#9  0x555c367bd702      in  send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
#10 0x555c367bda5e      in  add_event() at plugins/out_kinesis_firehose/firehose_api.c:451
#11 0x555c367bddd4      in  process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:551
#12 0x555c367bc396      in  cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
#13 0x555c366f30de      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#14 0x555c36be0066      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
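
The load levels mentioned above (roughly 2 to 10 MB/second) can be approximated with a simple rate-limited writer that appends to the tailed file. This is a rough sketch under stated assumptions, not the data generator used in this issue: the function name `write_load`, the fixed 200-byte line size, and the one-second tick are all hypothetical choices.

```python
import os
import tempfile
import time


def write_load(path, bytes_per_sec, seconds, line_len=200, sleep=time.sleep):
    """Append roughly bytes_per_sec of fixed-size lines for `seconds` ticks."""
    line = b"x" * (line_len - 1) + b"\n"  # each line is exactly line_len bytes
    lines_per_tick = max(1, bytes_per_sec // line_len)
    written = 0
    with open(path, "ab") as f:
        for _ in range(seconds):
            start = time.monotonic()
            f.write(line * lines_per_tick)
            f.flush()
            written += lines_per_tick * line_len
            # Wait out the remainder of this one-second tick.
            sleep(max(0.0, 1.0 - (time.monotonic() - start)))
    return written


# Demo: write two ticks at ~2 MB per tick into a temporary file.
# Sleeping is disabled here so the demo finishes immediately.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test.log")
    total = write_load(target, bytes_per_sec=2_000_000, seconds=2,
                       sleep=lambda s: None)
    size = os.path.getsize(target)
print(total, size)
```

For a real run, the default `sleep` would be kept and `path` pointed at the file Fluent Bit tails; raising `bytes_per_sec` toward 10_000_000 would approximate the load at which the crashes were reported.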

@wchaws

wchaws commented Sep 18, 2021

I have the same issue.

@hegderohit89

Hi @edsiper, we are getting similar reports from our users with Splunk.
We recently upgraded Fluent Bit from v1.7.4 to v1.8.3 to pick up the fix for #3758, but we are still seeing SIGSEGV errors on this version as well.

The setup runs Fluent Bit v1.8.3 and sends logs to an upstream Splunk server. Here are the Fluent Bit process logs:

[2021/09/23 17:46:43] [ warn] [http_client] malformed HTTP response from vlp-splnk-<MASKED>:8088 on connection #160
[2021/09/23 17:46:43] [ warn] [output:splunk:splunk.0] http_do=-1
[2021/09/23 17:46:43] [debug] [upstream] KA connection #160 to vlp-splnk-<MASKED>:8088 is now available
[2021/09/23 17:46:43] [ warn] [http_client] cannot increase buffer: current=2000000 requested=2032768 max=2000000
[2021/09/23 17:46:43] [debug] [socket] could not validate socket status for #165 (don't worry)
[2021/09/23 17:46:43] [debug] [upstream] KA connection #160 to vlp-splnk-<MASKED>:8088 has been disconnected by the remote service
[2021/09/23 17:46:43] [debug] [out coro] cb_destroy coro_id=459
[2021/09/23 17:46:43] [debug] [out coro] cb_destroy coro_id=460
[2021/09/23 17:46:43] [debug] [retry] new retry created for task_id=1 attempts=1
[2021/09/23 17:46:43] [ warn] [engine] failed to flush chunk '9-1632419203.266570457.flb', retry in 8 seconds: task_id=1, input=tail.0 > output=splunk.0 (out_id=0)
[2021/09/23 17:46:43] [ warn] [http_client] cannot increase buffer: current=2000000 requested=2032768 max=2000000
[2021/09/23 17:46:43] [ warn] [http_client] cannot increase buffer: current=2000000 requested=2032768 max=2000000
[2021/09/23 17:46:43] [ warn] [http_client] cannot increase buffer: current=2000000 requested=2032768 max=2000000
[2021/09/23 17:46:43] [debug] [upstream] KA connection #162 to vlp-splnk-<MASKED>:8088 is now available
[2021/09/23 17:46:43] [debug] [upstream] KA connection #156 to vlp-splnk-<MASKED>:8088 is now available
[2021/09/23 17:46:43] [debug] [socket] could not validate socket status for #160 (don't worry)
[2021/09/23 17:46:43] [debug] [socket] could not validate socket status for #154 (don't worry)
[2021/09/23 17:46:43] [engine] caught signal (SIGSEGV)
[2021/09/23 17:46:43] [debug] [socket] could not validate socket status for #163 (don't worry)
[2021/09/23 17:46:43] [debug] [socket] could not validate socket status for #161 (don't worry)
#0  0x5624aa716abd      in  tcache_dalloc_large() at lib/jemalloc-5.2.1/include/jemalloc/internal/tcache_inlines.h:210
#1  0x5624aa716abd      in  arena_dalloc_large() at lib/jemalloc-5.2.1/include/jemalloc/internal/arena_inlines_b.h:276
#2  0x5624aa716abd      in  arena_dalloc() at lib/jemalloc-5.2.1/include/jemalloc/internal/arena_inlines_b.h:323
#3  0x5624aa716abd      in  idalloctm() at lib/jemalloc-5.2.1/include/jemalloc/internal/jemalloc_internal_inlines_c.h:118
#4  0x5624aa716abd      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2586
#5  0x5624aa716abd      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#6  0x7fc4f6b55fdc      in  ???() at ???:0
#7  0x7fc4f6c34c05      in  ???() at ???:0
#8  0x7fc4f6c34de4      in  ???() at ???:0
#9  0x7fc4f6c0f700      in  ???() at ???:0
#10 0x7fc4f6c2fc9e      in  ???() at ???:0
#11 0x7fc4f6c34e9e      in  ???() at ???:0
#12 0x7fc4f6c34bd1      in  ???() at ???:0
#13 0x7fc4f6c34e9e      in  ???() at ???:0
#14 0x7fc4f6c34bd1      in  ???() at ???:0
#15 0x7fc4f6c34de4      in  ???() at ???:0
#16 0x7fc4f6f88c67      in  ???() at ???:0
#17 0x7fc4f6f85fa0      in  ???() at ???:0
#18 0x5624aa7c6cdb      in  tls_session_destroy() at src/tls/openssl.c:338
#19 0x5624aa7c784c      in  flb_tls_session_destroy() at src/tls/flb_tls.c:394
#20 0x5624aa7b66dc      in  destroy_conn() at src/flb_upstream.c:425
#21 0x5624aa7b73c6      in  flb_upstream_conn_pending_destroy() at src/flb_upstream.c:815
#22 0x5624aa7b751b      in  flb_upstream_conn_pending_destroy_list() at src/flb_upstream.c:865
#23 0x5624aa7b0711      in  flb_engine_start() at src/flb_engine.c:717
#24 0x5624aa794f1d      in  flb_lib_worker() at src/flb_lib.c:628
#25 0x7fc4f724c608      in  ???() at ???:0
#26 0x7fc4f6467292      in  ???() at ???:0
#27 0xffffffffffffffff  in  ???() at ???:0
/docker-entrypoint.sh: line 19:     9 Aborted                 (core dumped) /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit.conf

Any update or insight on this issue would be appreciated. Thanks.

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@PettitWesley
Contributor

@pranavmarla @hossain-rayhan Can this issue be closed now or soon? Have we fixed these issues in our latest releases?

@hossain-rayhan
Contributor

We mostly tested with v1.7.5. I haven't tested with the latest release yet. I also have another open issue, #4040. I will test and post an update on that issue.

@github-actions
Contributor

github-actions bot commented Dec 2, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 2, 2021
@PettitWesley
Contributor

@hossain-rayhan @pranavmarla Is it okay to let this one close automatically?

@hossain-rayhan
Contributor

I have two other open issues tracking the broken pipe and connection errors with the Firehose plugin under high load. Some of the concerns with v1.8.+ might have been fixed upstream (I no longer see any segfault or crash). @pranavmarla can say whether he wants to keep this one open for tracking.

Other open issues:
#4040 and #4332

@pranavmarla
Author

Thanks @PettitWesley and @hossain-rayhan -- as long as the remaining issues are being tracked, it should be fine to let this close

@PettitWesley
Contributor

Cool. I will close it. (The stale label will be ignored because we commented).
