Crash when connection to AWS CloudWatch logs drops #5246

Closed
ghost opened this issue Apr 6, 2022 · 3 comments


ghost commented Apr 6, 2022

Bug Report

Describe the bug
Fluent Bit 1.9.1 crashes with SIGSEGV when an HTTPS connection to the AWS CloudWatch Logs API is closed prematurely, i.e. TLS expects more data but gets EOF.

To Reproduce

  • Steps to reproduce the problem:
    1. Use the CloudWatch Logs output plugin.
    2. Wait for a connection problem, or provoke one.

Expected behavior
Fluent Bit recovers and retries sending the logs.

Your Environment

  • Version used: 1.9.1
  • Environment: Kubernetes 1.21

Additional context

stderr logs:

[2022/04/05 21:34:56] [error] [aws_client] connection initialization error
[2022/04/05 21:34:56] [error] [tls] error: unexpected EOF
[2022/04/05 21:34:56] [engine] caught signal (SIGSEGV)

#0 0x5580e6ca4c4c in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:87
#1 0x5580e6ca4c83 in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x5580e6ca58b7 in prepare_destroy_conn() at src/flb_upstream.c:443
#3 0x5580e6ca593a in prepare_destroy_conn_safe() at src/flb_upstream.c:469
#4 0x5580e6ca615c in cb_upstream_conn_ka_dropped() at src/flb_upstream.c:724
#5 0x5580e6c9f522 in flb_engine_start() at src/flb_engine.c:847
#6 0x5580e6c7ea34 in flb_lib_worker() at src/flb_lib.c:626
#7 0x7f18a513cea6 in ???() at ???:0
#8 0x7f18a4a1cdee in ???() at ???:0
#9 0xffffffffffffffff in ???() at ???:0

oshelot commented Jun 21, 2022

I'm bumping this instead of opening a new bug. I believe the issue is TLS, regardless of the output plugin. I tested the following two OUTPUT configs.

The plain TCP config, forwarding to another local syslog server, works as expected:

[OUTPUT]
    name                 syslog
    match                sys_out
    host                 localhost
    port                 13000
    mode                 tcp
    syslog_format        rfc3164
    syslog_message_key   message
    syslog_severity_key  pri
    syslog_facility_key  facility
    syslog_hostname_key  host
    syslog_appname_key   ident

The TLS config:

[OUTPUT]
    name                 syslog
    match                sys_out
    host                 *******
    port                 443
    mode                 tls
    syslog_format        rfc3164
    syslog_message_key   message
    syslog_severity_key  pri
    syslog_facility_key  facility
    syslog_hostname_key  host
    syslog_appname_key   ident
    tls.debug            4
    tls.verify           off
    tls.ca_file          /tmp/chain.crt
    tls.crt_file         /tmp/domain.crt
    tls.key_file         /tmp/domain.key
    tls.key_passwd       password

This crashes almost immediately with:
[2022/06/21 17:15:08] [engine] caught signal (SIGSEGV)
#0 0x557ad3cce110 in flb_tls_session_create() at src/tls/flb_tls.c:334
#1 0x557ad3cdc985 in flb_io_net_connect() at src/flb_io.c:109
#2 0x557ad3cb4586 in create_conn() at src/flb_upstream.c:560
#3 0x557ad3cb4b28 in flb_upstream_conn_get() at src/flb_upstream.c:705
#4 0x557ad3da2c57 in cb_syslog_flush() at plugins/out_syslog/syslog.c:768
#5 0x557ad3c97e1b in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#6 0x557ad42395ea in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#7 0xffffffffffffffff in ???() at ???:0

Fluent Bit version: 1.9.4
OS: Ubuntu 21 (ESX VM)

github-actions bot commented:

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Sep 20, 2022
github-actions bot commented:

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as not planned on Sep 25, 2022