SIGSEGV in mk_event v1.8.2 #3894

Closed
ingshtrom opened this issue Jul 30, 2021 · 2 comments

ingshtrom commented Jul 30, 2021

Bug Report

Describe the bug
I started a local test of Fluent Bit and went away for 10 minutes. When I came back, it had crashed. I can reproduce this consistently using the data from https://github.com/ingshtrom/fluent-bit-kinesis-test. Follow the directions there and run for ~5 minutes to see the crash. You will need an AWS Kinesis Firehose delivery stream set up, but the schema doesn't matter; the fluent-bit kinesis_firehose plugin just needs an endpoint to hit 😄

To Reproduce
Run window (start -> crash): 14:31:30 -> 14:42:14

  • Rubular link if applicable:
  • Example log message if applicable:
...
fluent-bit    | [2021/07/30 14:42:12] [debug] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch http status=200
fluent-bit    | [2021/07/30 14:42:12] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sent events to test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 500 records
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch http status=200
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sent events to test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 500 records
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch http status=200
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sent events to test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 500 records
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch http status=200
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sent events to test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 500 records
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:13] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch http status=200
fluent-bit    | [2021/07/30 14:42:13] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sent events to test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 500 records
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] PutRecordBatch http status=200
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sent events to test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 500 records
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [debug] [socket] could not validate socket status for #87 (don't worry)
fluent-bit    | [2021/07/30 14:42:14] [debug] [socket] could not validate socket status for #85 (don't worry)
fluent-bit    | [2021/07/30 14:42:14] [debug] [socket] could not validate socket status for #86 (don't worry)
fluent-bit    | [2021/07/30 14:42:14] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
fluent-bit    | [2021/07/30 14:42:14] [error] [upstream] connection #69 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
fluent-bit    | [2021/07/30 14:42:14] [error] [upstream] connection #86 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
fluent-bit    | [2021/07/30 14:42:14] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
fluent-bit    | [2021/07/30 14:42:14] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
fluent-bit    | [2021/07/30 14:42:14] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
fluent-bit    | [2021/07/30 14:42:14] [error] [upstream] connection #-1 to firehose.us-east-1.amazonaws.com:443 timed out after 10 seconds
fluent-bit    | [2021/07/30 14:42:14] [debug] [task] created task=0x7f6fda83a0c0 id=10 OK
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 125 records
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [debug] [input:tail:tail.0] inode=3675717 events: IN_MODIFY
fluent-bit    | [2021/07/30 14:42:14] [debug] [task] task_id=16 reached retry-attempts limit 1/1
fluent-bit    | [2021/07/30 14:42:14] [ warn] [engine] chunk '1-1627656071.46278172.flb' cannot be retried: task_id=16, input=tail.0 > output=kinesis_firehose.0
fluent-bit    | [2021/07/30 14:42:14] [debug] [task] destroy task=0x7f6fda839f80 (task_id=16)
fluent-bit    | [2021/07/30 14:42:14] [error] [io] connect event handler error
fluent-bit    | [2021/07/30 14:42:14] [error] [net] socket #69 could not connect to firehose.us-east-1.amazonaws.com:443
fluent-bit    | [2021/07/30 14:42:14] [debug] [upstream] connection #-1 failed to firehose.us-east-1.amazonaws.com:443
fluent-bit    | [2021/07/30 14:42:14] [error] [aws_client] connection initialization error
fluent-bit    | [2021/07/30 14:42:14] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records to test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send log records
fluent-bit    | [2021/07/30 14:42:14] [error] [output:kinesis_firehose:kinesis_firehose.0] Failed to send records
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 500 records
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [debug] [task] created task=0x7f6fda839c60 id=16 OK
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending 244 records
fluent-bit    | [2021/07/30 14:42:14] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream test-alexh-haproxy-access-logs
fluent-bit    | [2021/07/30 14:42:14] [debug] [out coro] cb_destroy coro_id=106
fluent-bit    | [2021/07/30 14:42:14] [debug] [retry] new retry created for task_id=6 attempts=1
fluent-bit    | [2021/07/30 14:42:14] [ warn] [engine] failed to flush chunk '1-1627656097.55952817.flb', retry in 7 seconds: task_id=6, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
fluent-bit    | [2021/07/30 14:42:14] [debug] [retry] new retry created for task_id=44 attempts=1
fluent-bit    | [2021/07/30 14:42:14] [ warn] [engine] failed to flush chunk '1-1627656123.203870615.flb', retry in 8 seconds: task_id=44, input=tail.0 > output=kinesis_firehose.0 (out_id=0)
fluent-bit    | [2021/07/30 14:42:14] [engine] caught signal (SIGSEGV)
fluent-bit    | #0  0x56297680cd34      in  mk_event_add() at lib/monkey/mk_core/mk_event.c:96
fluent-bit    | #1  0x56297632fee8      in  net_connect_async() at src/flb_network.c:369
fluent-bit    | #2  0x562976330bb8      in  flb_net_tcp_connect() at src/flb_network.c:832
fluent-bit    | #3  0x56297635610a      in  flb_io_net_connect() at src/flb_io.c:89
fluent-bit    | #4  0x56297633be77      in  create_conn() at src/flb_upstream.c:497
fluent-bit    | #5  0x56297633c341      in  flb_upstream_conn_get() at src/flb_upstream.c:640
fluent-bit    | #6  0x56297642d808      in  request_do() at src/aws/flb_aws_util.c:284
fluent-bit    | #7  0x56297642d41b      in  flb_aws_client_request() at src/aws/flb_aws_util.c:160
fluent-bit    | #8  0x5629763f0ee1      in  put_record_batch() at plugins/out_kinesis_firehose/firehose_api.c:828
fluent-bit    | #9  0x5629763ef8a5      in  send_log_events() at plugins/out_kinesis_firehose/firehose_api.c:376
fluent-bit    | #10 0x5629763efc01      in  add_event() at plugins/out_kinesis_firehose/firehose_api.c:451
fluent-bit    | #11 0x5629763eff77      in  process_and_send_records() at plugins/out_kinesis_firehose/firehose_api.c:551
fluent-bit    | #12 0x5629763ee539      in  cb_firehose_flush() at plugins/out_kinesis_firehose/firehose.c:326
fluent-bit    | #13 0x5629763260b0      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
fluent-bit    | #14 0x56297680f186      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
fluent-bit exited with code 139

  • Steps to reproduce the problem: Just start it? I'm sorry I don't have more information.
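
For triage, the backtrace and exit code in the log above can be read mechanically. The following is a minimal Python sketch, not part of Fluent Bit: `parse_frames` and `signal_from_exit_code` are hypothetical helper names, the frame lines are copied from the log above (first three frames only), and the exit-code mapping assumes the usual shell/Docker convention that a process killed by signal N exits with 128 + N.

```python
import re
import signal

# Backtrace lines copied verbatim from the log above (first three frames only).
BACKTRACE = """\
fluent-bit    | #0  0x56297680cd34      in  mk_event_add() at lib/monkey/mk_core/mk_event.c:96
fluent-bit    | #1  0x56297632fee8      in  net_connect_async() at src/flb_network.c:369
fluent-bit    | #2  0x562976330bb8      in  flb_net_tcp_connect() at src/flb_network.c:832
"""

# Matches "#<index>  0x<addr>  in  <func>() at <file>:<line>".
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x(?P<addr>[0-9a-f]+)\s+in\s+"
    r"(?P<func>\w+)\(\) at (?P<file>\S+):(?P<line>\d+)"
)


def parse_frames(text):
    """Return one dict per backtrace frame found in `text` (hypothetical helper)."""
    frames = []
    for match in FRAME_RE.finditer(text):
        frame = match.groupdict()
        frame["index"] = int(frame["index"])
        frame["line"] = int(frame["line"])
        frames.append(frame)
    return frames


def signal_from_exit_code(code):
    """Map an exit code above 128 to the signal it encodes (assumes 128 + N)."""
    return signal.Signals(code - 128)


frames = parse_frames(BACKTRACE)
top = frames[0]
print(f"crash in {top['func']} ({top['file']}:{top['line']})")
print(signal_from_exit_code(139).name)
```

Read this way, frame #0 is `mk_event_add()` at `lib/monkey/mk_core/mk_event.c:96`, reached through `net_connect_async()`, and exit code 139 decodes to signal 11, which agrees with the `caught signal (SIGSEGV)` line.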

Expected behavior
Fluent Bit doesn't crash 😄

Screenshots

Your Environment

  • Version used: 1.8.2
  • Configuration:
[SERVICE]
  Flush        1
  Grace        0
  Log_Level    debug
  Daemon       off
  Parsers_File parsers.conf
  HTTP_Server  On
  HTTP_Listen  0.0.0.0
  HTTP_PORT    24242

[INPUT]
  Name           tail
  Path           /var/log/haproxy/access.log
  Parser         haproxy
  Tag            haproxy2
  Read_from_Head false
  Mem_Buf_Limit  200MB

[FILTER]
  Name       record_modifier
  Match      haproxy2
  Remove_key appname
  Remove_key captured_request_cookie
  Remove_key captured_response_cookie
  Remove_key facility
  Remove_key hostname
  Remove_key http_path
  Remove_key http_verb
  Remove_key http_version
  Remove_key ident
  Remove_key log_bucket
  Remove_key message
  Remove_key pid
  Remove_key pri
  Remove_key procid
  Remove_key severity
  Remove_key source_ip
  Remove_key source_type
  Remove_key worker

[OUTPUT]
  Name kinesis_firehose
  Match haproxy2
  region us-east-1
  delivery_stream test-alexh-haproxy-access-logs
  • Environment name and version (e.g. Kubernetes? What version?): Docker container running via docker compose
  • Server type and version: host is macOS
  • Operating System and version: distroless Docker Container fluent/fluent-bit:1.8.2
  • Filters and plugins:

Additional context
parsers.conf

[PARSER]
  Name        tail
  Format      regex
  Regex       ^(?<message>.*)$

[PARSER]
  Name        syslog
  Format      regex
  Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
  Time_Key    time
  Time_Format %b %d %H:%M:%S

# testing done manually in https://rubular.com
# put in the regex and you can paste all of the
# test log lines in order to test line-by-line
#
# reference: https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#8.2.4
# haproxy example log-format:
# log-format %ci:%cp\ %tr\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %H\ %{+Q}r
#
# example log lines:
# 188.166.36.123:42616 [24/Feb/2021:19:46:11.018] default default/<NOSRV> -1/2089/-1/-1/2088 500 181 - - PT-- 3/3/0/0/0 0/0 {cloud-stage.docker.com|network-daemon/1.2.2 go-dockercloud/1.0.8} external-default-mq7cq "GET /api/infra/v1/fermayo2/node/a3191d02-791a-4aad-8a0a-73c82105ee77/ HTTP/1.1"
# 13.56.215.207:24846 [24/Feb/2021:19:45:56.226] default hub_gateway/10.128.84.174:80 62/0/1/20080/20143 200 4909 - - ---- 1/1/0/0/0 0/0 {hub-stage.docker.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36} external-default-mq7cq "GET /api/tutum/v1/status/ HTTP/1.1"
# 34.126.103.28:57322 [25/Feb/2021:20:18:00.789] default default/<NOSRV> -1/-1/-1/-1/232 503 208 - - SC-- 3/3/0/0/0 0/0 {52.7.212.59|} external-default-c9f7g "GET / HTTP/1.1"
# 10.128.32.136:50678 [25/Feb/2021:20:42:43.417] http http/<NOSRV> 10/-1/-1/-1/10 400 196 - - PR-- 1/1/0/0/0 0/0 external-default-q9phv "GET /config/getuser?index=0 HTTP/1.1"
# 73.75.245.202:60087 [26/Feb/2021:16:12:19.470] http kubernetes-dashboard/10.128.71.165:9090 0/0/1/6/7 200 1660 - - ---- 8/7/0/0/0 0/0 {kubernetes-dashboard.proxy.stage-us-east-1.aws.dckr.io|Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0} "GET /api/v1/namespace HTTP/1.1"
# 127.0.0.1:33338 [26/Feb/2021:16:39:49.896] stats: stats:/<STATS> 0/0/0/1/1 200 20737 - - LR-- 5/1/0/0/0 0/0 "GET /;csv HTTP/1.1"
# 185.142.236.35:46846 [26/Feb/2021:18:15:49.504] default default/<NOSRV> -1/-1/-1/-1/80 400 183 - - CR-- 1/1/0/0/0 0/0 {|} external-registry-wv2c7 "<BADREQ>"
# 201.219.235.57:30038 [26/Feb/2021:18:28:43.360] default default/<NOSRV> -1/-1/-1/-1/0 400 183 - - PR-- 84/83/0/0/0 0/0 {|} external-default-q2tp6 "GET /"
#
[PARSER]
  Name        haproxy
  Format      regex
  Regex       ^(?<client_ip>(\d+\.){3}\d+):(?<client_port>\d+) \[(?<time>\d\d?\/\w+\/\d{2,4}:\d\d:\d\d:\d\d.\d+)\] (?<frontend_name>\S+) (?<backend_name>[\S\.-]+)\/(?<server_name>[\w\.-:<>]+) (?<time_request>[\d-]+)\/(?<time_queue>[\d-]+)\/(?<time_backend_connect>[\d-]+)\/(?<time_backend_response>[\d-]+)\/\+?(?<time_duration>[\d-]+) (?<http_status_code>\d{3}) \+?(?<bytes_read>\d+) (?<captured_request_cookie>\S+) (?<captured_response_cookie>\S+) (?<termination_state>\S+) (?<actconn>\d+)\/(?<feconn>\d+)\/(?<beconn>\d+)\/(?<srvconn>\d+)\/(?<retries>\d+) (?<srv_queue>\d+)\/(?<backend_queue>\d+)( \{(?<request_header_host>\S*)\|(?<request_header_user_agent>.*)\})? ?(?<hostname>\S+)? "(?<full_http_request><BADREQ>|(?<http_verb>\S+) (?<http_request_prefix>[^\/]+)?(?<http_path>\S+)( HTTP\/(?<http_version>\d+\.\d+))?)"(?<addtl_data>.*)$
  Time_Key    time
  Time_Format %d/%b/%Y:%H:%M:%S.%L
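
To rule out the parser as a factor, the haproxy regex can be checked outside Fluent Bit against one of the example lines listed in parsers.conf. This is only a rough Python sketch under stated assumptions: Fluent Bit evaluates the pattern with a Ruby-style regex engine, whereas Python's `re` spells named groups `(?P<name>...)`, so the pattern is converted textually (it contains no lookbehind, so the replacement is safe here); `parse_line` is a hypothetical helper name; and `%L` in `Time_Format` is approximated with Python's `%f`. A match in Python does not prove identical behaviour inside Fluent Bit.

```python
import re
from datetime import datetime

# The haproxy parser regex from parsers.conf, copied verbatim and split across
# string pieces only for readability.
ONIGMO_REGEX = (
    r'^(?<client_ip>(\d+\.){3}\d+):(?<client_port>\d+)'
    r' \[(?<time>\d\d?\/\w+\/\d{2,4}:\d\d:\d\d:\d\d.\d+)\]'
    r' (?<frontend_name>\S+) (?<backend_name>[\S\.-]+)\/(?<server_name>[\w\.-:<>]+)'
    r' (?<time_request>[\d-]+)\/(?<time_queue>[\d-]+)\/(?<time_backend_connect>[\d-]+)'
    r'\/(?<time_backend_response>[\d-]+)\/\+?(?<time_duration>[\d-]+)'
    r' (?<http_status_code>\d{3}) \+?(?<bytes_read>\d+)'
    r' (?<captured_request_cookie>\S+) (?<captured_response_cookie>\S+)'
    r' (?<termination_state>\S+)'
    r' (?<actconn>\d+)\/(?<feconn>\d+)\/(?<beconn>\d+)\/(?<srvconn>\d+)\/(?<retries>\d+)'
    r' (?<srv_queue>\d+)\/(?<backend_queue>\d+)'
    r'( \{(?<request_header_host>\S*)\|(?<request_header_user_agent>.*)\})?'
    r' ?(?<hostname>\S+)?'
    r' "(?<full_http_request><BADREQ>|(?<http_verb>\S+) (?<http_request_prefix>[^\/]+)?'
    r'(?<http_path>\S+)( HTTP\/(?<http_version>\d+\.\d+))?)"(?<addtl_data>.*)$'
)

# Convert (?<name>...) to Python's (?P<name>...) spelling.
HAPROXY_RE = re.compile(ONIGMO_REGEX.replace("(?<", "(?P<"))

# One of the example log lines from the comments in parsers.conf.
SAMPLE = (
    '34.126.103.28:57322 [25/Feb/2021:20:18:00.789] default default/<NOSRV> '
    '-1/-1/-1/-1/232 503 208 - - SC-- 3/3/0/0/0 0/0 {52.7.212.59|} '
    'external-default-c9f7g "GET / HTTP/1.1"'
)


def parse_line(line):
    """Return the named captures for one log line, or None if it does not match."""
    match = HAPROXY_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Approximates Time_Format %d/%b/%Y:%H:%M:%S.%L (assumes an English/C locale for %b).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S.%f")
    return record


record = parse_line(SAMPLE)
print(record["http_status_code"], record["http_verb"], record["http_path"])
```

Several of the captured keys (for example `http_verb`, `http_path`, and `hostname`) are the ones the record_modifier filter later removes, so a quick check like this also shows which fields would remain in the records sent to Firehose.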

github-actions bot commented

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Aug 30, 2021

github-actions bot commented Sep 4, 2021

This issue was closed because it has been stalled for 5 days with no activity.
