got unrecoverable error in primary and no secondary error_class=ArgumentError error="wrong number of arguments (given 4, expected 3)" #1716

Closed
kefiras opened this issue Apr 5, 2024 · 12 comments · Fixed by kube-logging/fluentd-images#140
Labels
bug Something isn't working

Comments

kefiras commented Apr 5, 2024

Describe the bug:
Error when using syslog output

Expected behaviour:
Logs should be sent to defined syslog cluster output

Steps to reproduce the bug:
Apply the resources below:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: syslog
  namespace: logging
spec:
  syslog:
    buffer:
      timekey: 30s
      timekey_wait: 0s
    host: syslog.example.net
    insecure: true
    port: 20444
    transport: tls
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: hosttailer-flow
  namespace: logging
spec:
  filters:
  - tag_normaliser: {}
  globalOutputRefs:
  - syslog
  match:
  - select:
      labels:
        app.kubernetes.io/name: host-tailer

Additional context:
Fluentd throws errors:

2024-04-05 11:29:00 +0000 [warn]: #0 [clusterflow:logging:hosttailer-flow:clusteroutput:logging:syslog] got unrecoverable error in primary and no secondary error_class=ArgumentError error="wrong number of arguments (given 4, expected 3)"
  2024-04-05 11:29:00 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin_helper/socket.rb:41:in `socket_create'
  2024-04-05 11:29:00 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-syslog_rfc5424-0.9.0.rc.8/lib/fluent/plugin/out_syslog_rfc5424.rb:65:in `find_or_create_socket'
  2024-04-05 11:29:00 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-syslog_rfc5424-0.9.0.rc.8/lib/fluent/plugin/out_syslog_rfc5424.rb:39:in `write'
  2024-04-05 11:29:00 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1225:in `try_flush'
  2024-04-05 11:29:00 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
  2024-04-05 11:29:00 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
  2024-04-05 11:29:00 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2024-04-05 11:29:00 +0000 [warn]: #0 [clusterflow:logging:hosttailer-flow:clusteroutput:logging:syslog] bad chunk is moved to /buffers/backup/worker0/clusterflow_logging_hosttailer-flow_clusteroutput_logging_syslog/61557c1e8b4b20b9380467be5ff0a45b.log
2024-04-05 11:29:01 +0000 [warn]: #0 [clusterflow:logging:hosttailer-flow:clusteroutput:logging:syslog] got unrecoverable error in primary and no secondary error_class=ArgumentError error="wrong number of arguments (given 4, expected 3)"
  2024-04-05 11:29:01 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin_helper/socket.rb:41:in `socket_create'
  2024-04-05 11:29:01 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-syslog_rfc5424-0.9.0.rc.8/lib/fluent/plugin/out_syslog_rfc5424.rb:65:in `find_or_create_socket'
  2024-04-05 11:29:01 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-syslog_rfc5424-0.9.0.rc.8/lib/fluent/plugin/out_syslog_rfc5424.rb:39:in `write'
  2024-04-05 11:29:01 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1225:in `try_flush'
  2024-04-05 11:29:01 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:1538:in `flush_thread_run'
  2024-04-05 11:29:01 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin/output.rb:510:in `block (2 levels) in start'
  2024-04-05 11:29:01 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2024-04-05 11:29:01 +0000 [warn]: #0 [clusterflow:logging:hosttailer-flow:clusteroutput:logging:syslog] bad chunk is moved to /buffers/backup/worker0/clusterflow_logging_hosttailer-flow_clusteroutput_logging_syslog/61557c20915175b74f5d02915b7386cb.log

Environment details:

  • Kubernetes version 1.27
  • Cloud-provider/provisioner : AKS
  • logging-operator version : 4.6.0
  • Install method (e.g. helm or static manifests): helm
  • Logs from the misbehaving component (and any other relevant logs):
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:

/kind bug

@kefiras kefiras added the bug Something isn't working label Apr 5, 2024
Member

pepov commented Apr 8, 2024

@kefiras this error message alone doesn't tell much about the original problem

  • have you tried looking at the referred bad chunk?
2024-04-05 11:29:01 +0000 [warn]: #0 [clusterflow:logging:hosttailer-flow:clusteroutput:logging:syslog] bad chunk is moved to /buffers/backup/worker0/clusterflow_logging_hosttailer-flow_clusteroutput_logging_syslog/61557c20915175b74f5d02915b7386cb.log
  • have you tried raising the log level? (logLevel: debug in fluentd spec)
  • have you/can you check the error/warning messages on the receiving side if there were any?

Author

kefiras commented Apr 9, 2024

Debug is already enabled

bad chunk

??f?.FsN??time?2024-04-09T10:18:22.776368974Z?message?:Apr  9 10:18:22 aks-prometheus-18130450-vmss000000 kernel: [498058.497065] calico-packet: IN=azve56f4c00502 OUT=azva623c2d61aa MAC=aa:aa:aa:aa:aa:aa:6a:73:f2:79:14:75:08:00 SRC=10.244.3.144 DST=10.244.3.135 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=35709 DF PROTO=TCP SPT=36200 DPT=2020 WINDOW=64240 RES=0x00 SYN URGP=0 ?app?host-tailer?container_image?Lrepo-aks.qa.example.net/example/linux/exm/exm/vendor/fluent/fluent-bit:2.1.8?clustername?aks1kexm1?datacenter?eastus2?env?nonprod?family?logging?mnemonic?exm?hostname?"aks-prometheus-18130450-vmss000000?namespace?logging?pod_id?$2461060d-4eb9-41ec-8fe2-eefcf4bad090?pod_name?filetail-host-tailer-phq7s?service?syslog/ $ 

I haven't checked the receiving side, but I doubt anything is being sent.

stale bot commented Jun 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!

@stale stale bot added the wontfix This will not be worked on label Jun 8, 2024
@stale stale bot closed this as completed Jun 15, 2024

liz-86 commented Jul 4, 2024

We encountered the same error. Is it possible to open this issue again?

@pepov pepov reopened this Jul 4, 2024
@stale stale bot removed the wontfix This will not be worked on label Jul 4, 2024
Member

pepov commented Jul 4, 2024

@liz-86 can you add some details to this? do you see this error with the latest image versions as well?

liz-86 commented Jul 4, 2024

Yes, we tested our configuration (much the same as the above mentioned but with tcp transport and not tls) with the latest fluentd image (kube-logging/fluentd-images:v1.16-full).
Our ClusterOutput:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: syslog
  namespace: logging
spec:
  syslog:
    buffer:
      flush_thread_count: 16
      timekey: 1m
      timekey_use_utc: true
      timekey_wait: 30s
    format:
      type: json
    host: syslog.example.net
    insecure: true
    port: 5056
    transport: tcp

The created fluentd.conf is the following (from k8s secret loggging-operator-logging-fluentd-app):

  <match **>
    @type syslog_rfc5424
    @id clusterflow:logging:syslog-flow:clusteroutput:logging:syslog-output
    host syslog.example.net
    insecure true
    port 5056
    transport tcp
    <buffer tag,time>
      @type file
      chunk_limit_size 8MB
      flush_thread_count 16
      path /buffers/clusterflow:logging:syslog-flow:clusteroutput:logging:syslog-output.*.buffer
      retry_forever true
      timekey 1m
      timekey_use_utc true
      timekey_wait 30s
    </buffer>
    <format>
      @type json
    </format>
  </match>

@TimWelter

Same issue here.

Provider: RKE2
Kubernetes Version: v1.27.12 +rke2r1
Chart: Logging (103.1.1+up4.4.0)

Member

pepov commented Jul 5, 2024

What are your fluentd and fluentbit image versions?

Member

pepov commented Jul 5, 2024

It seems I totally misunderstood the issue originally. I've looked at it again, and it seems the Ruby 3 upgrade broke the syslog plugin: the automatic conversion between hashes and keyword arguments was deprecated in Ruby 2.7 and removed in Ruby 3.0 (see https://blog.saeloun.com/2019/10/07/ruby-2-7-keyword-arguments-redesign/).
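
For readers unfamiliar with the keyword-argument change, the failure mode can be reproduced in plain Ruby. The sketch below is only an illustration: `socket_create_sketch` is a hypothetical stand-in for a helper that takes three positional arguments plus keyword options, not the actual fluentd code, and the double-splat call is one common way to fix such a call site, not necessarily the exact change made in the linked commit.

```ruby
# Hypothetical stand-in for a helper that takes three positional arguments
# plus keyword options. This is NOT the real fluentd implementation.
def socket_create_sketch(proto, host, port, **kwargs)
  [proto, host, port, kwargs]
end

options = { insecure: true }

# Old call style: the options hash is passed as a fourth positional argument.
# Ruby 2.x converted it to keywords implicitly; Ruby 3 raises ArgumentError.
def call_with_positional_hash(options)
  socket_create_sketch(:tcp, "syslog.example.net", 5056, options)
rescue ArgumentError => e
  e
end

# Ruby 3 compatible call style: the hash is splatted into keywords explicitly.
def call_with_double_splat(options)
  socket_create_sketch(:tcp, "syslog.example.net", 5056, **options)
end

old_style = call_with_positional_hash(options)
new_style = call_with_double_splat(options)

if old_style.is_a?(ArgumentError)
  puts "old style: #{old_style.message}"
else
  puts "old style: ok"
end
puts "new style: ok" if new_style.last == options
```

On Ruby 3, the old call style fails with the same "wrong number of arguments (given 4, expected 3)" message seen in the fluentd log, while the explicit `**options` form works on both Ruby 2.7 and Ruby 3.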

I've made a change here: pepov/fluent-plugin-syslog_rfc5424@6404b61

Then applied on my fork of the fluentd image here: kube-logging/fluentd-images@main...pepov:fluentd-images:main

I didn't have the time to test it with a syslog receiver, could you please give it a try with ghcr.io/pepov/fluentd:v1.16-full?

liz-86 commented Jul 8, 2024

Thanks for looking into the issue. I can confirm that with the new image there are no more errors in fluentd. I need to talk to another team to see if they are getting the desired logs, but it looks good at the moment.

Thanks again!

EDIT: All seems to be working perfectly. The other team is getting the logs. :)

Member

pepov commented Jul 8, 2024

thx for the confirmation, I'm making the PRs to have the fix released asap

Member

pepov commented Jul 8, 2024

The images have been updated with the fix as of build 148:
v1.16-full-build.148
v1.16-full

For logging operator 4.8:
v1.16-4.8-full-build.148
v1.16-4.8-full
