[Stackdriver Output] Debugging #4159

Closed
lmuhlha opened this issue Oct 4, 2021 · 20 comments

Comments

@lmuhlha

lmuhlha commented Oct 4, 2021

Bug Report

Describe the bug
We are trying to troubleshoot sending logs via FluentBit's Stackdriver output. No logs are being output; there are no output errors, but there are many retries and failed retries.

# HELP fluentbit_input_records_total Number of input records.
# TYPE fluentbit_input_records_total counter
fluentbit_input_records_total{name="containers"} 9005
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="stackdriver-export-all"} 0
fluentbit_output_errors_total{name="stackdriver-export-kube"} 0
# HELP fluentbit_output_proc_bytes_total Number of processed output bytes.
# TYPE fluentbit_output_proc_bytes_total counter
fluentbit_output_proc_bytes_total{name="stackdriver-export-all"} 0
fluentbit_output_proc_bytes_total{name="stackdriver-export-kube"} 0
# HELP fluentbit_output_proc_records_total Number of processed output records.
# TYPE fluentbit_output_proc_records_total counter
fluentbit_output_proc_records_total{name="stackdriver-export-all"} 0
fluentbit_output_proc_records_total{name="stackdriver-export-kube"} 0
# HELP fluentbit_output_retries_failed_total Number of abandoned batches because the maximum number of re-tries was reached.
# TYPE fluentbit_output_retries_failed_total counter
fluentbit_output_retries_failed_total{name="stackdriver-export-all"} 36
fluentbit_output_retries_failed_total{name="stackdriver-export-kube"} 38
# HELP fluentbit_output_retries_total Number of output retries.
# TYPE fluentbit_output_retries_total counter
fluentbit_output_retries_total{name="stackdriver-export-all"} 118
fluentbit_output_retries_total{name="stackdriver-export-kube"} 118

Log level is set to debug, but the logs show nothing that explains why the retries are happening. All we see is:

[2021/10/04 21:52:56] [ info] Configuration:
[2021/10/04 21:52:56] [ info]  flush time     | 5.000000 seconds
[2021/10/04 21:52:56] [ info]  grace          | 120 seconds
[2021/10/04 21:52:56] [ info]  daemon         | 0
[2021/10/04 21:52:56] [ info] ___________
[2021/10/04 21:52:56] [ info]  inputs:
[2021/10/04 21:52:56] [ info]      tail
[2021/10/04 21:52:56] [ info] ___________
[2021/10/04 21:52:56] [ info]  filters:
[2021/10/04 21:52:56] [ info] ___________
[2021/10/04 21:52:56] [ info]  outputs:
[2021/10/04 21:52:56] [ info]      stackdriver.0
[2021/10/04 21:52:56] [ info]      stackdriver.1
[2021/10/04 21:52:56] [ info] ___________
[2021/10/04 21:52:56] [ info]  collectors:

We changed the config to use the stdout output just to see if that worked as expected, and it did. There were lots of debug logs from FluentBit and logs were being output properly.

Is this a bug or is there another way we can figure out why these retries are happening?
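
One way to spot this pattern from the metrics text is to look for outputs whose retry counter is non-zero while their processed-record counter stays at zero. The following is a minimal sketch, not part of Fluent Bit: the helper names `parse_metrics` and `stalled_outputs` are hypothetical, and the sample is a subset of the counters quoted above.

```python
import re
from collections import defaultdict

# Subset of the counters quoted in this report (HELP/TYPE lines omitted).
SAMPLE = """\
fluentbit_input_records_total{name="containers"} 9005
fluentbit_output_proc_records_total{name="stackdriver-export-all"} 0
fluentbit_output_proc_records_total{name="stackdriver-export-kube"} 0
fluentbit_output_retries_failed_total{name="stackdriver-export-all"} 36
fluentbit_output_retries_failed_total{name="stackdriver-export-kube"} 38
fluentbit_output_retries_total{name="stackdriver-export-all"} 118
fluentbit_output_retries_total{name="stackdriver-export-kube"} 118
"""

# Matches lines of the form: metric_name{name="plugin-alias"} 123
LINE = re.compile(r'^(\w+)\{name="([^"]+)"\}\s+(\d+)$')


def parse_metrics(text):
    """Return {plugin_name: {metric_name: value}} for name-labelled counters."""
    out = defaultdict(dict)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:  # comment lines and blank lines do not match and are skipped
            metric, name, value = m.groups()
            out[name][metric] = int(value)
    return dict(out)


def stalled_outputs(metrics):
    """Plugins that retried at least once but never processed a record."""
    return sorted(
        name
        for name, m in metrics.items()
        if m.get("fluentbit_output_retries_total", 0) > 0
        and m.get("fluentbit_output_proc_records_total", 0) == 0
    )


if __name__ == "__main__":
    print(stalled_outputs(parse_metrics(SAMPLE)))
```

On the sample, both Stackdriver outputs are reported as stalled, which matches the symptom described here: retries without any delivered records.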

To Reproduce

  • Rubular link if applicable:
  • Example log message if applicable:
{"log":"YOUR LOG MESSAGE HERE","stream":"stdout","time":"2018-06-11T14:37:30.681701731Z"}
  • Steps to reproduce the problem:

Expected behavior

FluentBit's debug logs should include why the Stackdriver output is retrying and not outputting any logs.
Screenshots

Your Environment

  • Version used: fluent/fluent-bit:1.7.4-debug@sha256:4bbad42f30f66f84c2464c7592963142c9d21aa51bc838b359bb3f7248741b80
  • Configuration:
  fluent-bit.conf: |-
    [SERVICE]
        Flush         5
        Grace         120
        Log_Level     debug
        Log_File      /var/log/fluentbit.log
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_PORT     2020
    @INCLUDE containers.input.conf
    @INCLUDE output.conf

  containers.input.conf: |-
    [INPUT]
        Name             tail
        Alias            containers
        Tag              kube.<namespace_name>.<pod_name>.<container_name>
        Tag_Regex        (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
        Path             /var/log/containers/*.log
        DB               /var/run/google-fluentbit/pos-files/flb_kube.db
        Buffer_Max_Size  1MB
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1

  output.conf: |-
    [OUTPUT]
        Name                       stackdriver
        Alias                      stackdriver-export-kube
        Match                      kube.*
        export_to_project_id       project-id
        Retry_Limit                2
        workers                    2
    [OUTPUT]
        Name                       stackdriver
        Alias                      stackdriver-export-all
        Match                      *
        export_to_project_id       project-id
        Retry_Limit                2
        workers                    2

  parsers.conf: |-
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
        Name        containerd
        Format      regex
        Regex       ^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    # CRI Parser
    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
        Name        json
        Format      json
    [PARSER]
        Name        glog
        Format      regex
        Regex       ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
        Time_Key    time
        Time_Format %m%d %H:%M:%S.%L%z
    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
    [PARSER]
        Name firstline
        Format regex
        Regex  /^\w\d{4}/
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes
  • Server type and version:
  • Operating System and version: "Debian GNU/Linux 10 (buster)"
  • Filters and plugins: None

Additional context

@JeffLuoo
Contributor

JeffLuoo commented Oct 5, 2021

What is the log in the log file /var/log/fluentbit.log?

@lmuhlha
Author

lmuhlha commented Oct 5, 2021

Thank you! 🤦🏻‍♀️

@lmuhlha lmuhlha closed this as completed Oct 5, 2021
@lmuhlha lmuhlha reopened this Oct 6, 2021
@lmuhlha
Author

lmuhlha commented Oct 6, 2021

@JeffLuoo I actually have a sort-of related question, so I thought it might make sense to just re-open this.
I can't seem to get the Stackdriver output to output any logs. But I do have the FluentBit logs now.
I keep seeing the errors below and I'm not sure how to troubleshoot them further:

[2021/10/06 22:07:14] [debug] [output:stackdriver:stackdriver-export-all] HTTP Status=400
[2021/10/06 22:07:14] [ warn] [output:stackdriver:stackdriver-export-all] error
<html><title>Error 400 (Bad Request)!!1</title></html>
[2021/10/06 22:07:14] [debug] [socket] could not validate socket status for #114 (don't worry)
[2021/10/06 22:07:14] [debug] [out coro] cb_destroy coro_id=24
[2021/10/06 22:07:14] [debug] [retry] re-using retry for task_id=11 attempts=2
[2021/10/06 22:07:14] [ warn] [engine] failed to flush chunk '1-1633558020.721104648.flb', retry in 87 seconds: task_id=11, input=containers > output=stackdriver-export-all (out_id=0)
[2021/10/06 22:07:17] [debug] [output:stackdriver:stackdriver-export-all] local_resource_id not found, tag [k8s_container.system.fluentbit-gke.fluentbit] is assigned for local_resource_id
[2021/10/06 22:07:17] [debug] [output:stackdriver:stackdriver-export-all] [logging.googleapis.com/monitored_resource] not found in the payload
[2021/10/06 22:07:17] [debug] [http_client] not using http_proxy for header
[2021/10/06 22:07:17] [debug] [http_client] header=POST /v2/entries:write HTTP/1.1
...
...
[2021/10/06 22:07:17] [error] [src/flb_http_client.c:1163 errno=32] Broken pipe
[2021/10/06 22:07:17] [ warn] [output:stackdriver:stackdriver-export-all] http_do=-1
[2021/10/06 22:07:17] [debug] [socket] could not validate socket status for #114 (don't worry)
[2021/10/06 22:07:17] [debug] [out coro] cb_destroy coro_id=25
[2021/10/06 22:07:17] [debug] [retry] re-using retry for task_id=12 attempts=2
[2021/10/06 22:07:17] [ warn] [engine] failed to flush chunk '1-1633558024.698386373.flb', retry in 22 seconds: task_id=12, input=containers > output=stackdriver-export-all (out_id=0)
[2021/10/06 22:07:18] [debug] [output:stackdriver:stackdriver-export-all] local_resource_id not found, tag [k8s_container.fulfil.fulfil.fulfil] is assigned for local_resource_id
[2021/10/06 22:07:18] [debug] [output:stackdriver:stackdriver-export-all] [logging.googleapis.com/monitored_resource] not found in the payload
[2021/10/06 22:07:18] [debug] [http_client] not using http_proxy for header
[2021/10/06 22:07:18] [debug] [http_client] header=POST /v2/entries:write HTTP/1.1
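
To see which outputs are failing and how often, the `failed to flush chunk` warnings can be tallied per output. This is only a sketch: the regular expression is fitted to the warning lines quoted in this comment (whether other versions use the same wording is an assumption), and `failed_flushes` and `failures_per_output` are hypothetical helper names.

```python
import re
from collections import Counter

# Lines copied from the log excerpt in this comment.
SAMPLE_LOG = """\
[2021/10/06 22:07:14] [ warn] [engine] failed to flush chunk '1-1633558020.721104648.flb', retry in 87 seconds: task_id=11, input=containers > output=stackdriver-export-all (out_id=0)
[2021/10/06 22:07:17] [error] [src/flb_http_client.c:1163 errno=32] Broken pipe
[2021/10/06 22:07:17] [ warn] [engine] failed to flush chunk '1-1633558024.698386373.flb', retry in 22 seconds: task_id=12, input=containers > output=stackdriver-export-all (out_id=0)
"""

# Pattern fitted to the warning format shown above.
FLUSH_FAIL = re.compile(
    r"failed to flush chunk '(?P<chunk>[^']+)', retry in (?P<delay>\d+) seconds: "
    r"task_id=(?P<task>\d+), input=(?P<input>\S+) > output=(?P<output>\S+)"
)


def failed_flushes(log_text):
    """Return one dict per 'failed to flush chunk' warning found in the text."""
    return [m.groupdict() for m in FLUSH_FAIL.finditer(log_text)]


def failures_per_output(log_text):
    """Count flush failures grouped by output alias."""
    return Counter(f["output"] for f in failed_flushes(log_text))


if __name__ == "__main__":
    for failure in failed_flushes(SAMPLE_LOG):
        print(failure["output"], failure["chunk"], "retry in", failure["delay"], "s")
    print(failures_per_output(SAMPLE_LOG))
```

A tally like this only shows where the failures land; the cause still has to be read from the surrounding lines, such as the `HTTP Status=400` and `Broken pipe` entries.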

Config:

data:
  fluent-bit.conf: |-
    [SERVICE]
        Flush         5
        Grace         120
        Log_Level     debug
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_PORT     2020
    @INCLUDE containers.input.conf
    @INCLUDE output.conf
  containers.input.conf: |-
    [INPUT]
        Name             tail
        Alias            containers
        Tag              k8s_container.<namespace_name>.<pod_name>.<container_name>
        Tag_Regex        (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
        Path             /var/log/containers/*.log
        Parser           docker
        DB               /var/run/google-fluentbit/pos-files/flb_kube.db
        Mem_Buf_Limit    5MB
        Skip_Long_Lines  On
        Refresh_Interval 5
        Read_from_Head   True
  output.conf: |-
    [OUTPUT]
        Name                       stackdriver
        Alias                      stackdriver-export-all
        Match                      k8s_container.*
        export_to_project_id       project-id
        resource                   k8s_container
        k8s_cluster_name           testing-europe-west1
        k8s_cluster_location       europe-west1
        labels_key                 labels
        severity_key               level
        Retry_Limit                2
  parsers.conf: |-
    [PARSER]
        Name                    docker
        Format                  json
        Time_Key                time
        Time_Format             %Y-%m-%dT%H:%M:%S.%L%z

Thank you for your help!
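
For reference, here is a rough Python approximation of how `Tag_Regex` and `Tag` combine to turn a container log path into a tag. Assumptions: the `(?<name>...)` groups are rewritten as Python's `(?P<name>...)`, the file name in the example is made up, and `expand_tag` is a hypothetical helper rather than Fluent Bit's implementation.

```python
import re

# Tag_Regex from the config above, with named groups in Python syntax.
TAG_REGEX = re.compile(
    r"(?P<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+)_(?P<container_name>.+)-"
)

# Tag template from the same [INPUT] section.
TAG_TEMPLATE = "k8s_container.<namespace_name>.<pod_name>.<container_name>"


def expand_tag(path):
    """Substitute the captured groups into the tag template.

    Returns None when the regex does not match the path.
    """
    m = TAG_REGEX.search(path)
    if m is None:
        return None
    tag = TAG_TEMPLATE
    for key, value in m.groupdict().items():
        tag = tag.replace("<" + key + ">", value)
    return tag


if __name__ == "__main__":
    # Hypothetical file name in the <pod>_<namespace>_<container>-<id>.log shape.
    print(expand_tag("/var/log/containers/web-0_shop_nginx-0123456789abcdef.log"))
    # A path without the underscore-separated parts does not match.
    print(expand_tag("/var/log/kubelet.log"))
```

The resulting tag has the same shape as the `k8s_container.<namespace>.<pod>.<container>` tags that appear in the debug lines above.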

@JeffLuoo
Contributor

JeffLuoo commented Oct 7, 2021

What fluent-bit version are you using? If you are using an older version (< 1.7.x), please upgrade to the latest fluent-bit. This is a token header issue that has been fixed in later versions.

@sjoeboo

sjoeboo commented Oct 7, 2021

@JeffLuoo This is occurring with fluent/fluent-bit:1.7.4-debug (I'm on the same team as @lmuhlha)

@JeffLuoo
Contributor

JeffLuoo commented Oct 7, 2021

@sjoeboo Please upgrade then. This is the fix to the issue: 2e93cba#diff-77efe81876d673b9c75f6d54f354bdda1570ce21f8b8386ca32f4cf9b754ba6a

@lmuhlha
Author

lmuhlha commented Oct 7, 2021

Upgraded to 1.7.9-debug just now and still seeing:

[2021/10/07 18:26:03] [debug] [upstream] KA connection #63 to logging.googleapis.com:443 has been assigned (recycled)
[2021/10/07 18:26:03] [debug] [output:stackdriver:stackdriver-export-all] local_resource_id not found, tag [k8s_container.spotify-system.fluentbit-gke.fluentbit] is assigned for local_resource_id
[2021/10/07 18:26:03] [debug] [output:stackdriver:stackdriver-export-all] [logging.googleapis.com/monitored_resource] not found in the payload

Will try a more recent version next.

We are finally outputting logs though 👍🏻
Maybe worth mentioning the SD output doesn't work in 1.7.4 at all? Unless I missed it and it's already mentioned.

@JeffLuoo
Contributor

JeffLuoo commented Oct 7, 2021

Those debug logs are harmless messages. Please ignore them and check whether you see the logs on Stackdriver.

@lmuhlha
Author

lmuhlha commented Oct 8, 2021

I iterated on the very simple config above after seeing some log output and added in all the inputs & outputs we actually need. Now the containers crash loop with exit code 139 with the following logs:

[2021/10/08 18:12:03] [ info]  collectors:
[2021/10/08 18:13:07] [engine] caught signal (SIGSEGV)
#0  0x55e3de60140c      in  __mk_list_add() at lib/monkey/include/monkey/mk_core/mk_list.h:59
#1  0x55e3de60143c      in  mk_list_add() at lib/monkey/include/monkey/mk_core/mk_list.h:64
#2  0x55e3de601a4e      in  cb_mq_metrics() at src/http_server/api/v1/metrics.c:183
#3  0x55e3de9e2668      in  mk_fifo_worker_read() at lib/monkey/mk_server/mk_fifo.c:438
#4  0x55e3de9f1689      in  mk_server_worker_loop() at lib/monkey/mk_server/mk_server.c:569
#5  0x55e3de9e877c      in  mk_sched_launch_worker_loop() at lib/monkey/mk_server/mk_scheduler.c:416
#6  0x7f5592cb5fa2      in  ???() at ???:0
#7  0x7f55923964ce      in  ???() at ???:0
#8  0xffffffffffffffff  in  ???() at ???:0

which is possibly related to

[2021/10/08 18:05:25] [ warn] [routes_mask] Can't set bit (2379) past limits of bitfield
[2021/10/08 18:05:25] [ warn] [routes_mask] Can't set bit (2379) past limits of bitfield
[2021/10/08 18:05:25] [ warn] [routes_mask] Can't set bit (2379) past limits of bitfield
[2021/10/08 18:05:25] [ warn] [input] systemd.14 paused (mem buf overlimit)
[2021/10/08 18:05:25] [ warn] [routes_mask] Can't set bit (2379) past limits of bitfield
[2021/10/08 18:05:25] [ warn] [input] systemd.15 paused (mem buf overlimit)
  • Version used: fluent/fluent-bit:1.8.7-debug@sha256:024748e4aa934d5b53a713341608b7ba801d41a170f9870fdf67f4032a20146f
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes
  • Operating System and version: "Debian GNU/Linux 10 (buster)"
  • Filters and plugins: Throttle filter
  • Config:
  fluent-bit.conf: |-
    [SERVICE]
        Flush         5
        Grace         120
        Log_Level     debug
        Log_File      /var/log/fluentbit.log
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_PORT     3020
    @INCLUDE containers.input.conf
    @INCLUDE system.input.conf
    @INCLUDE filter.conf
    @INCLUDE output.conf
  containers.input.conf: |-
    [INPUT]
        Name             tail
        Alias            containers
        Tag              k8s_container.<namespace_name>.<pod_name>.<container_name>
        Tag_Regex        (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
        Path             /var/log/containers/*.log
        DB               /var/run/google-fluentbit/pos-files/flb_kube.db
        Buffer_Max_Size  1MB
        Mem_Buf_Limit    5MB
        Skip_Long_Lines  On
        Refresh_Interval 5
        Read_from_Head   True
  system.input.conf: |-
    # Example:
    # Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
    [INPUT]
        Name   tail
        Alias  syslog
        Parser syslog
        Path   /var/log/startupscript.log
        DB     /var/log/startupscript.db
        Alias  startupscript
        Tag    startupscript
    [INPUT]
        Name    tail
        Alias   docker
        Path    /var/log/docker.log
        Tag     docker
        Parser  docker
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    [INPUT]
        Name  tail
        Alias etcd
        Path  /var/log/etcd.log
        Tag   etcd
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    [INPUT]
        Name             tail
        Alias            kubelet
        Path             /var/log/kubelet.log
        Tag              kubelet
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    # Example:
    # I1118 21:26:53.975789       6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed
    [INPUT]
        Name            tail
        Alias           kube-proxy
        Tag             kube-proxy
        Path            /var/log/kube-proxy.log
        DB              /var/log/kube-proxy.db
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1
        Parser          glog
    [INPUT]
        Name             tail
        Alias            kube-apiserver
        Path             /var/log/kube-apiserver.log
        Tag              kube-apiserver
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    [INPUT]
        Name             tail
        Alias            kube-controller-manager
        Path             /var/log/kube-controller-manager.log
        Tag              kube-controller-manager
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    [INPUT]
        Name             tail
        Alias            kube-scheduler
        Path             /var/log/kube-scheduler.log
        Tag              kube-scheduler
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    [INPUT]
        Name             tail
        Alias            rescheduler
        Path             /var/log/rescheduler.log
        Tag              rescheduler
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    [INPUT]
        Name             tail
        Alias            glbc
        Path             /var/log/glbc.log
        Tag              glbc
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    [INPUT]
        Name             tail
        Alias            cluster-autoscaler
        Path             /var/log/cluster-autoscaler.log
        Tag              cluster-autoscaler
        Multiline        off
        Parser_Firstline firstline
        Parser_1         format1
        Mem_Buf_Limit    1MB
        Skip_Long_Lines  On
        Refresh_Interval 1
    # Logs from systemd-journal for interesting services.
    [INPUT]
        Name           systemd
        Alias          sysd-docker
        Tag            docker
        Systemd_Filter _SYSTEMD_UNIT=docker.service
        Path           /var/log/journal
        DB             /var/log/gcp-journald-docker.db
        Read_from_head  true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1
    [INPUT]
        Name           systemd
        Alias          sysd-container-runtime
        Tag            container-runtime
        Systemd_Filter _SYSTEMD_UNIT=containerd.service
        Path           /var/log/journal
        DB             /var/log/gcp-journald-container-runtime.db
        Read_from_head true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1
    [INPUT]
        Name            systemd
        Alias           sysd-kubelet
        Tag             kubelet
        Systemd_Filter  _SYSTEMD_UNIT=kubelet.service
        Path            /var/log/journal
        DB              /var/log/gcp-journald-kubelet.db
        Read_from_head  true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1
    [INPUT]
        Name           systemd
        Alias          sysd-node-problem-detector
        Tag            node-problem-detector
        Systemd_Filter _SYSTEMD_UNIT=node-problem-detector.service
        Path           /var/log/journal
        DB             /var/log/gcp-journald-node-problem-detector.db
        Read_from_head  true
        Buffer_Max_Size 1MB
        Mem_Buf_Limit   1MB
        Refresh_Interval 1
  filter.conf: |-
    {% raw -%}
    # rate limit records per namespace
    {% for namespace,project in log_mappings.items() %}
    [FILTER]
        Name     throttle
        Alias    throttle-{{namespace}}
        Match    k8s_container.{{namespace}}
        Rate     300000
        Window   60
        Interval 1s
    {% endfor %}
    {% endraw %}
  output.conf: |-
    {% raw -%}
    # handle namespaces in droplist first
    {% for namespace in log_droplist %}
    [OUTPUT]
        Name  null
        Alias null-{{namespace}}
        Match k8s_container.{{namespace}}.*
    {% endfor %}
    # Stackdriver output per namespace
    {% for namespace,project in log_mappings.items() %}
    [OUTPUT]
        Name                       stackdriver
        Alias                      stackdriver-{{namespace}}
        Match                      k8s_container.{{namespace}}.*
        export_to_project_id       {{project}}
        namespace                  {{namespace}}
        k8s_cluster_name           {{cluster_name}}
        k8s_cluster_location       {{gcp_region}}
        labels_key                 labels
        severity_key               level
        resource                   k8s_container
        Retry_Limit                2
        workers                    2
    {% endfor %}
    {% endraw -%}
    [OUTPUT]
        Name                       stackdriver
        Alias                      stackdriver-export-k8s_container
        Match                      k8s_container.*
        export_to_project_id       project-id
        k8s_cluster_name           {{cluster_name}}
        k8s_cluster_location       {{gcp_region}}
        labels_key                 labels
        severity_key               level
        resource                   k8s_container
        Retry_Limit                2
        workers                    2
    [OUTPUT]
        Name                       stackdriver
        Alias                      stackdriver-export-all
        Match                      *
        export_to_project_id       project-id
        k8s_cluster_name           {{cluster_name}}
        k8s_cluster_location       {{gcp_region}}
        labels_key                 labels
        severity_key               level
        resource                   k8s_container
        Retry_Limit                2
        workers                    2
  parsers.conf: |-
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
        Name        containerd
        Format      regex
        Regex       ^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    # CRI Parser
    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
        Name        json
        Format      json
    [PARSER]
        Name        glog
        Format      regex
        Regex       ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
        Time_Key    time
        Time_Format %m%d %H:%M:%S.%L%z
    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
    [PARSER]
        Name firstline
        Format regex
        Regex  /^\w\d{4}/

For more context, this config generates a throttle filter and a Stackdriver output for each of 2000+ namespaces. Could this also be an issue?
We currently run FluentD with the same number of outputs (but have other throughput issues).
Thanks again.
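
As a rough sanity check on the scale, the number of `[OUTPUT]` sections in the rendered config can be counted and compared with the 256-output ceiling that a maintainer quotes later in this thread. The sketch below assumes the config has already been rendered from the template. `render_outputs` is a hypothetical stand-in for that rendering step, and the namespace names are invented.

```python
ROUTES_MASK_LIMIT = 256  # ceiling quoted later in this thread (4 x 64 bits)


def count_sections(rendered_config, section="OUTPUT"):
    """Count section headers such as [OUTPUT] in a rendered classic-mode config."""
    header = "[" + section + "]"
    return sum(1 for line in rendered_config.splitlines() if line.strip() == header)


def outputs_over_limit(rendered_config, limit=ROUTES_MASK_LIMIT):
    """How many [OUTPUT] sections exceed the limit (0 if within it)."""
    return max(0, count_sections(rendered_config) - limit)


def render_outputs(namespaces):
    """Hypothetical stand-in for the templated output.conf:
    one Stackdriver output per namespace plus the two catch-all outputs."""
    blocks = []
    for ns in namespaces:
        blocks.append(
            "[OUTPUT]\n    Name  stackdriver\n    Match k8s_container.%s.*\n" % ns
        )
    blocks.append("[OUTPUT]\n    Name  stackdriver\n    Match k8s_container.*\n")
    blocks.append("[OUTPUT]\n    Name  stackdriver\n    Match *\n")
    return "".join(blocks)


if __name__ == "__main__":
    config = render_outputs(["ns%d" % i for i in range(2000)])
    print(count_sections(config), "outputs,", outputs_over_limit(config), "over the limit")
```

With 2000 invented namespaces the sketch counts 2002 outputs, far beyond 256, before any drop-list `null` outputs are added.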

@lmuhlha
Author

lmuhlha commented Oct 8, 2021

I also tried removing @INCLUDE system.input.conf since there were some logs related to

[2021/10/08 18:46:29] [debug] [input:tail:docker] scanning path /var/log/docker.log
[2021/10/08 18:46:29] [debug] [input:tail:docker] cannot read info from: /var/log/docker.log
[2021/10/08 18:46:29] [debug] [input:tail:docker] 0 new files found on path '/var/log/docker.log'
[2021/10/08 18:46:29] [debug] [input:tail:etcd] scanning path /var/log/etcd.log
[2021/10/08 18:46:29] [debug] [input:tail:etcd] cannot read info from: /var/log/etcd.log
[2021/10/08 18:46:29] [debug] [input:tail:etcd] 0 new files found on path '/var/log/etcd.log'

which seemed harmless. So those are gone now, but the containers are still restarting.

@lmuhlha lmuhlha changed the title from "[Stackdriver Output] No debug logs for retries" to "[Stackdriver Output] Debugging" Oct 13, 2021
@lmuhlha
Author

lmuhlha commented Oct 13, 2021

@JeffLuoo Any idea whether using 2000+ outputs is simply not possible?

@JeffLuoo
Contributor

Max is 256 to my understanding

/*
* The routing mask is an array integers used to store a bitfield. Each
* bit represents the unique id of an output plugin. For example, the 9th
* bit in the routes_mask represents the output plugin with id = 9.
*
* A value of 1 in the bitfield means that output plugin is selected
* and a value of zero means that output is deselected.
*
* The size of the bitmask array limits the number of output plugins
* The router can route to. For example: with a value of 4 using
* 64-bit integers the bitmask can represent up to 256 output plugins
*/
#define FLB_ROUTES_MASK_ELEMENTS 4
/*
* How many bits are in each element of the bitmask array
*/
#define FLB_ROUTES_MASK_ELEMENT_BITS (sizeof(uint64_t) * CHAR_BIT)
/*
* The maximum number of routes that can be stored in the array
*/
#define FLB_ROUTES_MASK_MAX_VALUE (FLB_ROUTES_MASK_ELEMENTS * FLB_ROUTES_MASK_ELEMENT_BITS)
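
The arithmetic in that header can be modelled in a few lines. This is a Python sketch of the idea, not the Fluent Bit code: `set_bit` is a hypothetical helper, the element width assumes an 8-byte `uint64_t` with `CHAR_BIT == 8`, and treating valid bit ids as 0 to 255 is an assumption about the boundary.

```python
ELEMENTS = 4                          # FLB_ROUTES_MASK_ELEMENTS
ELEMENT_BITS = 8 * 8                  # sizeof(uint64_t) * CHAR_BIT, assumed 8 * 8
MAX_VALUE = ELEMENTS * ELEMENT_BITS   # FLB_ROUTES_MASK_MAX_VALUE


def new_mask():
    """A routing mask modelled as a list of ELEMENTS 64-bit words, all zero."""
    return [0] * ELEMENTS


def set_bit(mask, bit):
    """Select the output with id `bit`; return False if it does not fit."""
    if bit < 0 or bit >= MAX_VALUE:
        return False  # analogous to the "past limits of bitfield" warning
    mask[bit // ELEMENT_BITS] |= 1 << (bit % ELEMENT_BITS)
    return True


if __name__ == "__main__":
    mask = new_mask()
    print(MAX_VALUE)             # 256
    print(set_bit(mask, 9))      # True: output id 9 fits in the first word
    print(set_bit(mask, 2379))   # False: the id from the warnings does not fit
```

With these values, bit 2379 from the `routes_mask` warnings earlier in the thread falls far outside the 256-bit mask, so `set_bit` refuses it.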

@lmuhlha
Author

lmuhlha commented Oct 18, 2021

Thank you!

@matt-simons

Is it possible to increase/configure the maximum number of output plugins? We've recently hit this ceiling also.

@lecaros
Contributor

lecaros commented Mar 23, 2022

Hi, @matt-simons, would you mind sharing your use case where you need more than 256 outputs?

@davidovich

@lecaros I have this exact use case: we offer a shared Kubernetes cluster (namespace as a service) in a multi-tenant scenario where each tenant has an elastic endpoint. As tenants are not known in advance, we use the fluent-operator and deposit ClusterOutput documents that specify an es endpoint dedicated to them. Since this is a shared cluster, we will inevitably go beyond 256 outputs (as there is one output per tenant). This looks very much like the templated approach hinted at by @lmuhlha above, where there is a template loop over the projects and namespaces.

As such, we would be very glad if this limit were configurable, as we have not yet found a way to configure our outputs dynamically (the es endpoint comes from another service, and we cannot know it in advance).

I am a bit stuck right now because our design, which seemed sound after reading fluent-bit's tagline (thousands of inputs), is now flawed.

@matt-simons

Our use case was exactly the same as @davidovich's.

@lecaros
Contributor

lecaros commented Apr 5, 2022

Thanks for your feedback, @davidovich and @matt-simons. We could write down a feature request from this. If you want to do it, you're very welcome to do so, and I'll take it from there. Otherwise, I'll do it later.

@davidovich

I started something in #5224. Is that what you had in mind?

@lecaros
Contributor

lecaros commented Apr 5, 2022

Yes, sorry I missed that one.
