Describe the bug
This is a continuation of #3073. First problem: it seems that the solution introduced in 1.7.3 hasn't helped, or maybe I haven't configured it correctly (I would say the documentation is unclear here). Second problem: even after the initial growth is over, we still see memory growing by roughly 1 MB/day.
Configuration
[SERVICE]
Flush 5
Grace 120
Log_Level info
Log_File /var/log/fluentbit.log
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020
[INPUT]
Name tail
Alias kube_containers_kube-system
Tag kube_<namespace_name>_<pod_name>_<container_name>
Tag_Regex (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
Path /var/log/containers/*_kube-system_*.log
DB /var/run/google-fluentbit/pos-files/flb_kube_kube-system.db
DB.locking true
DB.journal_mode Off
Read_from_Head On
Buffer_Max_Size 1MB
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 5
[FILTER]
Name parser
Match kube_*
Key_Name log
Reserve_Data True
Parser docker
Parser containerd
[INPUT]
Name systemd
Alias kubelet
Tag kubelet
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Path /var/log/journal
DB /var/run/google-fluentbit/pos-files/kubelet.db
DB.Sync Normal
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
[INPUT]
Name systemd
Alias node-problem-detector
Tag node-problem-detector
Systemd_Filter _SYSTEMD_UNIT=node-problem-detector.service
Path /var/log/journal
DB /var/run/google-fluentbit/pos-files/node-problem-detector.db
DB.Sync Normal
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
[FILTER]
Name modify
Match *
Hard_rename log message
[FILTER]
Name parser
Match kube_*
Key_Name message
Reserve_Data True
Parser glog
Parser json
[OUTPUT]
Name http
Match *
Host 127.0.0.1
Port 2021
URI /logs
header_tag FLUENT-TAG
Format msgpack
Retry_Limit 2
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name containerd
Format regex
Regex ^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name json
Format json
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
[PARSER]
Name glog
Format regex
Regex ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
Time_Key time
Time_Format %m%d %H:%M:%S.%L%z
Investigation
Above is the most important part of the config; in the graph below it corresponds to the blue line. The full config differs from this one by a dozen additional input plugins (systemd and tail), but all of them have exactly the same configuration as the input plugins above; only names, sources and DB parameters differ. There is also a similar deployment of Fluent Bit 1.3.11 with exactly the same config (except a few small changes specific to 1.7), and its memory usage was stable for 3 days. I then tried disabling every plugin one by one, and there was basically no difference, except the case where the tail plugin was disabled (then we don't see the initial growth, which is most probably caused by WAL mode). Below is the full list of those tests:
1. Fluent Bit 1.3.11 - stable line.
2. Config from above on 1.7.3 - blue line.
3. Disabled only kube_containers_kube-system. This is the line above the 1st and 2nd lines.
4. Disabled only the node-problem-detector input plugin.
5. Disabled only the kubelet input plugin.
6. Disabled only the first parser filter.
7. Disabled only the second parser filter.
8. Disabled the modify filter.
9. Replaced the http output plugin with the null output plugin.
For cases 4-9 there is basically no difference. In all cases the input load is below 200 bytes/second.
Summarizing all of the above, it seems that the memory leak is not related to any particular plugin. I suspect that metrics scraping might be causing it; that's what I'm going to verify next.
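A sketch of how that could be isolated (not something I have run yet): an otherwise identical deployment with the built-in HTTP server switched off, so the monitoring endpoints are never scraped. If the ~1 MB/day growth disappears there, the leak is on the scraping path. Only the [SERVICE] section would change:

[SERVICE]
    Flush        5
    Grace        120
    Log_Level    info
    Log_File     /var/log/fluentbit.log
    Daemon       off
    Parsers_File parsers.conf
    # Only change versus the config above: the built-in monitoring server is
    # disabled, so nothing can scrape its metrics endpoints.
    HTTP_Server  Off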
At the same time, could you verify that my configuration of db.journal_mode is correct? According to the documentation it should be db.wal on or db.wal off, but according to the code db.journal_mode should be used.
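For reference, this is how I currently read the two candidate spellings; the snippet below only illustrates the question (relevant tail keys shown), it is not a confirmed syntax:

[INPUT]
    Name             tail
    Path             /var/log/containers/*_kube-system_*.log
    DB               /var/run/google-fluentbit/pos-files/flb_kube_kube-system.db
    # Spelling I used above, taken from the source code (a SQLite journal_mode value):
    DB.journal_mode  Off
    # Spelling described in the documentation; commented out because I am not
    # sure which form 1.7.3 actually honours:
    # DB.wal         Off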