Describe the bug
This is a continuation of #3073. First problem: it seems that the solution introduced in 1.7.3 hasn't helped, or maybe I haven't configured it correctly (I would say the documentation is unclear here). Second problem: even after the initial growth is over, we still see memory growing by roughly 1 MB/day.
Configuration
[SERVICE]
Flush 5
Grace 120
Log_Level info
Log_File /var/log/fluentbit.log
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_PORT 2020
[INPUT]
Name tail
Alias kube_containers_kube-system
Tag kube_<namespace_name>_<pod_name>_<container_name>
Tag_Regex (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
Path /var/log/containers/*_kube-system_*.log
DB /var/run/google-fluentbit/pos-files/flb_kube_kube-system.db
DB.locking true
DB.journal_mode Off
Read_from_Head On
Buffer_Max_Size 1MB
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 5
[FILTER]
Name parser
Match kube_*
Key_Name log
Reserve_Data True
Parser docker
Parser containerd
[INPUT]
Name systemd
Alias kubelet
Tag kubelet
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Path /var/log/journal
DB /var/run/google-fluentbit/pos-files/kubelet.db
DB.Sync Normal
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
[INPUT]
Name systemd
Alias node-problem-detector
Tag node-problem-detector
Systemd_Filter _SYSTEMD_UNIT=node-problem-detector.service
Path /var/log/journal
DB /var/run/google-fluentbit/pos-files/node-problem-detector.db
DB.Sync Normal
Buffer_Max_Size 1MB
Mem_Buf_Limit 1MB
[FILTER]
Name modify
Match *
Hard_rename log message
[FILTER]
Name parser
Match kube_*
Key_Name message
Reserve_Data True
Parser glog
Parser json
[OUTPUT]
Name http
Match *
Host 127.0.0.1
Port 2021
URI /logs
header_tag FLUENT-TAG
Format msgpack
Retry_Limit 2
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name containerd
Format regex
Regex ^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name json
Format json
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
[PARSER]
Name glog
Format regex
Regex ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
Time_Key time
Time_Format %m%d %H:%M:%S.%L%z
Investigation
Above is the most important part of the config; in the graph below it corresponds to the blue line. The full config differs from this one by a dozen additional input plugins (systemd and tail), but all of them have exactly the same configuration as the input plugins above; only names, sources and DB parameters differ. There is also a similar deployment of Fluent Bit 1.3.11 with exactly the same config (except a few small changes specific to 1.7), and its memory usage was stable for 3 days. I then tried disabling every plugin one by one, and there was basically no difference, except the case where the tail plugin was disabled (then we don't see the initial growth, which is most probably caused by WAL mode). Below is the full list of those tests:
1. Fluent Bit 1.3.11 - stable line.
2. Config from above on 1.7.3 - blue line.
3. Disabled only kube_containers_kube-system. This is the line above the 1st and 2nd lines.
4. Disabled only the node-problem-detector input plugin.
5. Disabled only the kubelet input plugin.
6. Disabled only the first parser filter.
7. Disabled only the second parser filter.
8. Disabled the modify filter.
9. Replaced the http output plugin with the null output plugin.
For cases 4-9 there is basically no difference. In all cases the input load is below 200 bytes/second.
Summarizing all of the above, it seems that the memory leak is not related to any particular plugin. I suspect that metrics scraping might be causing it; that's what I'm going to verify next.
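A sketch of how that could be isolated (not something I have run yet): an otherwise identical deployment with the built-in HTTP server switched off, so the monitoring endpoints are never scraped. If the ~1 MB/day growth disappears there, the leak is on the scraping path. Only the [SERVICE] section would change:

[SERVICE]
    Flush        5
    Grace        120
    Log_Level    info
    Log_File     /var/log/fluentbit.log
    Daemon       off
    Parsers_File parsers.conf
    # Only change versus the config above: the built-in monitoring server is
    # disabled, so nothing can scrape its metrics endpoints.
    HTTP_Server  Off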
At the same time, could you verify that my configuration of db.journal_mode is correct? According to the documentation it should be db.wal on or db.wal off, but according to the code db.journal_mode should be used.
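For reference, this is how I currently read the two candidate spellings; the snippet below only illustrates the question (relevant tail keys shown), it is not a confirmed syntax:

[INPUT]
    Name             tail
    Path             /var/log/containers/*_kube-system_*.log
    DB               /var/run/google-fluentbit/pos-files/flb_kube_kube-system.db
    # Spelling I used above, taken from the source code (a SQLite journal_mode value):
    DB.journal_mode  Off
    # Spelling described in the documentation; commented out because I am not
    # sure which form 1.7.3 actually honours:
    # DB.wal         Off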