emitter_for_rewrite_tag uses lots of memory after fluent-bit restarts #4506

Closed
wangyuan0916 opened this issue Dec 22, 2021 · 5 comments

@wangyuan0916

wangyuan0916 commented Dec 22, 2021

Bug Report

Describe the bug
In my environment, fluent-bit is deployed as a DaemonSet in a Kubernetes cluster to collect the logs of all containers. I use the modify filter plugin to add a key based on the log level in the log content, e.g. {keep: 1}, {keep: 2}, {keep: 3}, and then use rewrite_tag to forward these logs to 3 different destinations. It works well until the fluent-bit pod restarts. After a restart, fluent-bit's memory usage is very high, and I found that the internally created input plugin called 'emitter_for_rewrite_tag' uses most of it. How can I limit this part of the memory? The data chunk size is 309M before the data is serialized into JSON format, but the actual cost is much higher than that ('memory_working_set_bytes=1262563328i') and stays above roughly 800 MB afterwards. I have limited the emitter buffer size, but it has no effect; normally anything under 100 MB would be reasonable.
"emitter_for_rewrite_tag.7": {
"status": {
"overlimit": true,
"mem_size": "309.3M",
"mem_limit": "9.5M"
},
"chunks": {
"total": 1,
"up": 1,
"down": 0,
"busy": 1,
"busy_size": "309.3M"
}
}
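
For reference, per-instance numbers like the ones above are what the storage metrics endpoint reports. A minimal sketch of the [SERVICE] settings that expose it (listen address and port are assumptions, adjust to the deployment):

    [SERVICE]
        HTTP_Server on
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        storage.metrics on

With that enabled, curl -s http://127.0.0.1:2020/api/v1/storage returns the chunk and limit view per input instance, including the internal emitter_for_rewrite_tag inputs.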

To Reproduce

  • fluent-bit configuration is:

        [INPUT]
            Name tail
            Tag kube.*
            Path /var/log/containers/*.log
            DB /var/log/flb_kube.db
            Mem_Buf_Limit 5MB
            Skip_Long_Lines On
            Refresh_Interval 10
            parser cri
        [FILTER]
            Name kubernetes
            Match kube.*
            Kube_URL https://kubernetes.default.svc.cluster.local:443
            Buffer_Size 0
            Merge_Log On
            K8S-Logging.Parser On
        [FILTER]
            Name modify
            match kube.var.log.containers*log-1*
            Add keep test1
            Condition Key_value_matches message ERROR
        [FILTER]
            Name modify
            match kube.var.log.containers*log-1*
            Add keep test2
            Condition Key_value_matches message DEBUG
        [FILTER]
            Name modify
            match kube.var.log.containers*log-1*
            Add keep test3
            Condition Key_value_matches message WARNING
        [FILTER]
            Name rewrite_tag
            Match kube.var.log.containers*log-1*
            Rule $keep ^(test1)$ test1 false
            Rule $keep ^(test2)$ test2 false
            Rule $keep ^(test3)$ test3 false
            Emitter_Mem_Buf_Limit 10M
            Emitter_Storage.type memory
        [OUTPUT]
            Name file
            Match test1
            File /tmp/test1
        [OUTPUT]
            Name file
            Match test2
            File /tmp/test2
        [OUTPUT]
            Name file
            Match test3
            File /tmp/test3
  • There are 100 containers in this cluster. Each container emits 3 log lines (DEBUG/WARNING/ERROR) per second, and each record is about 50 bytes.
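
As a rough sanity check of the load this setup implies (treating ~50 bytes as the raw record size, before the kubernetes filter attaches pod metadata):

    100 containers x 3 records/s x ~50 bytes ≈ 15 KB/s of raw log data
    309 MB buffered in the emitter ≈ 309 MB / 15 KB/s ≈ 5-6 hours of backlog

The records grow considerably once pod metadata is merged in, so the real backlog window is shorter, but the order of magnitude shows data piling up far beyond the configured 10M emitter limit.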


Your Environment
fluent-bit version: 1.5.7
OS: Linux

@nokute78
Collaborator

Could you check #4049?
That issue is fixed as of v1.8.9:
https://fluentbit.io/announcements/v1.8.9/

@wangyuan0916
Author

Thank you for your reply! I noticed the memory keeps growing as well, so it seems to be the same issue; I will try v1.8.9.
But I don't understand this part of #4049:

In previous implementation, in_emitter has 2 buffers.
- 1. rewrite_tag -> in_emitter_add_record -> msgpack_sbuffer_write
- 2. (timer thread. every 0.5 sec) cb_queue_chunks -> flb_input_chunk_append_raw

I understand that the 1st path handles the records coming from the rewrite_tag instance: they are added to the data chunk of the internal input plugin. Please correct me if I'm wrong. But what is the 2nd (timer) thread doing? I'm confused.

@wangyuan0916
Author

wangyuan0916 commented Dec 22, 2021

Hi, I upgraded fluent-bit to 1.8.10 and the memory problem is gone!
But I ran into a new issue with lost logs.
I have a container in Kubernetes that emits logs like:

          i=0;
          while true;
          do
            echo "[DEBUG] i=$i"
            echo "[ERROR] i=$i"
            i=$((i+1));
            sleep 1;
          done

The fluent-bit filters and outputs are:

    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Remove $kubernetes
    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Add test1 true
        Condition Key_value_matches message ERROR
    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Add test2 true
        Condition Key_value_matches message DEBUG
    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Add test3 true
        Condition Key_value_matches message DEBUG
    [FILTER]
        Name rewrite_tag
        Match kube.var.log.containers*_log-1_*
        Rule $test1 ^(true)$ test1 true
        Emitter_Mem_Buf_Limit 10M
        Emitter_Storage.type memory
    [FILTER]
        Name rewrite_tag
        Match kube.var.log.containers*_log-1_*
        Rule $test2 ^(true)$ test2 true
        Emitter_Mem_Buf_Limit 10M
        Emitter_Storage.type memory
    [FILTER]
        Name rewrite_tag
        Match kube.var.log.containers*_log-1_*
        Rule $test3 ^(true)$ test3 true
        Emitter_Mem_Buf_Limit 10M
        Emitter_Storage.type memory
    [OUTPUT]
        Name file
        Match test1
        File /tmp/test1
    [OUTPUT]
        Name file
        Match test2
        File /tmp/test2
    [OUTPUT]
        Name file
        Match test3
        File /tmp/test3
    [OUTPUT]
        Name file
        Match kube*
        File /tmp/kube

I want to collect the ERROR logs in /tmp/test1, and the DEBUG logs in both /tmp/test2 and /tmp/test3.
The log numbers in test1 are fine, but in /tmp/test2 and /tmp/test3 the numbers are not continuous.
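
One way to narrow down where the gap appears is to compare the per-plugin record counters from the monitoring API (a sketch, assuming the HTTP server from the earlier snippet is enabled on the default port):

    # compare records accepted by the internal emitters with records flushed by each file output
    curl -s http://127.0.0.1:2020/api/v1/metrics

If the emitter input counters keep climbing while the test2/test3 output counters fall behind, the records are most likely being discarded in the 10M memory-only emitter buffers rather than by the file outputs.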

test3: [1640163187.297961204, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=581","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163203.320908254, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=597","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163213.331465903, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=607","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163220.340100423, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=614","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163223.466170346, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=617","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]

@patrick-stephens
Contributor

Can you raise a new issue just to keep things clear? I'll close the original issue as fixed in 1.8.10.

@nokute78
Collaborator

@wangyuan0916 Please open a new issue for v1.8.10+.

what's the 2nd thread doing?

The 1st thread appends to a buffer internal to the in_emitter plugin.
The 2nd thread appends to another buffer that feeds the filters/outputs.

The point is that the 1st buffer does not have a limit, so a flood of records causes memory to grow without bound.
The 2nd path, on the other hand, goes through flb_input_chunk_append_raw, which has a limit and stops queueing.
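
Putting the two paths together (function names as quoted from #4049 above), the flow is roughly:

    rewrite_tag match ──> in_emitter_add_record ──> msgpack_sbuffer_write
                          (internal staging buffer; unlimited before v1.8.9)

    timer, every 0.5 s ──> cb_queue_chunks ──> flb_input_chunk_append_raw
                           (input chunk consumed by filters/outputs; respects the configured buffer limit)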
