emitter_for_rewrite_tag uses lots of memory after fluent-bit restarts #4506

Closed
wangyuan0916 opened this issue Dec 22, 2021 · 5 comments

@wangyuan0916

wangyuan0916 commented Dec 22, 2021

Bug Report

Describe the bug
In my environment, fluent-bit is deployed as a DaemonSet in a Kubernetes cluster to collect the logs of all containers. I use the modify filter plugin to add a key based on the log level in the log content, e.g. {keep: 1}, {keep: 2}, {keep: 3}, and then use rewrite_tag to forward these logs to 3 different destinations. It works well until the fluent-bit pod restarts. After a restart, fluent-bit's memory usage is very high, and I found that the internally created input plugin called 'emitter_for_rewrite_tag' uses most of it. How can I limit this part of the memory? The data chunk size is 309M before the data is serialized into JSON format, but the actual cost is much higher than that ('memory_working_set_bytes=1262563328i') and stays above roughly 800 MB afterwards. I have limited the emitter buffer size, but it has no effect; normally anything under 100 MB would be reasonable.
"emitter_for_rewrite_tag.7": {
"status": {
"overlimit": true,
"mem_size": "309.3M",
"mem_limit": "9.5M"
},
"chunks": {
"total": 1,
"up": 1,
"down": 0,
"busy": 1,
"busy_size": "309.3M"
}
}
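
For reference, per-instance numbers like the ones above are what the storage metrics endpoint reports. A minimal sketch of the [SERVICE] settings that expose it (listen address and port are assumptions, adjust to the deployment):

    [SERVICE]
        HTTP_Server on
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        storage.metrics on

With that enabled, curl -s http://127.0.0.1:2020/api/v1/storage returns the chunk and limit view per input instance, including the internal emitter_for_rewrite_tag inputs.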

To Reproduce

  • fluent-bit configuration is:

        [INPUT]
            Name tail
            Tag kube.*
            Path /var/log/containers/*.log
            DB /var/log/flb_kube.db
            Mem_Buf_Limit 5MB
            Skip_Long_Lines On
            Refresh_Interval 10
            parser cri
        [FILTER]
            Name kubernetes
            Match kube.*
            Kube_URL https://kubernetes.default.svc.cluster.local:443
            Buffer_Size 0
            Merge_Log On
            K8S-Logging.Parser On
        [FILTER]
            Name modify
            match kube.var.log.containers*log-1*
            Add keep test1
            Condition Key_value_matches message ERROR
        [FILTER]
            Name modify
            match kube.var.log.containers*log-1*
            Add keep test2
            Condition Key_value_matches message DEBUG
        [FILTER]
            Name modify
            match kube.var.log.containers*log-1*
            Add keep test3
            Condition Key_value_matches message WARNING
        [FILTER]
            Name rewrite_tag
            Match kube.var.log.containers*log-1*
            Rule $keep ^(test1)$ test1 false
            Rule $keep ^(test2)$ test2 false
            Rule $keep ^(test3)$ test3 false
            Emitter_Mem_Buf_Limit 10M
            Emitter_Storage.type memory
        [OUTPUT]
            Name file
            Match test1
            File /tmp/test1
        [OUTPUT]
            Name file
            Match test2
            File /tmp/test2
        [OUTPUT]
            Name file
            Match test3
            File /tmp/test3
  • There are 100 containers in this cluster. Each container emits 3 log lines (DEBUG/WARNING/ERROR) per second, and each record is about 50 bytes.
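
As a rough sanity check of the load this setup implies (treating ~50 bytes as the raw record size, before the kubernetes filter attaches pod metadata):

    100 containers x 3 records/s x ~50 bytes ≈ 15 KB/s of raw log data
    309 MB buffered in the emitter ≈ 309 MB / 15 KB/s ≈ 5-6 hours of backlog

The records grow considerably once pod metadata is merged in, so the real backlog window is shorter, but the order of magnitude shows data piling up far beyond the configured 10M emitter limit.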


Your Environment
fluent-bit version: 1.5.7
OS: Linux

@nokute78
Collaborator

Could you check #4049?
That issue is fixed as of v1.8.9:
https://fluentbit.io/announcements/v1.8.9/

@wangyuan0916
Author

Thank you for your reply! I noticed the memory keeps growing as well, so it seems to be the same issue; I will try v1.8.9.
But I don't understand this part of #4049:

In previous implementation, in_emitter has 2 buffers.
- 1. rewrite_tag -> in_emitter_add_record -> msgpack_sbuffer_write
- 2. (timer thread. every 0.5 sec) cb_queue_chunks -> flb_input_chunk_append_raw

I understand that the 1st path handles the records coming from the rewrite_tag instance: they are added to the data chunk of the internal input plugin. Please correct me if I'm wrong. But what is the 2nd (timer) thread doing? I'm confused.

@wangyuan0916
Author

wangyuan0916 commented Dec 22, 2021

Hi, I upgraded fluent-bit to 1.8.10 and the memory problem is gone!
But I ran into a new issue with lost logs.
I have a container in Kubernetes that emits logs like:

          i=0;
          while true;
          do
            echo "[DEBUG] i=$i"
            echo "[ERROR] i=$i"
            i=$((i+1));
            sleep 1;
          done

The fluent-bit filters and outputs are:

    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Remove $kubernetes
    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Add test1 true
        Condition Key_value_matches message ERROR
    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Add test2 true
        Condition Key_value_matches message DEBUG
    [FILTER]
        Name modify
        match kube.var.log.containers*_log-1_*
        Add test3 true
        Condition Key_value_matches message DEBUG
    [FILTER]
        Name rewrite_tag
        Match kube.var.log.containers*_log-1_*
        Rule $test1 ^(true)$ test1 true
        Emitter_Mem_Buf_Limit 10M
        Emitter_Storage.type memory
    [FILTER]
        Name rewrite_tag
        Match kube.var.log.containers*_log-1_*
        Rule $test2 ^(true)$ test2 true
        Emitter_Mem_Buf_Limit 10M
        Emitter_Storage.type memory
    [FILTER]
        Name rewrite_tag
        Match kube.var.log.containers*_log-1_*
        Rule $test3 ^(true)$ test3 true
        Emitter_Mem_Buf_Limit 10M
        Emitter_Storage.type memory
    [OUTPUT]
        Name file
        Match test1
        File /tmp/test1
    [OUTPUT]
        Name file
        Match test2
        File /tmp/test2
    [OUTPUT]
        Name file
        Match test3
        File /tmp/test3
    [OUTPUT]
        Name file
        Match kube*
        File /tmp/kube

I want to collect the ERROR logs in /tmp/test1, and the DEBUG logs in both /tmp/test2 and /tmp/test3.
The log numbers in test1 are fine, but in /tmp/test2 and /tmp/test3 the numbers are not continuous.
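
One way to narrow down where the gap appears is to compare the per-plugin record counters from the monitoring API (a sketch, assuming the HTTP server from the earlier snippet is enabled on the default port):

    # compare records accepted by the internal emitters with records flushed by each file output
    curl -s http://127.0.0.1:2020/api/v1/metrics

If the emitter input counters keep climbing while the test2/test3 output counters fall behind, the records are most likely being discarded in the 10M memory-only emitter buffers rather than by the file outputs.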

test3: [1640163187.297961204, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=581","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163203.320908254, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=597","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163213.331465903, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=607","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163220.340100423, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=614","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]
test3: [1640163223.466170346, {"stream":"stdout","logtag":"F","message":"[DEBUG] i=617","kubernetes":{"pod_name":"counter-1","namespace_name":"log-1","pod_id":"d76c604a-dc33-4ab6-b1b5-0f3bad72555d","host":"5c132468-b25d-4dbf-b71e-4a4aee46a011","container_name":"container1","docker_id":"f3be888c4a6d5bd71e1c00a7de63fe0bbc73df279634ca207ab6bb3e73d2d7b2","container_hash":"gcr.io/cf-pks-releng-environments/busybox@sha256:679b1c1058c1f2dc59a3ee70eed986a88811c0205c8ceea57cec5f22d2c3fbb1","container_image":"gcr.io/cf-pks-releng-environments/busybox:latest"},"test2":"true","test3":"true","cluster_name":"aaa"}]

@patrick-stephens
Contributor

Can you raise a new issue just to keep things clear? I'll close the original issue as fixed in 1.8.10.

@nokute78
Collaborator

@wangyuan0916 Please open a new issue for v1.8.10+.

what's the 2nd thread doing?

The 1st thread appends to a buffer internal to the in_emitter plugin.
The 2nd thread appends to another buffer that feeds the filters/outputs.

The point is that the 1st buffer does not have a limit, so a flood of records causes memory to grow without bound.
The 2nd path, on the other hand, goes through flb_input_chunk_append_raw, which has a limit and stops queueing.
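
Putting the two paths together (function names as quoted from #4049 above), the flow is roughly:

    rewrite_tag match ──> in_emitter_add_record ──> msgpack_sbuffer_write
                          (internal staging buffer; unlimited before v1.8.9)

    timer, every 0.5 s ──> cb_queue_chunks ──> flb_input_chunk_append_raw
                           (input chunk consumed by filters/outputs; respects the configured buffer limit)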
