Memory leak in in_emitter mem_buf when appending metric #9189

Closed
drbugfinder-work opened this issue Aug 12, 2024 · 16 comments · Fixed by #9251

Comments

@drbugfinder-work
Contributor

Bug Report

Describe the bug
With the addition of the mem_buf_limit option to the log_to_metrics filter (4178405), we noticed that the mem_buf_limit was reached very often, leading to many "could not append metrics" logs. This happens because the newly introduced buffer limit is 10MB (before that change it was unlimited).
The bug happens especially in high-load situations. When setting the mem_buf_limit to 0 (unlimited), we can see memory usage rise to several gigabytes until the process eventually gets killed. However, this increase in memory is NOT steady but follows a sawtooth-like pattern, with values fluctuating and increasing over time.
We see values in a pattern like:
0b -> 1.2MB -> 0b -> 284KB -> 5MB -> 0b -> 20MB ... -> 2.5GB -> 0b -> 282MB -> 2.9GB

This behavior appears to indicate some kind of memory leak. Unfortunately, we have not yet been able to identify the content of the input chunks. Any hints on how to do this?

Please see the screenshots below:

[screenshots: memory usage graphs showing the sawtooth-like growth]

Your Environment

  • Version used: 3.x - 3.1.4
  • Configuration: Forward or tail input with high volume + log_to_metrics filter
  • Operating System and version: Ubuntu 24.04
@Athishpranav2003

@drbugfinder-work maybe valgrind could find this issue, but surprisingly no one has reported it until now.

@edsiper
Member

edsiper commented Aug 12, 2024

@drbugfinder-work thanks for reporting this. Would you please help us with an easily reproducible test case?

@drbugfinder-work
Contributor Author

@leonardo-albertovich @edsiper

I have now created a reproducer. I've also tested multiple versions, even back to v2.2.3, but the behavior is the same.

You need two Fluent Bit instances, one forwarding to the other (whether via Forward (msgpack) or HTTP (JSON) makes NO difference). The second (receiving) instance then creates the metric.

We are still investigating this, and this is just the current version of our reproducer. We cannot tell for sure (yet) if this happens only when using two instances, but so far we have only seen it happen when using two Fluent Bit instances.

fb-source.conf:

[Service]
    Flush        1
    Daemon       Off
    Log_Level    info
    HTTP_Server     On
    HTTP_Listen     0.0.0.0
    HTTP_PORT       2020
    Storage.path    /flb-persistence/flb-db1
    Storage.sync    full
    Storage.checksum On
    Storage.delete_irrecoverable_chunks On
    Storage.backlog.mem_limit  5M
    Storage.max_chunks_up  128
    Storage.metrics On

[Input]
    name         fluentbit_metrics
    tag          internal_metrics

[INPUT]
    Name cpu
    Tag  my_cpu
    Interval_Sec 0
    # 500 Logs/sec
    interval_nsec 2000000
    # 100 Logs/sec - Stable?
    #interval_nsec 10000000
    # 5000 Logs/sec - Grows very fast
    #interval_nsec 200000

# Just to fill dummy records. Total cardinality is similar to our use case, but of course these random values will grow much faster than in reality - this is for demonstration only
[Filter]
    Name         lua
    Match        *
    Call         random_source
    Code         function random_source(tag, timestamp, record) local random_number=tostring(math.random(1,100)) record["source"]=random_number return 1,timestamp,record end
[Filter]
    Name         lua
    Match        *
    Call         random_service
    Code         function random_service(tag, timestamp, record) local random_number=tostring(math.random(1,50)) record["service"]=random_number return 1,timestamp,record end
[Filter]
    Name         lua
    Match        *
    Call         random_pod
    Code         function random_pod(tag, timestamp, record) local random_number=tostring(math.random(1,1000)) record["exported_pod"]=random_number return 1,timestamp,record end
[Filter]
    Name         lua
    Match        *
    Call         random_namespace
    Code         function random_namespace(tag, timestamp, record) local random_number=tostring(math.random(1,200)) record["exported_namespace"]=random_number return 1,timestamp,record end
[Filter]
    Name         lua
    Match        *
    Call         random_severity
    Code         function random_severity(tag, timestamp, record) local random_number=tostring(math.random(1,4)) record["severity"]=random_number return 1,timestamp,record end

#[Output]
#    Name         stdout
#    Match        my_cpu

# The behavior is the same for forward or http; so we can rule out the issue being caused by forwarding metrics via msgpack

[Output]
    Name         forward
    Match        *
    Time_as_integer true
    Alias        forward-fluentbit

#[Output]
#    Name             http
#    Match            *
#    Host             localhost
#    Port             8888

[Output]
    Name         prometheus_exporter
    Match        *
    Host         0.0.0.0
    Port         2021

fb-sink.conf:

[Service]
    Flush        1
    Daemon       Off
    Log_Level    info
    HTTP_Server     On
    HTTP_Listen     0.0.0.0
    HTTP_PORT       2024
    Storage.path    /flb-persistence/flb-db
    Storage.sync    full
    Storage.checksum On
    Storage.delete_irrecoverable_chunks On
    Storage.backlog.mem_limit  5M
    Storage.max_chunks_up  128
    Storage.metrics On

[Input]
    name         fluentbit_metrics
    tag          internal_metrics

# The behavior is the same for forward or http; so we can rule out the issue being caused by forwarding metrics via msgpack

#[INPUT]
#    name http
#    listen 0.0.0.0
#    port 8888

[Input]
    Name         forward
    listen       0.0.0.0
    Port         24224
    Storage.type  filesystem
    Storage.pause_on_chunks_overlimit On

[Filter]
    Name         log_to_metrics
    Emitter_mem_buf_limit 0
    tag          log_metric_generic
    match        my_cpu
    metric_mode  counter
    metric_name  messages_forwarded_total
    metric_description Counter of all messages in all log streams
    label_field  severity
    label_field  service
    label_field  source
    label_field  exported_namespace
    label_field  exported_pod
    alias        log_metrics_filter_messages_forwarded_total

#[Output]
#    Name         file
#    Match        *
#    Path         /tmp/flb
#    MkDir        On

[Output]
    Name         prometheus_exporter
    Match        *
    Host         0.0.0.0
    Port         2023

You can play around with the numbers. We think there's a tipping point somewhere between ~75-200 logs/second. From there on, it cannot catch up with processing and the mem size rises.

Of course the number of chunks/buckets/... will increase as the cardinality increases. The important question is why the mem size of the emitter grows to hundreds of megabytes while the input mem size is only between 0 and 1MB and the actual amount of logs is also only a couple of hundred kilobytes. There seems to be absolutely no issue when logs are ingested at a lower rate (with the same cardinality but over a longer time period).

@leonardo-albertovich
Collaborator

Hi @drbugfinder-work, I'm having a bit of an issue understanding your idea. However, while looking at it with AddressSanitizer I noticed two obvious memory leaks in in_forward when decoding metrics and traces; fixing them resolved the only leak detected in my test (which is caused by the metrics generated by fluentbit_metrics and not by the log_to_metrics related stuff).

One thing I noticed in my tests was that I had lots of flush tasks when trying to terminate the process, which is odd because prometheus_exporter flushes should be very short-lived (they should encode the metrics, store them in the local context and return). I still haven't gotten to the root of that.

Anyway, I wanted to share that leak stuff in order to clear the picture a bit so we can make some progress together.

@drbugfinder-work
Contributor Author

drbugfinder-work commented Aug 13, 2024

Hi @leonardo-albertovich,

We are also seeing this large number of flush tasks.
When I let the example run with valgrind, I can see about 215 gigabytes (!) processed (allocated over ±20 seconds runtime), which is a lot - and several orders of magnitude higher than the actual logs:
(To be fair, there are 5000 logs/second for ± 20 seconds in the following output, so that's a bit of work to do...)

...
[2024/08/13 13:12:45] [ info] [task]   task_id=278 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=279 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=280 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=281 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=282 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=283 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=284 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=285 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=286 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=287 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=288 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=289 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=290 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=291 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=292 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=293 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=294 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=295 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=296 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=297 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=298 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=299 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=300 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=301 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=302 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=303 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=304 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=305 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=306 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=307 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [task]   task_id=308 still running on route(s): prometheus_exporter/prometheus_exporter.0 
[2024/08/13 13:12:45] [ info] [engine] service has stopped (300 pending tasks)
[2024/08/13 13:12:45] [ info] [input] pausing fluentbit_metrics.0
[2024/08/13 13:12:45] [ info] [input] pausing forward.1
==871358== Thread 2 flb-pipeline:
==871358== Invalid write of size 8
==871358==    at 0xFEF0CF: __mk_list_del (mk_list.h:141)
==871358==    by 0xFEF106: mk_list_del (mk_list.h:147)
==871358==    by 0xFEF569: _mk_event_del (mk_event_epoll.c:180)
==871358==    by 0xFEF93B: _mk_event_channel_destroy (mk_event_epoll.c:400)
==871358==    by 0xFEFE95: mk_event_channel_destroy (mk_event.c:188)
==871358==    by 0x262399: flb_engine_shutdown (flb_engine.c:1139)
==871358==    by 0x261EFE: flb_engine_start (flb_engine.c:1017)
==871358==    by 0x1F7988: flb_lib_worker (flb_lib.c:674)
==871358==    by 0x4F07AC2: start_thread (pthread_create.c:442)
==871358==    by 0x4F98A03: clone (clone.S:100)
==871358==  Address 0x50f10d8 is 56 bytes inside a block of size 752 free'd
==871358==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==871358==    by 0x22CD22: flb_free (flb_mem.h:127)
==871358==    by 0x23349D: flb_output_instance_destroy (flb_output.c:449)
==871358==    by 0x233554: flb_output_exit (flb_output.c:476)
==871358==    by 0x2622EE: flb_engine_shutdown (flb_engine.c:1119)
==871358==    by 0x261EFE: flb_engine_start (flb_engine.c:1017)
==871358==    by 0x1F7988: flb_lib_worker (flb_lib.c:674)
==871358==    by 0x4F07AC2: start_thread (pthread_create.c:442)
==871358==    by 0x4F98A03: clone (clone.S:100)
==871358==  Block was alloc'd at
==871358==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==871358==    by 0x22CCE3: flb_calloc (flb_mem.h:95)
==871358==    by 0x233980: flb_output_new (flb_output.c:591)
==871358==    by 0x246555: configure_plugins_type (flb_config.c:773)
==871358==    by 0x246E0F: flb_config_load_config_format (flb_config.c:941)
==871358==    by 0x1E0000: service_configure (fluent-bit.c:765)
==871358==    by 0x1E1163: flb_main (fluent-bit.c:1298)
==871358==    by 0x1E1505: main (fluent-bit.c:1457)
==871358== 
[2024/08/13 13:12:45] [ info] [input] pausing storage_backlog.2
[2024/08/13 13:12:45] [ info] [input] pausing emitter_for_log_metrics_filter_messages_forwarded_total
==871358== 
==871358== HEAP SUMMARY:
==871358==     in use at exit: 8,110,242 bytes in 11,313 blocks
==871358==   total heap usage: 13,891,583 allocs, 13,880,270 frees, 232,866,534,510 bytes allocated
==871358== 
==871358== LEAK SUMMARY:
==871358==    definitely lost: 26,056 bytes in 13 blocks
==871358==    indirectly lost: 602,101 bytes in 10,408 blocks
==871358==      possibly lost: 7,482,024 bytes in 891 blocks
==871358==    still reachable: 61 bytes in 1 blocks
==871358==         suppressed: 0 bytes in 0 blocks
==871358== Rerun with --leak-check=full to see details of leaked memory
==871358== 
==871358== For lists of detected and suppressed errors, rerun with: -s
==871358== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
root@lv8041:~/test# valgrind --leak-check=full ./fluent-bit-3-1-5 -c fb-sink.conf 
==871414== Memcheck, a memory error detector
==871414== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==871414== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==871414== Command: ./fluent-bit-3-1-5 -c fb-sink.conf
==871414== 

As I wrote in my example config, the issue also happens when we use http instead of forward, so in_forward should not be the culprit here.
Can we somehow trace what's inside the in_emitter chunks and whether the chunks are correctly recycled/freed?

@leonardo-albertovich
Collaborator

What I was saying is that there seem to be two independent issues, and so far I've found that the prometheus_exporter flush callback is taking way too long, which means that with a sustained high-volume input it's not able to catch up.

I'm still manually tracing to find the bottleneck but you just caught me in the middle of the test. I'll report back in a few minutes.

The one thing I wouldn't fixate on is in_emitter, because unless I'm misinterpreting what I'm seeing, it's not the source of the issue.

@leonardo-albertovich
Collaborator

I thought there could be a bottleneck in the code, but in the end the only noticeable delay I found was in the metrics context encoding function, and it was not significant by itself.

The problem I see with this test is that each chunk contains about 6500 to 7900 individual metric contexts, all of which have to be decoded, processed (i.e. appending labels) and encoded individually, which seems to be way too much.

I think at this point I'd need another opinion because I'm not sure of what to think of it.

@leonardo-albertovich
Collaborator

Just opened PR #9199 to address the leak in in_forward I mentioned earlier.

@drbugfinder-work
Contributor Author

drbugfinder-work commented Aug 15, 2024

Hi @leonardo-albertovich !

Just opened PR #9199 to address the leak in in_forward I mentioned earlier.

Thanks for #9199, which is already in v3.1.6 now. I think this is the reason why the memory rise was even faster when we used forward compared to http. We have not tested with your fix yet.

I thought there could be a bottleneck in the code, but in the end the only noticeable delay I found was in the metrics context encoding function, and it was not significant by itself.

The problem I see with this test is that each chunk contains about 6500 to 7900 individual metric contexts, all of which have to be decoded, processed (i.e. appending labels) and encoded individually, which seems to be way too much.

I think at this point I'd need another opinion because I'm not sure of what to think of it.

You're right. This is an artificial test with a very high load (multiple times higher than in our real environments); however, we constantly run into the same issue in real scenarios (within minutes/hours from start, with a flatter curve).
I think the processing (decoding, processing, encoding, ...) of the metric contexts is one part of the issue. It would explain the tipping point where there's more input than it can process and the buffer grows. Optimizing this would of course raise the tipping point.

However, I'm really interested in why so much memory is being allocated - many times more than the actual logs. Is it because of the metadata, metric contexts, ...? Or is there a wrong calculation during realloc or chunk selection? I could not find anything yet.

If the actual logs are only a few MB in size and the emitter buffer has already reached several gigabytes, something is wrong.

@drbugfinder-work
Contributor Author

@leonardo-albertovich
I now tried to skip the whole processing in cb_prom_flush by adding FLB_OUTPUT_RETURN(FLB_OK); as the first line, so that it returns immediately.
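
For reference, a minimal sketch of that diagnostic change (not a fix), assuming the generic Fluent Bit output flush callback prototype; the real function lives in the prometheus_exporter output plugin:

#include <fluent-bit/flb_output_plugin.h>

/* diagnostic only: acknowledge every chunk immediately,
 * skipping all decoding/encoding work of the original callback body */
static void cb_prom_flush(struct flb_event_chunk *event_chunk,
                          struct flb_output_flush *out_flush,
                          struct flb_input_instance *ins,
                          void *out_context,
                          struct flb_config *config)
{
    FLB_OUTPUT_RETURN(FLB_OK);

    /* ... the original body never runs in this experiment ... */
}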

I cannot see any difference; the memory consumption of the in_emitter buffer is still several gigabytes (it grows beyond 4GB within about 20 seconds of runtime at 5000 logs/s = ±100,000 dummy logs):

===== Input =====
fluentbit_metrics.0 (fluentbit_metrics)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 52.3K (53600 bytes)
│     └─ mem limit  : 0b (0 bytes)
│
├─ tasks
│  ├─ total tasks   : 4
│  ├─ new           : 0
│  ├─ running       : 4
│  └─ size          : 52.3K (53600 bytes)
│
└─ chunks
   └─ total chunks  : 4
      ├─ up chunks  : 4
      ├─ down chunks: 0
      └─ busy chunks: 4
         ├─ size    : 52.3K (53600 bytes)
         └─ size err: 0

forward.1 (forward)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 0b (0 bytes)
│     └─ mem limit  : 0b (0 bytes)
│
├─ tasks
│  ├─ total tasks   : 0
│  ├─ new           : 0
│  ├─ running       : 0
│  └─ size          : 0b (0 bytes)
│
└─ chunks
   └─ total chunks  : 1
      ├─ up chunks  : 0
      ├─ down chunks: 1
      └─ busy chunks: 0
         ├─ size    : 0b (0 bytes)
         └─ size err: 0

storage_backlog.2 (storage_backlog)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 0b (0 bytes)
│     └─ mem limit  : 0b (0 bytes)
│
├─ tasks
│  ├─ total tasks   : 0
│  ├─ new           : 0
│  ├─ running       : 0
│  └─ size          : 0b (0 bytes)
│
└─ chunks
   └─ total chunks  : 0
      ├─ up chunks  : 0
      ├─ down chunks: 0
      └─ busy chunks: 0
         ├─ size    : 0b (0 bytes)
         └─ size err: 0

emitter_for_log_metrics_filter_messages_forwarded_total (emitter)
│
├─ status
│  └─ overlimit     : no
│     ├─ mem size   : 3.3G (3546388385 bytes)
│     └─ mem limit  : 0b (0 bytes)
│
├─ tasks
│  ├─ total tasks   : 1548
│  ├─ new           : 0
│  ├─ running       : 1548
│  └─ size          : 3.0G (3222384929 bytes)
│
└─ chunks
   └─ total chunks  : 1703
      ├─ up chunks  : 1703
      ├─ down chunks: 0
      └─ busy chunks: 1548
         ├─ size    : 3.0G (3222384929 bytes)
         └─ size err: 0

@1999ankitgoyal

1999ankitgoyal commented Aug 18, 2024

I am also facing this issue.

Details:
The logs sent to Fluent Bit amount to approx. 2MB per second, with a cardinality of approximately 13000.
Earlier, when I was using Fluent Bit version 3.0.4, the memory consumption of the Fluent Bit pod in K8s would reach 13GB and it would eventually crash due to an Out of Memory (OOM) issue.

Now that I have moved to Fluent Bit version 3.1.4, the highest memory consumption I have seen is 373MB, so the pod doesn't get killed due to an OOM error, but this comes with the following drawbacks:

  1. The input plugin used is TCP, with the source sending logs one by one. In version 3.1.4 the input plugin is not accepting many logs and is thus throttling; the source is lagging by hours in sending the actual logs.

  2. I am frequently seeing this error in the logs: [error] [filter:log_to_metrics:log_to_metrics.1] could not append metrics

Could anyone please help me with which version/configuration modifications I should use so that Fluent Bit doesn't consume more than 6GB of memory, the append error goes away, and ingestion of the incoming logs happens smoothly?

Version used is 3.1.4; demo configuration:

[SERVICE]
    Flush               1
    Daemon              off
    Log_Level           error
    Mem_Buf_Limit       100MB

[INPUT]
    Name                tcp
    Listen              0.0.0.0
    Port                5170
    Format              json
    Buffer_Size         128000
    Chunk_Size          64000
    Tag                 hello.logs

[FILTER]
    Name                lua
    Match               hello.logs
    script              hello_extract.lua
    call                extract

[FILTER]
    Name                Log_to_metrics
    Match               hello.logs
    Tag                 hello.metrics
    Metric_name         hello_responsecode
    Metric_mode         counter
    Add_label           responseCode $responseCode
    Label_field         location
    Label_field         env

[FILTER]
    Name                Log_to_metrics
    Match               hello.logs
    Tag                    hello.metrics
    Metric_mode         counter
    Metric_name         hello_oper_count
    Add_label           api_Name $api_Id
    Add_label           responseCode $responseCode
    Label_field         location
    Label_field         env


[FILTER]
    Name                Log_to_metrics
    Match               hello.logs
    Tag                 hello.metrics
    Metric_mode         histogram
    Metric_name         hello_r_size 
    Bucket              2000
    Bucket              20000
    Bucket              50000
    Bucket              100000
    Bucket              1000000
    Value_field         $responseSize
    Add_label           api_Name $api_Id
    Label_field         location
    Label_field         env

[OUTPUT]
    Name                prometheus_exporter
    Match               hello.metrics
    Port                9170

@drbugfinder-work
Contributor Author

Hi @1999ankitgoyal,

Thank you for providing a second sample of this issue.

The reduction in memory consumption from version 3.0.4 to 3.1.4 was due to PR #8659, where I introduced the option for a configurable memory buffer limit and set the default to 10MB. Before that change, the memory buffer was unlimited, which could result in the consumption of many gigabytes of memory. At the time, I introduced the limit to align with other plugins, not realizing that this change would expose an existing memory issue in the pipeline.

However, this limitation of the memory buffer leads to the error log you're seeing.
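
As a stop-gap only (example values, not a fix: raising or disabling the limit just trades the error for memory growth until the underlying issue is resolved), the limit from #8659 can be adjusted per filter, e.g. for your first filter:

[FILTER]
    Name                  log_to_metrics
    Match                 hello.logs
    Tag                   hello.metrics
    Metric_name           hello_responsecode
    Metric_mode           counter
    Add_label             responseCode $responseCode
    Label_field           location
    Label_field           env
    # default is 10M; 0 means unlimited
    Emitter_mem_buf_limit 200M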

@leonardo-albertovich @edsiper I strongly suspect that any other metric-generating process using the flb_input_metrics_append function to insert metrics into the pipeline could potentially encounter the same issue, especially with similar volumes (e.g., >100~500 updates/second). For example, see

result = flb_input_metrics_append(ctx->ins, tag, tag_len, context);

When I set the limit to 0 (unlimited), I also observed multiple instances where, after the buffer had grown to a certain size (several gigabytes), no logs were forwarded through the pipeline at all.

I am still trying to understand the content of the chunks to determine what is using this large amount of memory.

@drbugfinder-work
Contributor Author

drbugfinder-work commented Aug 19, 2024

Hi @edsiper @leonardo-albertovich
Update:
I have delved deeper into the issue and believe I now understand the problem more clearly. I used the flb_pack_print method to inspect the content of a chunk and discovered that with each new metric update, a copy of the cmetrics context seems to be made. In this process, one value (from a tuple of metric labels) is updated (if the bucket already exists) or appended to the value list for this metric. Consequently, for every update, the entire list of existing label tuples is copied, and only one value is updated.

Instead of modifying a single cmetrics context with each update, a duplicate context is created and grows over time as the cardinality increases. This results in a sequence of temporary, ever-larger cmetrics contexts, which is significantly larger than the individual updates themselves.

This also explains why the cardinality is relevant at the input buffer stage, as I initially thought the cmetrics context with high cardinality existed only at the output stage.

The proposed solution would be to avoid copying the entire cmetrics context. Instead, we should keep a list of updates and merge them later.

I am still investigating whether this behavior is intentional or if it is a bug. I have not fully reviewed the code yet.
Can you say something about this behavior?

For reference, please see the example of two records in an input buffer chunk here: https://gist.github.com/drbugfinder-work/131c11e67a12753e45bccdce478bc145#file-gistfile1-txt

And here is the diff of these two entries, where one value has changed but the entire list of label tuples remains the same and is a copy of the previous one: [screenshot: diff of the two records]
In this case the cardinality is 1000 and the two records both contain all 1000 label tuples and only one of them gets updated.
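
A rough back-of-envelope (the per-tuple size is purely an assumed, illustrative number) shows why this adds up:

    assumed size per encoded label tuple : ~100 bytes
    tuples per context snapshot          : 1000          -> ~100 KB per snapshot
    snapshots emitted                    : 1 per update  -> at 5000 updates/s, ~500 MB/s of emitter data
    underlying log records               : only a few hundred bytes each

So the emitter buffer grows with cardinality × update rate, not with the size of the logs themselves.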

@drbugfinder-work
Contributor Author

drbugfinder-work commented Aug 19, 2024

@edsiper @leonardo-albertovich
Solution idea: Would it be possible to modify the flb_input_metrics_append method to check if the in_emitter buffer has reached its limit and, if so, drop the buffer and append the cmetrics context in one go afterwards? As I understand it, the last copy of the cmetrics context should contain all the accumulated changes. So if the limit is reached, the buffer could be dropped and replaced by a single insert, which might be better than running out of memory or virtually stopping the pipeline. I'm not sure if this approach could pose issues for other metric plugins in the pipeline.

Additionally, multi-threading could potentially be a concern.

@leonardo-albertovich
Collaborator

If I understood your previous message correctly, what you found is what I pointed out before (thousands of individual metrics contexts being produced, which in turn caused the prometheus_exporter plugin to fall behind while processing them).

I have a feeling that making those types of changes in flb_input_metrics_append would have unexpected side effects.

Unless there's something I'm missing, the issue should be addressed in log_to_metrics, or maybe even in a processor that acts as a buffer: it would keep a cmetrics context, apply the changes produced by log_to_metrics as they arrive, and periodically produce a snapshot that goes through the pipeline.
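
A rough sketch of what I mean (all names here - metrics_accumulator, account_record, flush_snapshot - are hypothetical, and the cmetrics/Fluent Bit calls are only sketched against the APIs as used elsewhere in this thread):

#include <time.h>
#include <cmetrics/cmetrics.h>
#include <cmetrics/cmt_counter.h>
#include <fluent-bit/flb_input.h>

/* hypothetical state owned by the filter/processor */
struct metrics_accumulator {
    struct cmt *snapshot;               /* single accumulated cmetrics context */
    struct cmt_counter *counter;        /* e.g. messages_forwarded_total */
    struct flb_input_instance *emitter; /* the in_emitter instance */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* per matching log record: only update the in-memory context,
 * do not push anything into the pipeline yet */
static int account_record(struct metrics_accumulator *acc,
                          int label_count, char **label_values)
{
    return cmt_counter_inc(acc->counter, now_ns(), label_count, label_values);
}

/* timer callback (e.g. once per flush interval): one snapshot per interval
 * instead of one full context copy per record */
static int flush_snapshot(struct metrics_accumulator *acc,
                          const char *tag, size_t tag_len)
{
    return flb_input_metrics_append(acc->emitter, tag, tag_len, acc->snapshot);
}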

IDK, I could be wrong, I haven't looked at this since last week and maybe I'm just completely missing the mark.

@drbugfinder-work
Contributor Author

Hi @leonardo-albertovich,

You're right - I initially thought you were referring to the cmetrics contexts within the Prometheus output plugin. I now understand your point better.

I’ll focus on your suggested approach and explore a solution inside the logs_to_metrics plugin context to periodically inject the metrics into the pipeline.

Thanks for the insight!
