out_opentelemetry: make log records batch size to configurable #6559

Closed
wants to merge 2 commits into from

Conversation

Syn3rman
Contributor

  • Fix build warnings in the out_opentelemetry plugin
  • Make the batch size for log records configurable
  • Reduce the default batch size from 1000 to 64, since the larger batch takes up a lot of memory

Fixes #6512 & #6457


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
[INPUT]
	Name dummy
	Tag dummy.log
	Rate 1
	Dummy {"key": ["val1", 1, 2.3, [4, 4.6, 2], {"mim": "mimvalue"}], "key2": "val2"}

[OUTPUT]
	name stdout
	match *

[OUTPUT]
	name opentelemetry
	match *
	host 0.0.0.0
	port 3434
	logs_uri /v1/logs
	metrics_uri /v1/metrics
	traces_uri /v1/traces
	batch_size 10
  • Debug log output from testing the change
root@171c1e35560a:/fluent-bit/build# ./bin/fluent-bit -c ../dev-files/confs/out_otel_logs.conf -v
Fluent Bit v2.0.7
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/12/15 07:55:26] [ info] Configuration:
[2022/12/15 07:55:26] [ info]  flush time     | 1.000000 seconds
[2022/12/15 07:55:26] [ info]  grace          | 5 seconds
[2022/12/15 07:55:26] [ info]  daemon         | 0
[2022/12/15 07:55:26] [ info] ___________
[2022/12/15 07:55:26] [ info]  inputs:
[2022/12/15 07:55:26] [ info]      dummy
[2022/12/15 07:55:26] [ info] ___________
[2022/12/15 07:55:26] [ info]  filters:
[2022/12/15 07:55:26] [ info] ___________
[2022/12/15 07:55:26] [ info]  outputs:
[2022/12/15 07:55:26] [ info]      stdout.0
[2022/12/15 07:55:26] [ info]      opentelemetry.1
[2022/12/15 07:55:26] [ info] ___________
[2022/12/15 07:55:26] [ info]  collectors:
[2022/12/15 07:55:26] [ info] [fluent bit] version=2.0.7, commit=91ebce5540, pid=13501
[2022/12/15 07:55:26] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2022/12/15 07:55:26] [ info] [storage] ver=1.3.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2022/12/15 07:55:26] [ info] [cmetrics] version=0.5.7
[2022/12/15 07:55:26] [ info] [ctraces ] version=0.2.5
[2022/12/15 07:55:26] [ info] [input:dummy:dummy.0] initializing
[2022/12/15 07:55:26] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2022/12/15 07:55:26] [debug] [dummy:dummy.0] created event channels: read=21 write=22
[2022/12/15 07:55:26] [debug] [stdout:stdout.0] created event channels: read=23 write=24
[2022/12/15 07:55:26] [debug] [opentelemetry:opentelemetry.1] created event channels: read=30 write=31
[2022/12/15 07:55:26] [debug] [router] match rule dummy.0:stdout.0
[2022/12/15 07:55:26] [debug] [router] match rule dummy.0:opentelemetry.1
[2022/12/15 07:55:26] [ info] [output:stdout:stdout.0] worker #0 started
[2022/12/15 07:55:26] [ info] [sp] stream processor started
[2022/12/15 07:55:26] [debug] [input chunk] update output instances with new chunk size diff=68
[2022/12/15 07:55:27] [debug] [task] created task=0xffff94014e60 id=0 OK
[2022/12/15 07:55:27] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.log: [1671090926.974088388, {"key"=>["val1", 1, 2.300000, [4, 4.600000, 2], {"mim"=>"mimvalue"}], "key2"=>"val2"}]
[2022/12/15 07:55:27] [debug] [out flush] cb_destroy coro_id=0
[2022/12/15 07:55:27] [debug] [input chunk] update output instances with new chunk size diff=68
[2022/12/15 07:55:27] [debug] [http_client] not using http_proxy for header
[2022/12/15 07:55:27] [ info] [output:opentelemetry:opentelemetry.1] 0.0.0.0:3434, HTTP status=200


[2022/12/15 07:55:27] [debug] [out flush] cb_destroy coro_id=0
[2022/12/15 07:55:27] [debug] [task] destroy task=0xffff94014e60 (task_id=0)
^C[2022/12/15 07:55:28] [engine] caught signal (SIGINT)
[2022/12/15 07:55:28] [debug] [task] created task=0xffff94016390 id=0 OK
[2022/12/15 07:55:28] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2022/12/15 07:55:28] [ warn] [engine] service will shutdown in max 5 seconds
[2022/12/15 07:55:28] [ info] [input] pausing dummy.0
[0] dummy.log: [1671090927.968851930, {"key"=>["val1", 1, 2.300000, [4, 4.600000, 2], {"mim"=>"mimvalue"}], "key2"=>"val2"}]
[2022/12/15 07:55:28] [debug] [out flush] cb_destroy coro_id=1
[2022/12/15 07:55:28] [debug] [http_client] not using http_proxy for header
[2022/12/15 07:55:28] [ info] [output:opentelemetry:opentelemetry.1] 0.0.0.0:3434, HTTP status=200


[2022/12/15 07:55:28] [debug] [out flush] cb_destroy coro_id=1
[2022/12/15 07:55:28] [debug] [task] destroy task=0xffff94016390 (task_id=0)
[2022/12/15 07:55:28] [ info] [engine] service has stopped (0 pending tasks)
[2022/12/15 07:55:28] [ info] [input] pausing dummy.0
[2022/12/15 07:55:28] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/12/15 07:55:28] [ info] [output:stdout:stdout.0] thread worker #0 stopped
  • Attached Valgrind output that shows no leaks or memory corruption was found
root@171c1e35560a:/fluent-bit/build# valgrind --leak-check=full ./bin/fluent-bit -c ../dev-files/confs/out_otel_logs.conf
==13495== Memcheck, a memory error detector
==13495== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13495== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==13495== Command: ./bin/fluent-bit -c ../dev-files/confs/out_otel_logs.conf
==13495==
Fluent Bit v2.0.7
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/12/15 07:47:47] [ info] [fluent bit] version=2.0.7, commit=91ebce5540, pid=13495
[2022/12/15 07:47:47] [ info] [storage] ver=1.3.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2022/12/15 07:47:47] [ info] [cmetrics] version=0.5.7
[2022/12/15 07:47:47] [ info] [output:stdout:stdout.0] worker #0 started
[2022/12/15 07:47:47] [ info] [ctraces ] version=0.2.5
[2022/12/15 07:47:47] [ info] [input:dummy:dummy.0] initializing
[2022/12/15 07:47:47] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2022/12/15 07:47:47] [ info] [sp] stream processor started
^C[2022/12/15 07:47:48] [engine] caught signal (SIGINT)
[0] dummy.log: [1671090468.010704176, {"key"=>["val1", 1, 2.300000, [4, 4.600000, 2], {"mim"=>"mimvalue"}], "key2"=>"val2"}]
[2022/12/15 07:47:48] [ warn] [engine] service will shutdown in max 5 seconds
[2022/12/15 07:47:48] [ info] [input] pausing dummy.0
[2022/12/15 07:47:48] [ info] [output:opentelemetry:opentelemetry.1] 0.0.0.0:3434, HTTP status=200


[2022/12/15 07:47:48] [ info] [engine] service has stopped (0 pending tasks)
[2022/12/15 07:47:48] [ info] [input] pausing dummy.0
[2022/12/15 07:47:48] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/12/15 07:47:49] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==13495==
==13495== HEAP SUMMARY:
==13495==     in use at exit: 0 bytes in 0 blocks
==13495==   total heap usage: 1,828 allocs, 1,828 frees, 1,090,535 bytes allocated
==13495==
==13495== All heap blocks were freed -- no leaks are possible
==13495==
==13495== For lists of detected and suppressed errors, rerun with: -s
==13495== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Signed-off-by: Aditya Prajapati [email protected]

@edsiper
Member

edsiper commented Dec 15, 2022

@Syn3rman Windows CI is failing; to review the PR we need at least CI passing.

@leonardo-albertovich would you please review the PR once CI is ready?

@edsiper
Member

edsiper commented Dec 16, 2022

@Syn3rman

D:\a\fluent-bit\fluent-bit\plugins\out_opentelemetry\opentelemetry.c(697): error C2133: 'log_records': unknown size
D:\a\fluent-bit\fluent-bit\plugins\out_opentelemetry\opentelemetry.c(698): error C2057: expected constant expression
D:\a\fluent-bit\fluent-bit\plugins\out_opentelemetry\opentelemetry.c(698): error C2466: cannot allocate an array of constant size 0
D:\a\fluent-bit\fluent-bit\plugins\out_opentelemetry\opentelemetry.c(698): error C2133: 'log_bodies': unknown size
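
These errors are what MSVC reports for a C99 variable-length array: its C compiler expects a constant expression as the array size. A minimal sketch of the usual workaround, sizing the array at run time on the heap, is shown below. The `struct log_record` type and the `alloc_records` helper are hypothetical stand-ins, not the plugin's actual types or code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the plugin's protobuf log record type. */
struct log_record {
    int placeholder;
};

/*
 * Allocates a zeroed array of `batch_size` records on the heap instead of
 * declaring `struct log_record records[batch_size];`, which MSVC rejects.
 * Returns NULL if batch_size is zero or the allocation fails; the caller
 * releases the array with free().
 */
struct log_record *alloc_records(size_t batch_size)
{
    if (batch_size == 0) {
        return NULL;
    }
    return calloc(batch_size, sizeof(struct log_record));
}
```

The trade-off is that every exit path of the calling function now has to free the array, which a stack array did not require.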

@Syn3rman
Contributor Author

@edsiper fixed, the failing tests are unrelated and should be addressed by #6569

@leonardo-albertovich this is now ready for review. I've moved the variables to the heap and added a check on batch_size (valgrind is happy too)
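
The comment above mentions a check on batch_size without showing it. As an illustration only, one way to validate such a value when it arrives as text (the default in this PR is the string "64") could look like the sketch below. `parse_batch_size` is a hypothetical helper; the plugin's real check is not shown in this conversation and may rely on Fluent Bit's own config handling instead.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parses a batch size given as a decimal string.
 * Returns 0 and stores the value in *out on success, or -1 if the text is
 * not a positive integer that fits in an int.
 */
int parse_batch_size(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || out == NULL || *text == '\0') {
        return -1;
    }

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        /* Overflow, or trailing characters that are not digits. */
        return -1;
    }
    if (value <= 0 || value > INT_MAX) {
        /* A zero or negative batch size would make the batch arrays unusable. */
        return -1;
    }

    *out = (int) value;
    return 0;
}
```

Rejecting zero and negative values before any allocation matters here because the batch size is later used as an element count.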

@Syn3rman Syn3rman force-pushed the logs-batch-size-config branch from 445dce0 to 46bb784 Compare December 19, 2022 03:59
struct opentelemetry_context *ctx;
ctx = out_context;

// These were initially variable length arrays.
Member

use /* for comments */. What's the issue number you are referring to here?

Contributor Author

#6512
When it was a variable-length array stored on the stack, the event chunk was being corrupted somewhere between 720 & 730 initialisation.


-for(index = 0 ; index < FLB_LOG_RECORD_BATCH_SIZE ; index++) {
+log_record_list = (Opentelemetry__Proto__Logs__V1__LogRecord *) flb_calloc(ctx->batch_size, sizeof(Opentelemetry__Proto__Logs__V1__LogRecord *));
Member

validate all memory allocations
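
As a rough sketch of what this request amounts to when two per-batch arrays are allocated together: check each result, and release whatever was already allocated if a later allocation fails. The struct and function names below are hypothetical; `log_records` and `log_bodies` are borrowed from the MSVC error output earlier in this thread, and plain `calloc`/`free` stand in for Fluent Bit's wrappers.

```c
#include <stdlib.h>

/* Hypothetical container for the two per-batch pointer arrays. */
struct batch_buffers {
    void **log_records;
    void **log_bodies;
};

/* Returns 0 on success; on failure returns -1 and leaves both pointers NULL. */
int batch_buffers_init(struct batch_buffers *buf, size_t batch_size)
{
    buf->log_records = NULL;
    buf->log_bodies = NULL;

    if (batch_size == 0) {
        return -1;
    }

    buf->log_records = calloc(batch_size, sizeof(void *));
    if (buf->log_records == NULL) {
        return -1;
    }

    buf->log_bodies = calloc(batch_size, sizeof(void *));
    if (buf->log_bodies == NULL) {
        /* Undo the first allocation so nothing leaks on partial failure. */
        free(buf->log_records);
        buf->log_records = NULL;
        return -1;
    }

    return 0;
}

/* Releases both arrays; safe to call after a failed init. */
void batch_buffers_destroy(struct batch_buffers *buf)
{
    free(buf->log_records);
    free(buf->log_bodies);
    buf->log_records = NULL;
    buf->log_bodies = NULL;
}
```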

@@ -31,7 +31,7 @@
  * log records, and a later batch fails, Fluent Bit will retry ALL the batches,
  * including the ones that succeeded. This is not ideal.
  */
-#define FLB_LOG_RECORD_BATCH_SIZE 1000
+#define DEFAULT_LOG_RECORD_BATCH_SIZE "64"
Member

isn't "64" too low?

Contributor Author

Oh yeah should be fine since it is on the heap now
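
To put the "too low" question in numbers: the batch size determines how many batches a chunk is split into, and the code comment quoted above notes that a failure in a later batch makes Fluent Bit retry all of them. The sketch below only does the arithmetic, with hypothetical helper names; it is not taken from the plugin.

```c
#include <stddef.h>

/* Number of batches needed to send total_records (ceiling division). */
size_t batch_count(size_t total_records, size_t batch_size)
{
    if (batch_size == 0) {
        return 0;
    }
    return total_records / batch_size + (total_records % batch_size != 0);
}

/* Number of records in the batch at the given zero-based index. */
size_t batch_length(size_t total_records, size_t batch_size, size_t index)
{
    size_t start = index * batch_size;
    size_t remaining;

    if (batch_size == 0 || start >= total_records) {
        return 0;
    }
    remaining = total_records - start;
    return remaining < batch_size ? remaining : batch_size;
}
```

Taking 1000 records as an arbitrary total, a batch size of 64 gives 16 batches, the last one holding 40 records. A smaller batch size therefore means more batches per chunk, and more batches that have to be resent if a later one fails.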

@Syn3rman Syn3rman force-pushed the logs-batch-size-config branch from 46bb784 to 0158080 Compare December 19, 2022 15:02
@edsiper
Copy link
Member

edsiper commented Dec 20, 2022

merged as #6583 (this CI is stuck)

Development

Successfully merging this pull request may close these issues.

Fluent-Bit Crashes when using opentelemetry logs output (version 2.0.6)
2 participants