I found (again) a weird issue: boolean types in config maps do not seem to set their defaults properly. As an example, run the runtime test bin/flb-rt-filter_lua hello_world under Valgrind:
valgrind bin/flb-rt-filter_lua hello_world
==314183== Memcheck, a memory error detector
==314183== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==314183== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==314183== Command: bin/flb-rt-filter_lua hello_world
==314183==
Test hello_world... [2024/08/18 20:16:44] [ info] [fluent bit] version=3.1.7, commit=d3639b0f40, pid=314183
[2024/08/18 20:16:44] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/08/18 20:16:44] [ info] [cmetrics] version=0.9.4
==314183== Thread 2 flb-pipeline:
==314183== Conditional jump or move depends on uninitialised value(s)
==314183==    at 0x2344AA: in_dummy_init (plugins/in_dummy/in_dummy.c:354)
==314183==    by 0x16AE72: flb_input_instance_init (src/flb_input.c:1219)
==314183==    by 0x16B37F: flb_input_init_all (src/flb_input.c:1278)
==314183==    by 0x190659: flb_engine_start (src/flb_engine.c:805)
==314183==    by 0x164A9D: flb_lib_worker (src/flb_lib.c:674)
==314183==    by 0x50A6A93: start_thread (pthread_create.c:447)
==314183==    by 0x5133A33: clone (clone.S:100)
==314183==
Valgrind complains that the variable ctx->flush_on_startup has not been initialized, however the config map contains a proper default value:
{
FLB_CONFIG_MAP_BOOL, "flush_on_startup", "false",
0, FLB_TRUE, offsetof(struct flb_dummy, flush_on_startup),
"generate the first event on startup"
},
Yes, of course the report goes away if we allocate the plugin context with calloc(3); however, the goal of config maps is to initialize variables with their default values.
Why is the boolean not initializing the variable? Are we writing to a different memory address by mistake?
Using the compiler's AddressSanitizer instead of Valgrind (as found by CI), the problem moves to a different location. CMake configuration line:
Running the unit test, we see a failure in filter_lua (not sure if related):
bin/flb-rt-filter_lua hello_world
Test hello_world... [2024/08/18 20:29:52] [ info] [fluent bit] version=3.1.7, commit=d3639b0f40, pid=340223
[2024/08/18 20:29:52] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/08/18 20:29:52] [ info] [cmetrics] version=0.9.4
[2024/08/18 20:29:52] [ info] [ctraces ] version=0.5.5
[2024/08/18 20:29:52] [ info] [input:dummy:dummy.0] initializing
[2024/08/18 20:29:52] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==340223==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000048 (pc 0x603c87931604 bp 0x7e077d3faab0 sp 0x7e077d3fa480 T1)
==340223==The signal is caused by a READ memory access.
==340223==Hint: address points to the zero page.
    #0 0x603c87931604 in cb_lua_filter /home/edsiper/c/fluent-bit/plugins/filter_lua/lua.c:539:9
    #1 0x603c87317fb3 in flb_filter_do /home/edsiper/c/fluent-bit/src/flb_filter.c:158:19
    #2 0x603c8730bb3f in input_chunk_append_raw /home/edsiper/c/fluent-bit/src/flb_input_chunk.c:1588:9
    #3 0x603c8730c9aa in flb_input_chunk_append_raw /home/edsiper/c/fluent-bit/src/flb_input_chunk.c:1929:15
    #4 0x603c8740033f in input_log_append /home/edsiper/c/fluent-bit/src/flb_input_log.c:71:11
    #5 0x603c873fff31 in flb_input_log_append /home/edsiper/c/fluent-bit/src/flb_input_log.c:90:11
    #6 0x603c87516268 in in_dummy_collect /home/edsiper/c/fluent-bit/plugins/in_dummy/in_dummy.c:160:13
    #7 0x603c87515af5 in in_dummy_init /home/edsiper/c/fluent-bit/plugins/in_dummy/in_dummy.c:355:9
    #8 0x603c872fd703 in flb_input_instance_init /home/edsiper/c/fluent-bit/src/flb_input.c:1219:19
    #9 0x603c872fe1e0 in flb_input_init_all /home/edsiper/c/fluent-bit/src/flb_input.c:1278:15
    #10 0x603c8735d648 in flb_engine_start /home/edsiper/c/fluent-bit/src/flb_engine.c:805:11
    #11 0x603c872eb542 in flb_lib_worker /home/edsiper/c/fluent-bit/src/flb_lib.c:674:11
    #12 0x603c8729a3dc in asan_thread_start(void*) asan_interceptors.cpp.o
    #13 0x7e078049ca93 in start_thread nptl/pthread_create.c:447:8
    #14 0x7e0780529c3b in clone3 misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/edsiper/c/fluent-bit/plugins/filter_lua/lua.c:539:9 in cb_lua_filter
Thread T1 (flb-pipeline) created by T0 here:
    #0 0x603c87282265 in pthread_create (/home/edsiper/c/fluent-bit/build/bin/flb-rt-filter_lua+0x1f9265) (BuildId: 3503a020cf50f68410a1051dda465754fc6db988)
    #1 0x603c87f5aa7c in mk_utils_worker_spawn /home/edsiper/c/fluent-bit/lib/monkey/mk_core/mk_utils.c:284:9
    #2 0x603c872ea933 in do_start /home/edsiper/c/fluent-bit/src/flb_lib.c:710:11
    #3 0x603c872ea764 in flb_start /home/edsiper/c/fluent-bit/src/flb_lib.c:757:11
    #4 0x603c872e0e2d in flb_test_helloworld /home/edsiper/c/fluent-bit/tests/runtime/filter_lua.c:449:11
    #5 0x603c872e4586 in acutest_do_run_ /home/edsiper/c/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1034:9
    #6 0x603c872df43b in acutest_run_ /home/edsiper/c/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1205:19
    #7 0x603c872dc876 in main /home/edsiper/c/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1769:13
    #8 0x7e078042a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #9 0x7e078042a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #10 0x603c87201aa4 in _start (/home/edsiper/c/fluent-bit/build/bin/flb-rt-filter_lua+0x178aa4) (BuildId: 3503a020cf50f68410a1051dda465754fc6db988)
==340223==ABORTING
The core issue here is that all of the PRs that switched the mistyped boolean properties to int were supposed to be merged together with PR #8904. That PR modifies the config map subsystem to write the whole integer instead of only the first byte in memory, which fixes a longstanding flaw.
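To make the "first byte versus whole integer" point concrete, here is a minimal sketch of the failure mode. It is not the Fluent Bit config map code: struct demo_ctx and every demo_* function are hypothetical names invented for illustration, and the 0xAA fill only models the garbage a malloc()'d context may contain.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a plugin context (not the real struct flb_dummy). */
struct demo_ctx {
    int samples;
    int flush_on_startup;   /* boolean option stored as an int */
};

/* Model a malloc()'d context: every byte holds non-zero garbage. */
void demo_poison(struct demo_ctx *ctx)
{
    memset(ctx, 0xAA, sizeof(*ctx));
}

/* Assumed flawed setter: writes only the first byte of the member. */
void demo_set_bool_byte(void *ctx, size_t offset, int value)
{
    char *member = (char *) ctx + offset;
    *member = (char) value;
}

/* Assumed fixed setter: writes the whole int. */
void demo_set_bool_int(void *ctx, size_t offset, int value)
{
    memcpy((char *) ctx + offset, &value, sizeof(value));
}

/* Apply a "false" default with the byte-wide setter and return the member. */
int demo_default_with_byte_write(void)
{
    struct demo_ctx ctx;

    demo_poison(&ctx);
    demo_set_bool_byte(&ctx, offsetof(struct demo_ctx, flush_on_startup), 0);
    return ctx.flush_on_startup;   /* remaining bytes are still garbage */
}

/* Apply the same default with the int-wide setter and return the member. */
int demo_default_with_int_write(void)
{
    struct demo_ctx ctx;

    demo_poison(&ctx);
    demo_set_bool_int(&ctx, offsetof(struct demo_ctx, flush_on_startup), 0);
    return ctx.flush_on_startup;   /* every byte was overwritten */
}
```

In this sketch demo_default_with_byte_write() returns a non-zero value even though the default is "false", while demo_default_with_int_write() returns 0. That matches the symptom described above, where a boolean default appears to be ignored unless the context is zeroed first.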
However, this is very interesting because it exposed a few additional bits of information:
In this particular case the error was caused by the uninitialized part of flush_on_startup. What's interesting is that the initialization order could also cause problems if the user deliberately enabled the option, because when the plugin ingests that initial entry the filters have not been initialized yet.
Any plugin that was already using int as the boolean type could show this erratic behavior under the same conditions (non-zeroed context allocation).
The pedantic semantic if (variable == FLB_TRUE) { is superior =)
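As a rough illustration of why the strict comparison behaves differently, the sketch below feeds a garbage-filled int to both styles of test. DEMO_TRUE is a hypothetical stand-in that assumes the true constant equals 1, and the demo_* helpers are not part of Fluent Bit.

```c
#include <string.h>

#define DEMO_FALSE 0
#define DEMO_TRUE  1   /* assumption: the "true" constant is 1 */

/* Build an int whose bytes are all non-zero garbage (0xAA). */
int demo_garbage_flag(void)
{
    int flag;

    memset(&flag, 0xAA, sizeof(flag));
    return flag;
}

/* Truthiness test: any non-zero value counts as enabled. */
int demo_enabled_truthy(int flag)
{
    return flag ? DEMO_TRUE : DEMO_FALSE;
}

/* Strict test: only the exact true constant counts as enabled. */
int demo_enabled_strict(int flag)
{
    return (flag == DEMO_TRUE) ? DEMO_TRUE : DEMO_FALSE;
}
```

With a garbage value, demo_enabled_truthy() reports the option as enabled while demo_enabled_strict() does not. The strict form only masks the problem, though: a partially written "true" would also fail the strict test, so the member still has to be fully initialized.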
So I think in a way it's good that this happened, because it brought up a design issue that we might not have found otherwise.
Edit: BTW, I manually applied the patch file for PR #8904 and verified that it fixes the issue before I wrote my rant.
This absolutely explains why the log_to_metrics filter IS NOW DROPPING ALL LOGS BY DEFAULT. It is caused by this issue in combination with the new discard_logs option (3d4ad31). The option is always true, no matter what you configure. As a result, the filter is broken in both 3.1.5 and the latest 3.1.6 and cannot be used normally (without a code change)!