Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy: fix memory bugs when use same golang output plugin multiple times #6469

Merged
merged 2 commits into from
Dec 3, 2022

Conversation

jan-zajic
Copy link
Contributor

@jan-zajic jan-zajic commented Nov 28, 2022

Change proxy cb_exit to correctly pass right ctx to Go callback FLBPluginExitCtx and remove global plugin related memory freeing code from this method. Add new method cb_destroy that is correctly called when whole plugin is unregistered to free up whole plugin related memory.

Fix fluent/fluent-bit-go#49 in fluent-bit-go repository.

Wrong behavior with code from current master and latest stable branch:

2022/11/28 14:59:14 [multiinstance] Exit called for id: dummy_metrics
2022/11/28 14:59:14 [multiinstance] Unregister called
2022/11/28 14:59:14 [multiinstance] Unregister called
double free or corruption (out)
./multiold.sh: line 7: 1092320 Aborted ./bin/fluent-bit-orig -c test_multi/fluent-bit.conf


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [ X] Example configuration file for the change
  • [ X] Debug log output from testing the change
  • [ X] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • [ N/A] Documentation required for this feature

Backporting

  • [ -] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Configuration

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    warning
    Parsers_File parsers.conf
    Plugins_File plugins.conf
    HTTP_Server  Off
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

[INPUT]
    Name cpu
    Tag  cpu.local
    Interval_Sec 1

[INPUT]
    Name dummy
    Tag  dummy.local

[OUTPUT]
    Name  multiinstance
    Match cpu*
    Id cpu_metrics

[OUTPUT]
    Name  multiinstance
    Match dummy*
    Id dummy_metrics

In plugins.conf use multiinstance plugin from https://github.com/fluent/fluent-bit-go examples/out_multiinstance:

[PLUGINS]
Path /opt/fluentbit/bin/out_multiinstance.so

@jan-zajic
Copy link
Contributor Author

Correct output from run after changes:

2022/11/28 14:58:29 [multiinstance] Register called
2022/11/28 14:58:29 [multiinstance] id = "cpu_metrics"
2022/11/28 14:58:29 [multiinstance] id = "dummy_metrics"
2022/11/28 14:58:33 [multiinstance] Flush called for id: cpu_metrics
[0] cpu.local: [2022-11-28 14:58:29.740715171 +0100 CET, {"cpu3.p_cpu": 0, "cpu3.p_user": 0, "cpu3.p_system": 0, "cpu4.p_system": 0, "cpu5.p_system": 0, "cpu1.p_cpu": 0, "cpu1.p_user": 0, "user_p": 0, "cpu0.p_system": 0, "cpu1.p_system": 0, "cpu2.p_cpu": 0, "cpu2.p_user": 0, "cpu2.p_system": 0, "cpu4.p_user": 0, "system_p": 0, "cpu0.p_cpu": 0, "cpu5.p_cpu": 0, "cpu4.p_cpu": 0, "cpu5.p_user": 0, "cpu_p": 0, "cpu0.p_user": 0, }
[1] cpu.local: [2022-11-28 14:58:30.740609792 +0100 CET, {"cpu1.p_user": 0, "cpu2.p_system": 0, "cpu3.p_cpu": 0, "cpu3.p_user": 0, "cpu5.p_cpu": 0, "cpu_p": 0.5, "user_p": 0.3333333333333333, "cpu1.p_cpu": 0, "cpu0.p_user": 2, "cpu1.p_system": 0, "cpu2.p_cpu": 1, "cpu3.p_system": 0, "system_p": 0.16666666666666666, "cpu0.p_system": 0, "cpu2.p_user": 1, "cpu4.p_system": 0, "cpu5.p_user": 0, "cpu5.p_system": 0, "cpu0.p_cpu": 2, "cpu4.p_cpu": 0, "cpu4.p_user": 0, }
[2] cpu.local: [2022-11-28 14:58:31.740598277 +0100 CET, {"cpu4.p_system": 0, "cpu5.p_user": 3, "cpu_p": 4.333333333333333, "cpu1.p_cpu": 5, "cpu1.p_system": 3, "cpu4.p_user": 3, "cpu5.p_cpu": 5, "cpu1.p_user": 2, "cpu2.p_system": 0, "cpu4.p_cpu": 3, "cpu0.p_user": 4, "cpu0.p_system": 1, "cpu5.p_system": 2, "cpu2.p_cpu": 4, "cpu2.p_user": 4, "cpu3.p_cpu": 5, "cpu3.p_user": 2, "cpu3.p_system": 3, "user_p": 2.8333333333333335, "system_p": 1.5, "cpu0.p_cpu": 5, }
[3] cpu.local: [2022-11-28 14:58:32.7406452 +0100 CET, {"cpu2.p_user": 0, "cpu3.p_user": 0, "cpu4.p_user": 0, "cpu0.p_cpu": 2, "cpu0.p_user": 2, "cpu1.p_system": 0, "cpu4.p_system": 0, "cpu5.p_user": 0, "cpu5.p_system": 0, "user_p": 0.3333333333333333, "cpu1.p_user": 0, "cpu4.p_cpu": 0, "cpu1.p_cpu": 0, "cpu5.p_cpu": 0, "cpu2.p_cpu": 0, "cpu2.p_system": 0, "cpu3.p_cpu": 0, "cpu3.p_system": 0, "cpu_p": 0.3333333333333333, "system_p": 0, "cpu0.p_system": 0, }
2022/11/28 14:58:33 [multiinstance] Flush called for id: dummy_metrics
[0] dummy.local: [2022-11-28 14:58:29.740922863 +0100 CET, {"message": [100 117 109 109 121], }
[1] dummy.local: [2022-11-28 14:58:30.740659313 +0100 CET, {"message": [100 117 109 109 121], }
[2] dummy.local: [2022-11-28 14:58:31.740648626 +0100 CET, {"message": [100 117 109 109 121], }
[3] dummy.local: [2022-11-28 14:58:32.74069623 +0100 CET, {"message": [100 117 109 109 121], }
^C[2022/11/28 14:58:37] [engine] caught signal (SIGINT)
[2022/11/28 14:58:37] [ warn] [engine] service will shutdown in max 5 seconds
2022/11/28 14:58:37 [multiinstance] Flush called for id: cpu_metrics
[0] cpu.local: [2022-11-28 14:58:33.740865918 +0100 CET, {"cpu_p": 0.6666666666666666, "user_p": 0.6666666666666666, "cpu1.p_user": 0, "cpu2.p_cpu": 0, "cpu2.p_user": 0, "cpu3.p_cpu": 0, "cpu4.p_cpu": 0, "cpu4.p_user": 0, "cpu5.p_cpu": 0, "cpu5.p_user": 0, "cpu0.p_system": 0, "cpu1.p_cpu": 0, "cpu1.p_system": 0, "cpu2.p_system": 0, "cpu3.p_system": 0, "system_p": 0, "cpu4.p_system": 0, "cpu5.p_system": 0, "cpu0.p_cpu": 3, "cpu0.p_user": 3, "cpu3.p_user": 0, }
[1] cpu.local: [2022-11-28 14:58:34.740701217 +0100 CET, {"system_p": 0.16666666666666666, "cpu1.p_user": 0, "cpu1.p_system": 0, "cpu2.p_system": 0, "cpu4.p_user": 0, "cpu4.p_system": 0, "cpu_p": 0.5, "cpu1.p_cpu": 0, "cpu2.p_user": 1, "cpu3.p_cpu": 0, "cpu3.p_user": 0, "cpu4.p_cpu": 0, "cpu5.p_user": 0, "cpu0.p_cpu": 2, "user_p": 0.3333333333333333, "cpu0.p_system": 0, "cpu2.p_cpu": 1, "cpu3.p_system": 0, "cpu5.p_cpu": 0, "cpu5.p_system": 0, "cpu0.p_user": 2, }
[2] cpu.local: [2022-11-28 14:58:35.740633827 +0100 CET, {"cpu5.p_cpu": 0, "cpu3.p_system": 0, "cpu4.p_cpu": 0, "cpu1.p_system": 0, "cpu3.p_cpu": 0, "cpu0.p_user": 2, "cpu1.p_user": 0, "cpu2.p_user": 0, "cpu4.p_system": 0, "cpu5.p_user": 0, "user_p": 0.3333333333333333, "cpu0.p_cpu": 2, "cpu0.p_system": 0, "cpu1.p_cpu": 0, "cpu2.p_cpu": 0, "cpu2.p_system": 0, "cpu3.p_user": 0, "cpu4.p_user": 0, "cpu_p": 0.3333333333333333, "system_p": 0, "cpu5.p_system": 0, }
[3] cpu.local: [2022-11-28 14:58:36.740631391 +0100 CET, {"system_p": 1.1666666666666667, "cpu2.p_cpu": 5, "cpu5.p_cpu": 4, "cpu4.p_user": 1, "cpu5.p_user": 4, "user_p": 2.6666666666666665, "cpu0.p_cpu": 6, "cpu0.p_user": 4, "cpu1.p_cpu": 3, "cpu1.p_user": 2, "cpu1.p_system": 1, "cpu5.p_system": 0, "cpu_p": 3.8333333333333335, "cpu0.p_system": 2, "cpu2.p_system": 1, "cpu4.p_system": 1, "cpu2.p_user": 4, "cpu3.p_cpu": 4, "cpu3.p_user": 2, "cpu3.p_system": 2, "cpu4.p_cpu": 2, }
2022/11/28 14:58:37 [multiinstance] Flush called for id: dummy_metrics
[0] dummy.local: [2022-11-28 14:58:33.740964917 +0100 CET, {"message": [100 117 109 109 121], }
[1] dummy.local: [2022-11-28 14:58:34.740763377 +0100 CET, {"message": [100 117 109 109 121], }
[2] dummy.local: [2022-11-28 14:58:35.740684362 +0100 CET, {"message": [100 117 109 109 121], }
[3] dummy.local: [2022-11-28 14:58:36.74068221 +0100 CET, {"message": [100 117 109 109 121], }
2022/11/28 14:58:37 [multiinstance] Exit called for id: cpu_metrics
2022/11/28 14:58:37 [multiinstance] Exit called for id: dummy_metrics
2022/11/28 14:58:37 [multiinstance] Unregister called

@jan-zajic
Copy link
Contributor Author

VALGRIND output after changes from this PR:

==1092344==
==1092344== HEAP SUMMARY:
==1092344== in use at exit: 3,858 bytes in 14 blocks
==1092344== total heap usage: 1,921 allocs, 1,907 frees, 1,100,328 bytes allocated
==1092344==
==1092344== LEAK SUMMARY:
==1092344== definitely lost: 0 bytes in 0 blocks
==1092344== indirectly lost: 0 bytes in 0 blocks
==1092344== possibly lost: 2,432 bytes in 8 blocks
==1092344== still reachable: 1,426 bytes in 6 blocks
==1092344== suppressed: 0 bytes in 0 blocks
==1092344== Rerun with --leak-check=full to see details of leaked memory
==1092344==
==1092344== Use --track-origins=yes to see where uninitialised values come from
==1092344== For lists of detected and suppressed errors, rerun with: -s
==1092344== ERROR SUMMARY: 663 errors from 323 contexts (suppressed: 0 from 0)

VALGRIND output BEFORE (on current master HEAD):

[2022/11/28 15:00:54] [engine] caught signal (SIGSEGV)
==1092380==
==1092380== Process terminating with default action of signal 6 (SIGABRT)
==1092380== at 0x4DC8CE1: raise (raise.c:51)
==1092380==
==1092380== HEAP SUMMARY:
==1092380== in use at exit: 121,170 bytes in 693 blocks
==1092380== total heap usage: 1,992 allocs, 1,305 frees, 1,885,060 bytes allocated
==1092380==
==1092380== LEAK SUMMARY:
==1092380== definitely lost: 16 bytes in 1 blocks
==1092380== indirectly lost: 0 bytes in 0 blocks
==1092380== possibly lost: 46,856 bytes in 609 blocks
==1092380== still reachable: 74,298 bytes in 83 blocks
==1092380== of which reachable via heuristic:
==1092380== newarray : 320 bytes in 4 blocks
==1092380== suppressed: 0 bytes in 0 blocks

src/flb_plugin_proxy.c Outdated Show resolved Hide resolved
jan-zajic added a commit to corpus-solutions/fluent-bit that referenced this pull request Nov 29, 2022
@jan-zajic
Copy link
Contributor Author

I tried input plugin after last commit, with example fluent-bit-go/example/in_gdummy with cfg:

[INPUT]
Name gdummy
Tag dummy.local
and it works as it should.

@jan-zajic jan-zajic force-pushed the feature/FixFLBPluginExitCtx branch from f49176a to 53cdbe9 Compare November 29, 2022 09:54
Copy link
Contributor

@cosmo0920 cosmo0920 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. 👍

@jan-zajic jan-zajic temporarily deployed to pr November 30, 2022 02:33 Inactive
@jan-zajic jan-zajic temporarily deployed to pr November 30, 2022 02:33 Inactive
@jan-zajic jan-zajic temporarily deployed to pr November 30, 2022 02:52 Inactive
@edsiper edsiper merged commit 7f40367 into fluent:master Dec 3, 2022
sumitd2 pushed a commit to sumitd2/fluent-bit that referenced this pull request Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FLBPluginExitCtx returns incorrect context with multiinstance
3 participants