aws: imds fallback to v1 if token request fails #4

Open
wants to merge 2 commits into
base: 1.8.8-imds-sync-timeout

matthewfala
Owner

Signed-off-by: Matthew Fala [email protected]


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@matthewfala matthewfala force-pushed the 1.8.8-imds-sync-timeout branch from 4b2014d to d42645b Compare October 13, 2021 23:47
@matthewfala matthewfala force-pushed the 1.8.8-imds-sync-timeout branch from d42645b to 5b6cdbd Compare November 11, 2021 20:20
@matthewfala matthewfala force-pushed the 1.8.8-imds-sync-timeout branch 3 times, most recently from 148f661 to 5b48be7 Compare December 2, 2021 00:54
matthewfala pushed a commit that referenced this pull request Dec 3, 2021
When workers are enabled and a connection times out, in most cases the
active worker deadlocks:

  ==1654992== Thread #4: Attempt to re-lock a non-recursive lock I already hold
  ==1654992==    at 0x484BB44: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==1654992==    by 0x197579: prepare_destroy_conn_safe (flb_upstream.c:435)
  ==1654992==    by 0x197887: create_conn (flb_upstream.c:533)
  ==1654992==    by 0x197DBB: flb_upstream_conn_get (flb_upstream.c:674)
  ==1654992==    by 0x2396D3: http_post (http.c:86)
  ==1654992==    by 0x23A5E5: cb_http_flush (http.c:338)
  ==1654992==    by 0x17FE6B: output_pre_cb_flush (flb_output.h:511)
  ==1654992==    by 0x503DAA: co_init (amd64.c:117)
  ==1654992==  Lock was previously acquired
  ==1654992==    at 0x484BC0F: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==1654992==    by 0x19815F: flb_upstream_conn_timeouts (flb_upstream.c:780)
  ==1654992==    by 0x17FEFC: cb_thread_sched_timer (flb_output_thread.c:58)
  ==1654992==    by 0x193ED7: flb_sched_event_handler (flb_scheduler.c:422)
  ==1654992==    by 0x180672: output_thread (flb_output_thread.c:265)
  ==1654992==    by 0x199602: step_callback (flb_worker.c:44)
  ==1654992==    by 0x484E8AA: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==1654992==    by 0x4E3F926: start_thread (pthread_create.c:435)
  ==1654992==    by 0x4ECF9E3: clone (clone.S:100)

The following patch fixes the behavior of prepare_destroy_conn_safe by 'trying to acquire'
the mutex lock. If it fails to acquire it, it assumes the mutex is already locked and that
no new lock is required.

Signed-off-by: Eduardo Silva <[email protected]>
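
The 'try to acquire' approach described in the commit message can be sketched roughly as follows. This is a minimal illustration using plain POSIX threads, not the actual Fluent Bit code: `struct upstream_sketch`, `prepare_destroy_sketch` and `trylock_demo` are hypothetical names, and the counter stands in for whatever list manipulation the real function performs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an upstream and the lock guarding its lists. */
struct upstream_sketch {
    pthread_mutex_t list_mutex;
    int pending_destroy;      /* placeholder for the real connection lists */
};

/*
 * Try to take the lock instead of blocking on it. If the mutex is already
 * held (EBUSY), assume the caller higher up the stack holds it and proceed
 * without locking again, so a non-recursive mutex is never re-locked.
 */
static int prepare_destroy_sketch(struct upstream_sketch *u)
{
    int locked_here = 0;
    int ret = pthread_mutex_trylock(&u->list_mutex);

    if (ret == 0) {
        locked_here = 1;          /* we acquired it, so we must release it */
    }
    else if (ret != EBUSY) {
        return -1;                /* unexpected error from trylock */
    }

    u->pending_destroy++;         /* the protected work */

    if (locked_here) {
        pthread_mutex_unlock(&u->list_mutex);
    }
    return 0;
}

/* Exercises both paths: lock free, and lock already held by the caller. */
int trylock_demo(void)
{
    struct upstream_sketch u;
    int r1;
    int r2;

    u.pending_destroy = 0;
    pthread_mutex_init(&u.list_mutex, NULL);

    /* Path 1: nobody holds the lock; the function locks and unlocks. */
    r1 = prepare_destroy_sketch(&u);

    /* Path 2: the caller already holds the lock, as in the timeout path. */
    pthread_mutex_lock(&u.list_mutex);
    r2 = prepare_destroy_sketch(&u);
    pthread_mutex_unlock(&u.list_mutex);

    assert(r1 == 0 && r2 == 0);
    pthread_mutex_destroy(&u.list_mutex);
    return u.pending_destroy;
}
```

One caveat with this pattern: `EBUSY` only says that some thread holds the mutex, not which one, so treating it as "already locked by me" is an assumption that holds only when no other thread can be holding the same lock at that point.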
matthewfala pushed a commit that referenced this pull request Dec 8, 2021
@github-actions
github-actions bot commented Mar 2, 2022

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Mar 2, 2022