Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libco] race condition #3055

Closed
hsmatulisgoogle opened this issue Feb 10, 2021 · 5 comments
Closed

[libco] race condition #3055

hsmatulisgoogle opened this issue Feb 10, 2021 · 5 comments
Assignees

Comments

@hsmatulisgoogle
Copy link
Contributor

Bug Report

There seems to be a race condition in libco:

if(!co_swap) {
co_init();
co_swap = (void (*)(cothread_t, cothread_t))co_swap_function;
}

Here co_swap is a static function pointer, which is assigned to a bunch of data in co_swap_function that corresponds to assembly instructions before first use.

However this code assume that pointer assignment is regular, which is likely platform dependent. Better to use pthread_once rather than checking if its null.

This code should probably be using inline assembly macros instead of writing data to an unsigned char and interpreting it as a function, then they wouldn't have to worry about initializing things before first use in the first place.

@edsiper
Copy link
Member

edsiper commented Feb 10, 2021

I think the libco backend must be updated, there is some modifications in the newer versions:

https://github.com/libretro/RetroArch/blob/master/libretro-common/libco/amd64.c

@hsmatulisgoogle
Copy link
Contributor Author

I don't know where they are getting updates from, as I couldn't find a libco repo to update to. Looks like they are using inline assembly in the following case

#if defined(__GNUC__) && !defined(_WIN32) && !defined(__cplusplus)
#define CO_USE_INLINE_ASM
#endif

So if we are on windows, compiling with C++ or don't have GNU extensions we'd still have the race condition. In the remaining cases this updated version doesn't have the race condition

@edsiper
Copy link
Member

edsiper commented Feb 11, 2021

so I see two things to do:

  • update libco
  • on windows, use maximum 1 worker

edsiper added a commit that referenced this issue Feb 25, 2021
When libco starts, it might enter in a race condition if multiple
threads are trying to initialize the 'co_swap' function, this check
is done on every coroutine creation:

  ==346246== Possible data race during read of size 8 at 0x5CA890 by thread #5
  ==346246== Locks held: none
  ==346246==    at 0x48EFAE: co_create (amd64.c:132)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==
  ==346246== This conflicts with a previous write of size 8 by thread #4
  ==346246== Locks held: none
  ==346246==    at 0x48EFCB: co_create (amd64.c:134)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==  Address 0x5ca890 is 0 bytes inside data symbol "co_swap"

This patch introduce a new API for flb_coro interface that aims to
be called inside every worker thread. The access to this first
initialization is protected.

No more race conditions on that piece of code has been seen with valgrind
after the usage of this new function (next patches).

Signed-off-by: Eduardo Silva <[email protected]>
edsiper added a commit that referenced this issue Feb 25, 2021
@edsiper
Copy link
Member

edsiper commented Feb 26, 2021

issue has been fixed. Now the first initialization of co_swap is protected

@edsiper edsiper self-assigned this Feb 26, 2021
@edsiper
Copy link
Member

edsiper commented Feb 26, 2021

refs:

@edsiper edsiper closed this as completed Feb 26, 2021
edsiper added a commit that referenced this issue Mar 2, 2021
When libco starts, it might enter in a race condition if multiple
threads are trying to initialize the 'co_swap' function, this check
is done on every coroutine creation:

  ==346246== Possible data race during read of size 8 at 0x5CA890 by thread #5
  ==346246== Locks held: none
  ==346246==    at 0x48EFAE: co_create (amd64.c:132)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==
  ==346246== This conflicts with a previous write of size 8 by thread #4
  ==346246== Locks held: none
  ==346246==    at 0x48EFCB: co_create (amd64.c:134)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==  Address 0x5ca890 is 0 bytes inside data symbol "co_swap"

This patch introduce a new API for flb_coro interface that aims to
be called inside every worker thread. The access to this first
initialization is protected.

No more race conditions on that piece of code has been seen with valgrind
after the usage of this new function (next patches).

Signed-off-by: Eduardo Silva <[email protected]>
edsiper added a commit that referenced this issue Mar 2, 2021
DrewZhang13 pushed a commit to DrewZhang13/fluent-bit that referenced this issue May 3, 2021
…3055)

When libco starts, it might enter in a race condition if multiple
threads are trying to initialize the 'co_swap' function, this check
is done on every coroutine creation:

  ==346246== Possible data race during read of size 8 at 0x5CA890 by thread #5
  ==346246== Locks held: none
  ==346246==    at 0x48EFAE: co_create (amd64.c:132)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==
  ==346246== This conflicts with a previous write of size 8 by thread #4
  ==346246== Locks held: none
  ==346246==    at 0x48EFCB: co_create (amd64.c:134)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==  Address 0x5ca890 is 0 bytes inside data symbol "co_swap"

This patch introduce a new API for flb_coro interface that aims to
be called inside every worker thread. The access to this first
initialization is protected.

No more race conditions on that piece of code has been seen with valgrind
after the usage of this new function (next patches).

Signed-off-by: Eduardo Silva <[email protected]>
DrewZhang13 pushed a commit to DrewZhang13/fluent-bit that referenced this issue May 3, 2021
DrewZhang13 pushed a commit to DrewZhang13/fluent-bit that referenced this issue May 3, 2021
…3055)

When libco starts, it might enter in a race condition if multiple
threads are trying to initialize the 'co_swap' function, this check
is done on every coroutine creation:

  ==346246== Possible data race during read of size 8 at 0x5CA890 by thread #5
  ==346246== Locks held: none
  ==346246==    at 0x48EFAE: co_create (amd64.c:132)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==
  ==346246== This conflicts with a previous write of size 8 by thread #4
  ==346246== Locks held: none
  ==346246==    at 0x48EFCB: co_create (amd64.c:134)
  ==346246==    by 0x173035: flb_output_coro_create (flb_output.h:511)
  ==346246==    by 0x173035: output_thread (flb_output_thread.c:281)
  ==346246==    by 0x1889BE: step_callback (flb_worker.c:44)
  ==346246==    by 0x4843B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==346246==    by 0x487E58F: start_thread (pthread_create.c:463)
  ==346246==    by 0x4F47222: clone (clone.S:95)
  ==346246==  Address 0x5ca890 is 0 bytes inside data symbol "co_swap"

This patch introduce a new API for flb_coro interface that aims to
be called inside every worker thread. The access to this first
initialization is protected.

No more race conditions on that piece of code has been seen with valgrind
after the usage of this new function (next patches).

Signed-off-by: Eduardo Silva <[email protected]>
DrewZhang13 pushed a commit to DrewZhang13/fluent-bit that referenced this issue May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants