ThreadSanitizer: data race in qdr_close_connection_CT src/router_core/connections.c:267 #243
I've removed the link to #213 since I believe there's a bigger problem here. The mere fact that the timer thread is executing the qdr_connection_process() function while the core is busy deleting it is A Bad Thing. IIUC this should not happen because once the connection teardown is initiated the adaptor should have canceled that timer. The fact that this is not the case is a big old bug.
@kgiusti I forgot a disclaimer in the original issue. The ThreadSanitizer warning appears for me only when I use relaxed atomics, as in #239. I still think there is likely a problem somewhere, and the default sequentially consistent atomics only mask it rather than resolve it, but IMO that fact relegates this to an even lower priority than the other races.
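One plausible way a stronger memory order can hide a race from ThreadSanitizer: an atomic store with release (or the default sequentially consistent) ordering creates a happens-before edge that also covers nearby non-atomic accesses, and relaxing it removes that edge. The sketch below is a generic message-passing example, not router code; `payload`, `ready`, `writer` and `run_message_passing` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of this is router code. */
static int         payload;  /* plain, non-atomic data */
static atomic_bool ready;    /* atomic flag published by the writer */

static void *writer(void *arg)
{
    (void) arg;
    payload = 42;  /* non-atomic write */
    /* Release (or the default seq_cst) orders the payload write before the
     * flag store. With memory_order_relaxed here that ordering is gone, and
     * ThreadSanitizer would report the payload accesses as a data race. */
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Returns the payload as seen by the reading thread. */
int run_message_passing(void)
{
    pthread_t tid;

    payload = 0;
    atomic_store(&ready, false);

    int rc = pthread_create(&tid, NULL, writer, NULL);
    assert(rc == 0);
    (void) rc;

    /* Acquire pairs with the release store in writer(). */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin until the flag is published */

    int seen = payload;  /* ordered after the writer's non-atomic store */
    pthread_join(tid, NULL);
    return seen;
}
```

Built with `-fsanitize=thread -pthread`, this version should be clean; the point is that the non-atomic access is only protected as a side effect of the ordering on the flag.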
```
WARNING: ThreadSanitizer: data race (pid=3244)
  Write of size 1 at 0x7b54000c9942 by thread T1:
    #0 qdr_close_connection_CT src/router_core/connections.c:267 (skrouterd+0x4875d8)
    #1 qdr_core_close_connection_CT src/router_core/connections.c:283 (skrouterd+0x487766)
    #2 router_core_thread src/router_core/router_core_thread.c:236 (skrouterd+0x4a382a)
    #3 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)

  Previous read of size 1 at 0x7b54000c9942 by main thread:
    #0 qdr_connection_process src/router_core/connections.c:308 (skrouterd+0x48783c)
    #1 _do_reconnect src/adaptors/http1/http1_server.c:432 (skrouterd+0x43bfcc)
    #2 qd_timer_visit src/timer.c:320 (skrouterd+0x4c587f)
    #3 handle src/server.c:980 (skrouterd+0x4c114e)
    #4 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #5 qd_server_run src/server.c:1491 (skrouterd+0x4c3acc)
    #6 main_process router/src/main.c:105 (skrouterd+0x424e5c)
    #7 main router/src/main.c:359 (skrouterd+0x4242ec)

  Location is heap block of size 640 at 0x7b54000c9900 allocated by thread T4:
    #0 posix_memalign <null> (libtsan.so.0+0x32a23)
    #1 qd_alloc src/alloc_pool.c:396 (skrouterd+0x448d09)
    #2 new_qdr_connection_t src/router_core/connections.c:44 (skrouterd+0x486d21)
    #3 qdr_connection_opened src/router_core/connections.c:88 (skrouterd+0x486d21)
    #4 _setup_client_connection src/adaptors/http1/http1_client.c:316 (skrouterd+0x4342da)
    #5 _handle_connection_events src/adaptors/http1/http1_client.c:452 (skrouterd+0x4342da)
    #6 handle_event_with_context src/server.c:780 (skrouterd+0x4c11f9)
    #7 do_handle_raw_connection_event src/server.c:786 (skrouterd+0x4c11f9)
    #8 handle src/server.c:1063 (skrouterd+0x4c11f9)
    #9 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #10 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)
```
Here's the race on an unadulterated main branch of the router. So my worries from the previous comment have not materialized.
Now I feel better about merging the PR. Good enough I might actually merge it.
…nnection_CT` (#213)
After looking at the HTTP/1 adaptor's teardown logic I believe my earlier statement, that the adaptor should have canceled the timer, is completely wrong. Running qdr_connection_process() via the _do_reconnect timer handler is by design: it is the only way to handle core connection work once the proactor's raw connection has been closed. So yeah - TL;DR: @jiridanek's fix is definitely the Right Way to Fix This, since the I/O and core threads share the reference to the qdr_connection_t object. Closing this issue as fixed and moving the milestone to 2.0.0 since the fix is present in that release.
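For readers unfamiliar with the pattern, here is a minimal sketch of two threads sharing one connection object, where the flag written by the core side and read by the I/O side is made atomic so the concurrent access is well defined. This is not the actual patch from #213; `conn_t`, `core_close`, `conn_process` and `run_close_demo` are hypothetical stand-ins, and the real `qdr_connection_t` has a different layout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a connection object shared by two threads. */
typedef struct {
    atomic_bool   closed;     /* written by the core thread, read by the I/O thread */
    unsigned long processed;  /* touched by the I/O thread only */
} conn_t;

/* Core-thread side: mark the connection closed. */
static void *core_close(void *arg)
{
    conn_t *conn = arg;
    atomic_store(&conn->closed, true);
    return NULL;
}

/* I/O-thread side: do work unless the connection is closed.
 * Returns false once the close has been observed. */
static bool conn_process(conn_t *conn)
{
    if (atomic_load(&conn->closed))
        return false;
    conn->processed++;
    return true;
}

/* Returns 1 once the I/O side has observed the close. */
int run_close_demo(void)
{
    conn_t    conn;
    pthread_t core;

    atomic_init(&conn.closed, false);
    conn.processed = 0;

    int rc = pthread_create(&core, NULL, core_close, &conn);
    assert(rc == 0);
    (void) rc;

    /* Keep processing, as a timer handler might, until the flag flips. */
    while (conn_process(&conn))
        ;

    pthread_join(core, NULL);
    return atomic_load(&conn.closed) ? 1 : 0;
}
```

If `closed` were a plain `bool`, the load in `conn_process` and the store in `core_close` would be exactly the kind of unsynchronized read/write pair ThreadSanitizer reports above.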
Configure relaxed atomics, as in #239. Then: