Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThreadSanitizer: data race in qdr_close_connection_CT src/router_core/connections.c:267 #243

Closed
jiridanek opened this issue Mar 25, 2022 · 4 comments
Assignees
Labels
bug Something isn't working race Reported by Thread Sanitizer
Milestone

Comments

@jiridanek
Copy link
Contributor

jiridanek commented Mar 25, 2022

Configure relaxed atomics, as in #239. Then:

WARNING: ThreadSanitizer: data race (pid=3244)
  Write of size 1 at 0x7b54000c9942 by thread T1:
    #0 qdr_close_connection_CT src/router_core/connections.c:267 (skrouterd+0x4875d8)
    #1 qdr_core_close_connection_CT src/router_core/connections.c:283 (skrouterd+0x487766)
    #2 router_core_thread src/router_core/router_core_thread.c:236 (skrouterd+0x4a382a)
    #3 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)

  Previous read of size 1 at 0x7b54000c9942 by main thread:
    #0 qdr_connection_process src/router_core/connections.c:308 (skrouterd+0x48783c)
    #1 _do_reconnect src/adaptors/http1/http1_server.c:432 (skrouterd+0x43bfcc)
    #2 qd_timer_visit src/timer.c:320 (skrouterd+0x4c587f)
    #3 handle src/server.c:980 (skrouterd+0x4c114e)
    #4 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #5 qd_server_run src/server.c:1491 (skrouterd+0x4c3acc)
    #6 main_process router/src/main.c:105 (skrouterd+0x424e5c)
    #7 main router/src/main.c:359 (skrouterd+0x4242ec)

  Location is heap block of size 640 at 0x7b54000c9900 allocated by thread T4:
    #0 posix_memalign <null> (libtsan.so.0+0x32a23)
    #1 qd_alloc src/alloc_pool.c:396 (skrouterd+0x448d09)
    #2 new_qdr_connection_t src/router_core/connections.c:44 (skrouterd+0x486d21)
    #3 qdr_connection_opened src/router_core/connections.c:88 (skrouterd+0x486d21)
    #4 _setup_client_connection src/adaptors/http1/http1_client.c:316 (skrouterd+0x4342da)
    #5 _handle_connection_events src/adaptors/http1/http1_client.c:452 (skrouterd+0x4342da)
    #6 handle_event_with_context src/server.c:780 (skrouterd+0x4c11f9)
    #7 do_handle_raw_connection_event src/server.c:786 (skrouterd+0x4c11f9)
    #8 handle src/server.c:1063 (skrouterd+0x4c11f9)
    #9 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #10 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)
@jiridanek jiridanek added the race Reported by Thread Sanitizer label Mar 25, 2022
@jiridanek jiridanek linked a pull request Mar 25, 2022 that will close this issue
@jiridanek jiridanek linked a pull request Mar 25, 2022 that will close this issue
@kgiusti kgiusti self-assigned this Apr 1, 2022
@kgiusti
Copy link
Contributor

kgiusti commented Apr 1, 2022

I've removed the link to #213 since I believe there's a bigger problem here. The mere fact that the timer thread is executing the qdr_connection_process() function while the core is busy deleting it is A Bad Thing. IIUC this should not happen because once the connection teardown is initiated the adaptor should have canceled that timer. The fact that this is not the case is a big old bug.

@jiridanek
Copy link
Contributor Author

@kgiusti I forgot a disclaimer in the original issue. The ThreadSanitizer warning appears for me only when I use relaxed atomics, as in #239. I still think there is likely a problem, somewhere, and the default sequentially consistent atomics only mask it, not resolve it, but that fact IMO redelegates this to be even lower priority than the other races.

jiridanek added a commit to jiridanek/skupper-router that referenced this issue Apr 4, 2022
```
WARNING: ThreadSanitizer: data race (pid=3244)
  Write of size 1 at 0x7b54000c9942 by thread T1:
    #0 qdr_close_connection_CT src/router_core/connections.c:267 (skrouterd+0x4875d8)
    #1 qdr_core_close_connection_CT src/router_core/connections.c:283 (skrouterd+0x487766)
    #2 router_core_thread src/router_core/router_core_thread.c:236 (skrouterd+0x4a382a)
    #3 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)

  Previous read of size 1 at 0x7b54000c9942 by main thread:
    #0 qdr_connection_process src/router_core/connections.c:308 (skrouterd+0x48783c)
    #1 _do_reconnect src/adaptors/http1/http1_server.c:432 (skrouterd+0x43bfcc)
    #2 qd_timer_visit src/timer.c:320 (skrouterd+0x4c587f)
    #3 handle src/server.c:980 (skrouterd+0x4c114e)
    #4 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #5 qd_server_run src/server.c:1491 (skrouterd+0x4c3acc)
    #6 main_process router/src/main.c:105 (skrouterd+0x424e5c)
    #7 main router/src/main.c:359 (skrouterd+0x4242ec)

  Location is heap block of size 640 at 0x7b54000c9900 allocated by thread T4:
    #0 posix_memalign <null> (libtsan.so.0+0x32a23)
    #1 qd_alloc src/alloc_pool.c:396 (skrouterd+0x448d09)
    #2 new_qdr_connection_t src/router_core/connections.c:44 (skrouterd+0x486d21)
    #3 qdr_connection_opened src/router_core/connections.c:88 (skrouterd+0x486d21)
    #4 _setup_client_connection src/adaptors/http1/http1_client.c:316 (skrouterd+0x4342da)
    #5 _handle_connection_events src/adaptors/http1/http1_client.c:452 (skrouterd+0x4342da)
    #6 handle_event_with_context src/server.c:780 (skrouterd+0x4c11f9)
    #7 do_handle_raw_connection_event src/server.c:786 (skrouterd+0x4c11f9)
    #8 handle src/server.c:1063 (skrouterd+0x4c11f9)
    #9 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #10 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)
```
jiridanek added a commit to jiridanek/skupper-router that referenced this issue Apr 4, 2022
```
WARNING: ThreadSanitizer: data race (pid=3244)
  Write of size 1 at 0x7b54000c9942 by thread T1:
    #0 qdr_close_connection_CT src/router_core/connections.c:267 (skrouterd+0x4875d8)
    #1 qdr_core_close_connection_CT src/router_core/connections.c:283 (skrouterd+0x487766)
    #2 router_core_thread src/router_core/router_core_thread.c:236 (skrouterd+0x4a382a)
    #3 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)

  Previous read of size 1 at 0x7b54000c9942 by main thread:
    #0 qdr_connection_process src/router_core/connections.c:308 (skrouterd+0x48783c)
    #1 _do_reconnect src/adaptors/http1/http1_server.c:432 (skrouterd+0x43bfcc)
    #2 qd_timer_visit src/timer.c:320 (skrouterd+0x4c587f)
    #3 handle src/server.c:980 (skrouterd+0x4c114e)
    #4 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #5 qd_server_run src/server.c:1491 (skrouterd+0x4c3acc)
    #6 main_process router/src/main.c:105 (skrouterd+0x424e5c)
    #7 main router/src/main.c:359 (skrouterd+0x4242ec)

  Location is heap block of size 640 at 0x7b54000c9900 allocated by thread T4:
    #0 posix_memalign <null> (libtsan.so.0+0x32a23)
    #1 qd_alloc src/alloc_pool.c:396 (skrouterd+0x448d09)
    #2 new_qdr_connection_t src/router_core/connections.c:44 (skrouterd+0x486d21)
    #3 qdr_connection_opened src/router_core/connections.c:88 (skrouterd+0x486d21)
    #4 _setup_client_connection src/adaptors/http1/http1_client.c:316 (skrouterd+0x4342da)
    #5 _handle_connection_events src/adaptors/http1/http1_client.c:452 (skrouterd+0x4342da)
    #6 handle_event_with_context src/server.c:780 (skrouterd+0x4c11f9)
    #7 do_handle_raw_connection_event src/server.c:786 (skrouterd+0x4c11f9)
    #8 handle src/server.c:1063 (skrouterd+0x4c11f9)
    #9 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #10 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)
```
@jiridanek
Copy link
Contributor Author

Here's the race on an unadulterated main branch of the router. So my worries from previous comment have not materialized

https://github.com/skupperproject/skupper-router/runs/5852739930?check_suite_focus=true#step:26:1271

56: WARNING: ThreadSanitizer: data race (pid=3062)
56:   Write of size 1 at 0x7b54000e58c2 by thread T2:
56:     #0 qdr_close_connection_CT /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:267 (skrouterd+0x48d678)
56:     #1 qdr_core_close_connection_CT /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:283 (skrouterd+0x48d806)
56:     #2 router_core_thread /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/router_core_thread.c:247 (skrouterd+0x4a9b0a)
56:     #3 _thread_init /home/runner/work/skupper-router/skupper-router/skupper-router/src/posix/threading.c:172 (skrouterd+0x47c51d)
56: 
56:   Previous read of size 1 at 0x7b54000e58c2 by thread T3:
56:     #0 qdr_connection_process /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:308 (skrouterd+0x48d8dc)
56:     #1 _do_reconnect /home/runner/work/skupper-router/skupper-router/skupper-router/src/adaptors/http1/http1_server.c:439 (skrouterd+0x43d48c)
56:     #2 qd_timer_visit /home/runner/work/skupper-router/skupper-router/skupper-router/src/timer.c:320 (skrouterd+0x4cbd1f)
56:     #3 handle /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:980 (skrouterd+0x4c75ee)
56:     #4 thread_run /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1095 (skrouterd+0x4c93f7)
56:     #5 _thread_init /home/runner/work/skupper-router/skupper-router/skupper-router/src/posix/threading.c:172 (skrouterd+0x47c51d)
56: 
56:   Location is heap block of size 640 at 0x7b54000e5880 allocated by thread T5:
56:     #0 posix_memalign <null> (libtsan.so.0+0x32a23)
56:     #1 qd_alloc /home/runner/work/skupper-router/skupper-router/skupper-router/src/alloc_pool.c:396 (skrouterd+0x44a9e9)
56:     #2 new_qdr_connection_t /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:44 (skrouterd+0x48cdc1)
56:     #3 qdr_connection_opened /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:88 (skrouterd+0x48cdc1)
56:     #4 _create_server_connection /home/runner/work/skupper-router/skupper-router/skupper-router/src/adaptors/http1/http1_server.c:194 (skrouterd+0x43edf4)
56:     #5 qd_http1_configure_connector /home/runner/work/skupper-router/skupper-router/skupper-router/src/adaptors/http1/http1_server.c:232 (skrouterd+0x43edf4)
56:     #6 qd_dispatch_configure_http_connector /home/runner/work/skupper-router/skupper-router/skupper-router/src/adaptors/http_common.c:167 (skrouterd+0x42928e)
56:     #7 ffi_call_unix64 <null> (libffi.so.6+0x6c03)
56:     #8 qdr_forward_on_message /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/forwarder.c:324 (skrouterd+0x49d41c)
56:     #9 qdr_general_handler /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/router_core.c:824 (skrouterd+0x4a2fa1)
56:     #10 qd_timer_visit /home/runner/work/skupper-router/skupper-router/skupper-router/src/timer.c:320 (skrouterd+0x4cbd1f)
56:     #11 handle /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:980 (skrouterd+0x4c75ee)
56:     #12 thread_run /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1095 (skrouterd+0x4c93f7)
56:     #13 _thread_init /home/runner/work/skupper-router/skupper-router/skupper-router/src/posix/threading.c:172 (skrouterd+0x47c51d)
56: 
56:   Thread T2 (tid=3067, running) created by main thread at:
56:     #0 pthread_create <null> (libtsan.so.0+0x5bef5)
56:     #1 sys_thread /home/runner/work/skupper-router/skupper-router/skupper-router/src/posix/threading.c:181 (skrouterd+0x47c9ac)
56:     #2 qdr_core /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/router_core.c:122 (skrouterd+0x4a43d1)
56:     #3 qd_router_setup_late /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_node.c:2058 (skrouterd+0x4c29d8)
56:     #4 ffi_call_unix64 <null> (libffi.so.6+0x6c03)
56:     #5 main_process /home/runner/work/skupper-router/skupper-router/skupper-router/router/src/main.c:92 (skrouterd+0x425ebc)
56:     #6 main /home/runner/work/skupper-router/skupper-router/skupper-router/router/src/main.c:359 (skrouterd+0x42531c)
56: 
56:   Thread T3 (tid=3069, running) created by main thread at:
56:     #0 pthread_create <null> (libtsan.so.0+0x5bef5)
56:     #1 sys_thread /home/runner/work/skupper-router/skupper-router/skupper-router/src/posix/threading.c:181 (skrouterd+0x47c9ac)
56:     #2 qd_server_run /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1489 (skrouterd+0x4c9f4c)
56:     #3 main_process /home/runner/work/skupper-router/skupper-router/skupper-router/router/src/main.c:105 (skrouterd+0x425f1c)
56:     #4 main /home/runner/work/skupper-router/skupper-router/skupper-router/router/src/main.c:359 (skrouterd+0x42531c)
56: 
56:   Thread T5 (tid=3071, running) created by main thread at:
56:     #0 pthread_create <null> (libtsan.so.0+0x5bef5)
56:     #1 sys_thread /home/runner/work/skupper-router/skupper-router/skupper-router/src/posix/threading.c:181 (skrouterd+0x47c9ac)
56:     #2 qd_server_run /home/runner/work/skupper-router/skupper-router/skupper-router/src/server.c:1489 (skrouterd+0x4c9f4c)
56:     #3 main_process /home/runner/work/skupper-router/skupper-router/skupper-router/router/src/main.c:105 (skrouterd+0x425f1c)
56:     #4 main /home/runner/work/skupper-router/skupper-router/skupper-router/router/src/main.c:359 (skrouterd+0x42531c)
56: 
56: SUMMARY: ThreadSanitizer: data race /home/runner/work/skupper-router/skupper-router/skupper-router/src/router_core/connections.c:267 in qdr_close_connection_CT

Now I feel better about merging the PR. Good enough I might actually merge it.

jiridanek added a commit that referenced this issue Apr 7, 2022
…nnection_CT` (#213)

```
WARNING: ThreadSanitizer: data race (pid=3244)
  Write of size 1 at 0x7b54000c9942 by thread T1:
    #0 qdr_close_connection_CT src/router_core/connections.c:267 (skrouterd+0x4875d8)
    #1 qdr_core_close_connection_CT src/router_core/connections.c:283 (skrouterd+0x487766)
    #2 router_core_thread src/router_core/router_core_thread.c:236 (skrouterd+0x4a382a)
    #3 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)

  Previous read of size 1 at 0x7b54000c9942 by main thread:
    #0 qdr_connection_process src/router_core/connections.c:308 (skrouterd+0x48783c)
    #1 _do_reconnect src/adaptors/http1/http1_server.c:432 (skrouterd+0x43bfcc)
    #2 qd_timer_visit src/timer.c:320 (skrouterd+0x4c587f)
    #3 handle src/server.c:980 (skrouterd+0x4c114e)
    #4 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #5 qd_server_run src/server.c:1491 (skrouterd+0x4c3acc)
    #6 main_process router/src/main.c:105 (skrouterd+0x424e5c)
    #7 main router/src/main.c:359 (skrouterd+0x4242ec)

  Location is heap block of size 640 at 0x7b54000c9900 allocated by thread T4:
    #0 posix_memalign <null> (libtsan.so.0+0x32a23)
    #1 qd_alloc src/alloc_pool.c:396 (skrouterd+0x448d09)
    #2 new_qdr_connection_t src/router_core/connections.c:44 (skrouterd+0x486d21)
    #3 qdr_connection_opened src/router_core/connections.c:88 (skrouterd+0x486d21)
    #4 _setup_client_connection src/adaptors/http1/http1_client.c:316 (skrouterd+0x4342da)
    #5 _handle_connection_events src/adaptors/http1/http1_client.c:452 (skrouterd+0x4342da)
    #6 handle_event_with_context src/server.c:780 (skrouterd+0x4c11f9)
    #7 do_handle_raw_connection_event src/server.c:786 (skrouterd+0x4c11f9)
    #8 handle src/server.c:1063 (skrouterd+0x4c11f9)
    #9 thread_run src/server.c:1095 (skrouterd+0x4c2f57)
    #10 _thread_init src/posix/threading.c:172 (skrouterd+0x47ad5d)
```
@ganeshmurthy ganeshmurthy added this to the 2.0.0 milestone Apr 12, 2022
@ganeshmurthy ganeshmurthy modified the milestones: 2.0.0, 2.1.0 Apr 19, 2022
@kgiusti kgiusti modified the milestones: 2.1.0, 2.0.0 Apr 20, 2022
@kgiusti
Copy link
Contributor

kgiusti commented Apr 20, 2022

I've removed the link to #213 since I believe there's a bigger problem here. The mere fact that the timer thread is executing the qdr_connection_process() function while the core is busy deleting it is A Bad Thing. IIUC this should not happen because once the connection teardown is initiated the adaptor should have canceled that timer. The fact that this is not the case is a big old bug.

After looking at the HTTP/1 adaptor's teardown logic I believe the above statement is completely wrong. Running qdr_connection_process() via the _do_reconnect timer handler is by-design: it is the only way to handle core connection work once the proactor's raw connection has been closed.

So yeah - TL;DR: @jiridanek fix is definitely the Right Way to Fix This since the I/O and core thread share the reference to the qdr_connection_t object.

Closing this issue as fixed and moving the milestone to 2.0.0 since the fix is present in that release.

@kgiusti kgiusti closed this as completed Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working race Reported by Thread Sanitizer
Projects
None yet
Development

No branches or pull requests

3 participants