Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISPATCH-848, DISPATCH-1962 Fix leak of IoAdapter_init #1052

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

jiridanek
Copy link
Contributor

This PR depends on PRs to fix leaks in qdr_subscription_t (#1051) and qdr_core_t (#1049). The former is still WIP, so I won't carry the commit here, because it would make review harder.

The original problem is

9: Direct leak of 56 byte(s) in 1 object(s) allocated from:
9:     #0 0x7f78a3606e8f in __interceptor_malloc (/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
9:     #1 0x7f78a2d64afb in qd_malloc ../include/qpid/dispatch/ctools.h:229
9:     #2 0x7f78a2d657da in qdr_core_subscribe ../src/router_core/route_tables.c:149
9:     #3 0x7f78a2c83072 in IoAdapter_init ../src/python_embedded.c:711
9:     #4 0x7f78a2353a6c in type_call (/nix/store/r85nxfnwiv45nbmf5yb60jj8ajim4m7w-python3-3.8.5/lib/libpython3.8.so.1.0+0x165a6c)

The solution consists of

  1. implementing support for GC in IoAdapter so that Python can free the cycle described in the Jira.
  2. Fixing Decreffing so that Python does free
  3. Calling PyGC_Collect() at an opportune place where finalizers can be called without accessing previously already freed memory.

@jiridanek jiridanek requested a review from kgiusti February 20, 2021 22:37
@codecov-io
Copy link

Codecov Report

Merging #1052 (78fb8d1) into master (844f995) will increase coverage by 0.14%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1052      +/-   ##
==========================================
+ Coverage   82.48%   82.63%   +0.14%     
==========================================
  Files         111      111              
  Lines       27425    27439      +14     
==========================================
+ Hits        22621    22673      +52     
+ Misses       4804     4766      -38     
Impacted Files Coverage Δ
src/dispatch.c 83.80% <100.00%> (+0.07%) ⬆️
src/python_embedded.c 69.55% <100.00%> (+1.12%) ⬆️
src/router_node.c 94.03% <100.00%> (ø)
src/router_pynode.c 88.34% <100.00%> (+0.37%) ⬆️
tests/core_timer_test.c 90.47% <100.00%> (+0.09%) ⬆️
src/router_core/transfer.c 91.81% <0.00%> (-0.44%) ⬇️
src/router_core/delivery.c 93.12% <0.00%> (-0.39%) ⬇️
src/router_core/connections.c 88.61% <0.00%> (-0.20%) ⬇️
src/message.c 87.07% <0.00%> (-0.08%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 844f995...78fb8d1. Read the comment docs.

@jiridanek
Copy link
Contributor Author

Here's a GHA run with PR #1051 merged in. There are some leaks from Proton then, but the IoAdapter leak is fixed. https://github.com/jiridanek/qpid-dispatch/actions/runs/585158010

@jiridanek jiridanek removed the request for review from kgiusti March 18, 2021 16:38
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 78fb8d1 to 7ea579e Compare March 19, 2021 20:07
@jiridanek jiridanek requested a review from kgiusti March 20, 2021 11:43
@jiridanek jiridanek changed the title DISPATCH-1962 Fix leak of IoAdapter_init DISPATCH-848, DISPATCH-1962 Fix leak of IoAdapter_init Apr 2, 2021
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 7ea579e to 30721cd Compare April 15, 2021 18:31
@jiridanek
Copy link
Contributor Author

not sure what to think about this

https://travis-ci.com/github/apache/qpid-dispatch/jobs/498893659#L4652

24: =================================================================
24: ==13469==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0xffff7f0441d8 in thread T0
24:     #0 0x48f30c in free (/home/travis/build/apache/qpid-dispatch/build/router/qdrouterd+0x48f30c)
24:     #1 0xffff7ed81524  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x23a524)
24:     #2 0xffff7eddbd70  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x294d70)
24:     #3 0xffff7edc0808  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x279808)
24:     #4 0xffff7edc2a58  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x27ba58)
24:     #5 0xffff7eda918c  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x26218c)
24:     #6 0xffff7eda4a28  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x25da28)
24:     #7 0xffff7ebc79d4  (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x809d4)
24:     #8 0xffff7eca6d88 in _PyGC_CollectNoFail (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x15fd88)
24:     #9 0xffff7ece42e8 in PyImport_Cleanup (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x19d2e8)
24:     #10 0xffff7ecd17e8 in Py_FinalizeEx (/lib/aarch64-linux-gnu/libpython3.8.so.1.0+0x18a7e8)
24:     #11 0xffff7f509654 in qd_dispatch_free /home/travis/build/apache/qpid-dispatch/src/dispatch.c:377:5
24:     #12 0x4bd0a4 in main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:119:5
24:     #13 0x4bbe24 in main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369:9
24:     #14 0xffff7e8f308c in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x2408c)
24:     #15 0x4214a8 in _start (/home/travis/build/apache/qpid-dispatch/build/router/qdrouterd+0x4214a8)
24: 
24: Address 0xffff7f0441d8 is a wild pointer.
24: SUMMARY: AddressSanitizer: bad-free (/home/travis/build/apache/qpid-dispatch/build/router/qdrouterd+0x48f30c) in free
24: ==13469==ABORTING

@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch 2 times, most recently from ad3df5f to 9eab6a8 Compare April 20, 2021 21:06
@jiridanek
Copy link
Contributor Author

jiridanek commented Apr 21, 2021

That AArch64 crash happened under TSan too. Only issue somewhat related I can find is https://bugs.python.org/issue42697, which suggests that this could possibly be Python issue, as well as PR issue

24: FATAL: ThreadSanitizer CHECK failed: ../../../../gcc-10.2.0/libsanitizer/sanitizer_common/sanitizer_allocator_secondary.h:301 "((IsAligned(p, page_size_))) != (0)" (0x0, 0x0)
24:     #0 <null> <null> (libtsan.so.0+0x9b2e2)
24:     #1 <null> <null> (libtsan.so.0+0xb914a)
24:     #2 <null> <null> (libtsan.so.0+0x85a7c)
24:     #3 <null> <null> (libtsan.so.0+0x84dd5)
24:     #4 free <null> (libtsan.so.0+0x38be4)
24:     #5 tupledealloc <null> (libpython3.8.so.1.0+0x160c07)
24:     #6 code_dealloc <null> (libpython3.8.so.1.0+0x1300c3)
24:     #7 func_dealloc <null> (libpython3.8.so.1.0+0x136a2c)
24:     #8 free_keys_object <null> (libpython3.8.so.1.0+0x14f664)
24:     #9 type_clear <null> (libpython3.8.so.1.0+0x1f17d5)
24:     #10 collect.constprop.0 <null> (libpython3.8.so.1.0+0x1c26b1)
24:     #11 _PyGC_CollectNoFail <null> (libpython3.8.so.1.0+0x2237d1)
24:     #12 PyImport_Cleanup <null> (libpython3.8.so.1.0+0x212792)
24:     #13 Py_FinalizeEx <null> (libpython3.8.so.1.0+0x219d10)
24:     #14 qd_python_finalize ../src/python_embedded.c:71 (libqpid-dispatch.so+0x90827)
24:     #15 qd_dispatch_free ../src/dispatch.c:377 (libqpid-dispatch.so+0x713b8)
24:     #16 main_process ../router/src/main.c:119 (qdrouterd+0x40263a)
24:     #17 main ../router/src/main.c:369 (qdrouterd+0x4032ae)
24:     #18 __libc_start_main <null> (libc.so.6+0x23cbc)
24:     #19 _start <null> (qdrouterd+0x4023a9)

24: FAILED (errors=1)
24/74 Test #24: system_tests_policy ...............................***Failed   48.86 sec

@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 9eab6a8 to 2f94ec7 Compare April 21, 2021 06:21
@codecov-commenter
Copy link

codecov-commenter commented Apr 21, 2021

Codecov Report

Merging #1052 (fe13407) into main (8147b62) will increase coverage by 0.01%.
The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1052      +/-   ##
==========================================
+ Coverage   84.86%   84.88%   +0.01%     
==========================================
  Files         116      116              
  Lines       28644    28635       -9     
==========================================
- Hits        24310    24307       -3     
+ Misses       4334     4328       -6     
Impacted Files Coverage Δ
src/python_embedded.c 69.70% <90.47%> (+1.27%) ⬆️
src/dispatch.c 87.03% <100.00%> (+0.06%) ⬆️
src/router_core/router_core.c 85.62% <100.00%> (-1.43%) ⬇️
src/router_node.c 93.96% <100.00%> (+0.30%) ⬆️
src/router_pynode.c 88.34% <100.00%> (+0.49%) ⬆️
src/server.c 87.40% <100.00%> (+0.02%) ⬆️
src/router_core/delivery.c 93.71% <0.00%> (-0.74%) ⬇️
src/router_core/transfer.c 94.29% <0.00%> (-0.43%) ⬇️
src/adaptors/tcp_adaptor.c 77.10% <0.00%> (-0.32%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8147b62...fe13407. Read the comment docs.

@jiridanek
Copy link
Contributor Author

The Python crash happens on Python 3.8.2 (on Travis in Focal), on Python 3.8.5 (what I had), but not on Python 3.8.9 (version I upgraded my machine to). I am hoping that upgrading Travis will help. Ubuntu Focal does not have 3rd party releases for Python 3.8, so I am going for Python 3.9. There seems to be small incompatibility in tests due to how Python's exception hierarchy changed (??? !!! ???).

@jiridanek
Copy link
Contributor Author

test 68
      Start 68: system_tests_http2
68: Test command: /usr/bin/python3.9 "/home/travis/build/apache/qpid-dispatch/build/tests/run.py" "-m" "unittest" "-v" "system_tests_http2"
68: Test timeout computed to be: 600
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19770 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19774 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19779 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19783 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19787 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19791 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: ERROR
68: /usr/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 19796 is still running
68:   _warn("subprocess %s is still running" % self.pid,
68: ResourceWarning: Enable tracemalloc to get the object allocation traceback
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestEdgeInteriorRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 558, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20816: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestEdgeToEdgeViaInteriorRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 644, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20817: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestInteriorEdgeRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 600, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20818: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestOneEdgeRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 373, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20819: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestOneInteriorRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 411, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20820: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestOneStandaloneRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 290, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20821: [Errno 111] Connection refused
68: 
68: ======================================================================
68: ERROR: setUpClass (system_tests_http2.Http2TestTwoRouter)
68: ----------------------------------------------------------------------
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 225, in wait_port
68:     retry_exception(connect, exception_test=check, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 167, in retry_exception
68:     return function()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 220, in connect
68:     s.connect((host, port))
68: ConnectionRefusedError: [Errno 111] Connection refused
68: 
68: During handling of the above exception, another exception occurred:
68: 
68: Traceback (most recent call last):
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_tests_http2.py", line 449, in setUpClass
68:     cls.http2_server = cls.tester.http2server(name=cls.http2_server_name,
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 828, in http2server
68:     return self.cleanup(Http2Server(*args, **kwargs))
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 387, in __init__
68:     self.wait_ready()
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 395, in wait_ready
68:     self.wait_ports(**retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 398, in wait_ports
68:     wait_ports(self.ports_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 234, in wait_ports
68:     wait_port(port=port, protocol_family=protocol_family, **retry_kwargs)
68:   File "/home/travis/build/apache/qpid-dispatch/tests/system_test.py", line 227, in wait_port
68:     raise Exception("wait_port timeout on host %s port %s: %s" % (host, port, e))
68: Exception: wait_port timeout on host None port 20822: [Errno 111] Connection refused
68: 
68: ----------------------------------------------------------------------
68: Ran 0 tests in 422.274s
68: 
68: FAILED (errors=7)
68/74 Test #68: system_tests_http2 ................................***Failed  428.01 sec

@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 9e6b7a5 to e9b81f8 Compare April 22, 2021 06:42
@jiridanek
Copy link
Contributor Author

https://travis-ci.com/github/apache/qpid-dispatch/jobs/500355035#L4493

45: ======================================================================
45: ERROR: tearDownClass (system_tests_topology.TopologyTests)
45: ----------------------------------------------------------------------
45: Traceback (most recent call last):
45:   File "/Users/travis/build/apache/qpid-dispatch/tests/system_test.py", line 876, in tearDownClass
45:     cls.tester.teardown()
45:   File "/Users/travis/build/apache/qpid-dispatch/tests/system_test.py", line 819, in teardown
45:     raise RuntimeError("Errors during teardown: \n\n%s" % "\n\n".join([str(e) for e in errors]))
45: RuntimeError: Errors during teardown: 
45: 
45: Process 6605 error: exit code -6, expected -1
45: qdrouterd -c D.conf -I /Users/travis/build/apache/qpid-dispatch/python
45: /Users/travis/build/apache/qpid-dispatch/build/tests/system_test.dir/system_tests_topology/TopologyTests/setUpClass/D-11.cmd
45: >>>>
45: AddressSanitizer:DEADLYSIGNAL
45: =================================================================
45: ==6605==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00010c0d4dfc bp 0x70000925fcf0 sp 0x70000925fc80 T2)
45: ==6605==The signal is caused by a READ memory access.
45: ==6605==Hint: address points to the zero page.
45:     #0 0x10c0d4dfb in qd_connection_invoke_deferred server.c:1506
45:     #1 0x10be818f0 in qd_connection_manager_delete_connector connection_manager.c:1060
45:     #2 0x10e44fef4 in ffi_call_unix64 (libffi.7.dylib:x86_64+0x4ef4)
45:     #3 0x70000925fe4f  (<unknown module>)
45: 
45: ==6605==Register values:
45: rax = 0x00001c3c000097d1  rbx = 0x000061e00004be88  rcx = 0x0000100000000000  rdx = 0x0000000000000000  
45: rdi = 0x000061e00004bd10  rsi = 0xffffffff00000000  rbp = 0x000070000925fcf0  rsp = 0x000070000925fc80  
45:  r8 = 0x00000000000020a0   r9 = 0x000060a000000498  r10 = 0x000000000000594c  r11 = 0x0000000000000206  
45: r12 = 0x00001c3c000097a5  r13 = 0x00001c22000151ea  r14 = 0x000061e00004bd10  r15 = 0x000061e00004bd28  
45: AddressSanitizer can not provide additional info.
45: SUMMARY: AddressSanitizer: SEGV server.c:1506 in qd_connection_invoke_deferred
45: Thread T2 created by T0 here:
45:     #0 0x10c915add in wrap_pthread_create (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56add)
45:     #1 0x10bf22e1d in sys_thread threading.c:181
45:     #2 0x10c0d372f in qd_server_run server.c:1485
45:     #3 0x10bd9996e in main_process main.c:115
45:     #4 0x10bd9827b in main main.c:369
45:     #5 0x7fff63c853d4 in start (libdyld.dylib:x86_64+0x163d4)
45: 
45: ==6605==ABORTING
45: <<<<
45: 
45: ----------------------------------------------------------------------
45: Ran 3 tests in 153.471s
45: 
45: FAILED (errors=2)
45/69 Test #45: system_tests_topology .............................***Failed  153.64 sec

@jiridanek
Copy link
Contributor Author

jiridanek commented Apr 22, 2021

https://issues.apache.org/jira/browse/DISPATCH-1917, already known issue https://travis-ci.com/github/apache/qpid-dispatch/jobs/500384328#L6074

45: ==================
45: WARNING: ThreadSanitizer: data race (pid=17221)
45:   Read of size 8 at 0x7b4400000a00 by thread T3 (mutexes: write M5, write M265, write M0):
45:     #0 strlen <null> (libtsan.so.0+0x32abf)
45:     #1 PyUnicode_FromString <null> (libpython3.9.so.1.0+0x1b291c)
45:     #2 qd_entity_refresh_connector /home/travis/build/apache/qpid-dispatch/src/connection_manager.c:869 (libqpid-dispatch.so+0x70f66)
45:     #3 <null> <null> (libffi.so.7+0x6ff4)
45:     #4 qdr_forward_on_message /home/travis/build/apache/qpid-dispatch/src/router_core/forwarder.c:336 (libqpid-dispatch.so+0xba5bf)
45:     #5 qdr_general_handler /home/travis/build/apache/qpid-dispatch/src/router_core/router_core.c:928 (libqpid-dispatch.so+0xc3209)
45:     #6 qd_timer_visit /home/travis/build/apache/qpid-dispatch/src/timer.c:205 (libqpid-dispatch.so+0xf1267)
45:     #7 handle /home/travis/build/apache/qpid-dispatch/src/server.c:1006 (libqpid-dispatch.so+0xebc5e)
45:     #8 thread_run /home/travis/build/apache/qpid-dispatch/src/server.c:1120 (libqpid-dispatch.so+0xedaf7)
45:     #9 _thread_init /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:172 (libqpid-dispatch.so+0x96006)
45:     #10 <null> <null> (libtsan.so.0+0x2d1af)
45: 
45:   Previous write of size 8 at 0x7b4400000a00 by thread T2:
45:     #0 strcpy <null> (libtsan.so.0+0x45931)
45:     #1 strcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:90 (libqpid-dispatch.so+0xe60b6)
45:     #2 AMQP_opened_handler /home/travis/build/apache/qpid-dispatch/src/router_node.c:1388 (libqpid-dispatch.so+0xe60b6)
45:     #3 AMQP_outbound_opened_handler /home/travis/build/apache/qpid-dispatch/src/router_node.c:1549 (libqpid-dispatch.so+0xe6e07)
45:     #4 notify_opened /home/travis/build/apache/qpid-dispatch/src/container.c:277 (libqpid-dispatch.so+0x72a73)
45:     #5 policy_notify_opened /home/travis/build/apache/qpid-dispatch/src/container.c:288 (libqpid-dispatch.so+0x72a73)
45:     #6 qd_policy_amqp_open_connector /home/travis/build/apache/qpid-dispatch/src/policy.c:1373 (libqpid-dispatch.so+0x91b3e)
45:     #7 qd_container_handle_event /home/travis/build/apache/qpid-dispatch/src/container.c:515 (libqpid-dispatch.so+0x7559c)
45:     #8 handle /home/travis/build/apache/qpid-dispatch/src/server.c:1095 (libqpid-dispatch.so+0xebb91)
45:     #9 thread_run /home/travis/build/apache/qpid-dispatch/src/server.c:1120 (libqpid-dispatch.so+0xeda7c)
45:     #10 _thread_init /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:172 (libqpid-dispatch.so+0x96006)
45:     #11 <null> <null> (libtsan.so.0+0x2d1af)
45: 
45:   Location is heap block of size 300 at 0x7b4400000a00 allocated by main thread:
45:     #0 malloc <null> (libtsan.so.0+0x30343)
45:     #1 qd_server_connector /home/travis/build/apache/qpid-dispatch/src/server.c:1658 (libqpid-dispatch.so+0xefb17)
45:     #2 qd_dispatch_configure_connector /home/travis/build/apache/qpid-dispatch/src/connection_manager.c:879 (libqpid-dispatch.so+0x70fc1)
45:     #3 <null> <null> (libffi.so.7+0x6ff4)
45:     #4 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:97 (qdrouterd+0x2bb2)
45:     #5 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Mutex M5 (0x7b1000000300) created at:
45:     #0 pthread_mutex_init <null> (libtsan.so.0+0x4a636)
45:     #1 sys_mutex /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:43 (libqpid-dispatch.so+0x9606f)
45:     #2 qd_python_initialize /home/travis/build/apache/qpid-dispatch/src/python_embedded.c:54 (libqpid-dispatch.so+0x98a67)
45:     #3 qd_dispatch /home/travis/build/apache/qpid-dispatch/src/dispatch.c:111 (libqpid-dispatch.so+0x77409)
45:     #4 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:92 (qdrouterd+0x2b39)
45:     #5 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Mutex M265 (0x7b1000002c40) created at:
45:     #0 pthread_mutex_init <null> (libtsan.so.0+0x4a636)
45:     #1 sys_mutex /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:43 (libqpid-dispatch.so+0x9606f)
45:     #2 qd_router /home/travis/build/apache/qpid-dispatch/src/router_node.c:1631 (libqpid-dispatch.so+0xe6ffa)
45:     #3 qd_dispatch_prepare /home/travis/build/apache/qpid-dispatch/src/dispatch.c:335 (libqpid-dispatch.so+0x77003)
45:     #4 <null> <null> (libffi.so.7+0x6ff4)
45:     #5 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:97 (qdrouterd+0x2bb2)
45:     #6 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Mutex M0 (0x7b1000000200) created at:
45:     #0 pthread_mutex_init <null> (libtsan.so.0+0x4a636)
45:     #1 sys_mutex /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:43 (libqpid-dispatch.so+0x9606f)
45:     #2 qd_entity_cache_initialize /home/travis/build/apache/qpid-dispatch/src/entity_cache.c:55 (libqpid-dispatch.so+0x7878a)
45:     #3 qd_dispatch /home/travis/build/apache/qpid-dispatch/src/dispatch.c:88 (libqpid-dispatch.so+0x772be)
45:     #4 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:92 (qdrouterd+0x2b39)
45:     #5 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Thread T3 (tid=17225, running) created by main thread at:
45:     #0 pthread_create <null> (libtsan.so.0+0x5ea99)
45:     #1 sys_thread /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:181 (libqpid-dispatch.so+0x965c2)
45:     #2 qd_server_run /home/travis/build/apache/qpid-dispatch/src/server.c:1485 (libqpid-dispatch.so+0xee56a)
45:     #3 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:115 (qdrouterd+0x2c27)
45:     #4 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45:   Thread T2 (tid=17224, running) created by main thread at:
45:     #0 pthread_create <null> (libtsan.so.0+0x5ea99)
45:     #1 sys_thread /home/travis/build/apache/qpid-dispatch/src/posix/threading.c:181 (libqpid-dispatch.so+0x965c2)
45:     #2 qd_server_run /home/travis/build/apache/qpid-dispatch/src/server.c:1485 (libqpid-dispatch.so+0xee56a)
45:     #3 main_process /home/travis/build/apache/qpid-dispatch/router/src/main.c:115 (qdrouterd+0x2c27)
45:     #4 main /home/travis/build/apache/qpid-dispatch/router/src/main.c:369 (qdrouterd+0x2882)
45: 
45: SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libtsan.so.0+0x32abf) in __interceptor_strlen

@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch 2 times, most recently from e723639 to 988b5dc Compare April 24, 2021 06:57
@jiridanek jiridanek added this to the 1.17.0 milestone Apr 24, 2021
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 988b5dc to 9d8c9ef Compare April 26, 2021 19:23
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 9d8c9ef to 7b4fe69 Compare May 6, 2021 18:20
tests/system_test.py Outdated Show resolved Hide resolved
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from edd5b83 to 1f2bbb4 Compare May 6, 2021 20:38
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch 2 times, most recently from 07e775c to bc29b0a Compare May 26, 2021 12:46
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch 2 times, most recently from 66b4724 to aacf272 Compare July 23, 2021 17:23
@jiridanek
Copy link
Contributor Author

For some reason, this is not triggering a TravisCI jobs. I'll mark it as Ready for review, hoping that will do the trick.

In fact, however, this is not ready for review, yet. I intend to

  • Run this with debug build of cpython interpreter
  • Fix any issues that show in TravisCI

@jiridanek jiridanek marked this pull request as ready for review January 22, 2022 13:44
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch 2 times, most recently from 8ed9b4b to 3046b4d Compare January 22, 2022 23:07
@jiridanek
Copy link
Contributor Author

Few more issues for me to investigate,

21: ::OneRouterTest::test_17_multiframe_presettled FAILED
21: 
21: =================================== FAILURES ===================================
21: _________________ OneRouterTest.test_17_multiframe_presettled __________________
21: 
21: self = <system_tests_one_router.OneRouterTest testMethod=test_17_multiframe_presettled>
21: 
21:     def test_17_multiframe_presettled(self):
21:         test = MultiframePresettledTest(self.address)
21: >       test.run()
21: 
21: /home/jdanek/repos/qpid/qpid-dispatch/tests/system_tests_one_router.py:542: 
21: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
21: /home/jdanek/repos/qpid/qpid-dispatch/tests/system_tests_one_router.py:2884: in run
21:     Container(self).run()
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_reactor.py:180: in run
21:     while self.process():
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_reactor.py:243: in process
21:     event.dispatch(handler)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:165: in dispatch
21:     self.dispatch(h, type)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:165: in dispatch
21:     self.dispatch(h, type)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:162: in dispatch
21:     _dispatch(handler, type.method, self)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_events.py:123: in _dispatch
21:     m(*args)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_handlers.py:255: in on_delivery
21:     event.message = recv_msg(dlv)
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_handlers.py:140: in recv_msg
21:     msg.decode(delivery.link.recv(delivery.pending))
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_message.py:499: in decode
21:     self._check(pn_message_decode(self._msg, data))
21: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
21: 
21: self = Message(priority=4), err = -6
21: 
21:     def _check(self, err):
21:         if err < 0:
21:             exc = EXCEPTIONS.get(err, MessageException)
21: >           raise exc("[%s]: %s" % (err, pn_error_text(pn_message_error(self._msg))))
21: E           proton._exceptions.MessageException: [-6]: data error: (null)
21: 
21: /home/jdanek/.cache/pypoetry/virtualenvs/qpid-dispatch-6uYJVB5u-py3.10/lib64/python3.10/site-packages/proton/_message.py:80: MessageException
21: =========================== short test summary info ============================
21: FAILED ::OneRouterTest::test_17_multiframe_presettled - proton._exceptions.Me...
21: !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
21: =================== 1 failed, 24 passed in 77.12s (0:01:17) ====================
    Test #21: system_tests_one_router ..........***Failed   78.27 sec

and https://github.com/jiridanek/qpid-dispatch/runs/4911640230?check_suite_focus=true#step:27:5140

37/38 Test #75: c_unittests .............................***Failed    0.46 sec
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
=================================================================
==3586==ERROR: AddressSanitizer: attempting double-free on 0x60600000b120 in thread T4:
    #0 0x7ffa7bf58627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x935f37 in router_core_thread /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/router_core_thread.c:236
    #2 0x7ffa7a877a86 in start_thread (/lib64/libc.so.6+0x8da86)
    #3 0x7ffa7a8fb8d3 in __GI___clone (/lib64/libc.so.6+0x1118d3)

0x60600000b120 is located 0 bytes inside of 56-byte region [0x60600000b120,0x60600000b158)
freed by thread T4 here:
    #0 0x7ffa7bf58627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x938cb7 in qdr_subscribe_CT /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/route_tables.c:675
    #2 0x935f37 in router_core_thread /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/router_core_thread.c:236
    #3 0x7ffa7a877a86 in start_thread (/lib64/libc.so.6+0x8da86)

previously allocated by thread T3 here:
    #0 0x7ffa7bf5891f in __interceptor_malloc (/lib64/libasan.so.6+0xae91f)
    #1 0x93ed3d in qd_malloc /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/include/qpid/dispatch/ctools.h:234
    #2 0x93ed3d in qdr_core_subscribe /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/route_tables.c:147
    #3 0x86e872 in IoAdapter_init /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/python_embedded.c:718
    #4 0x7ffa7b81ce22 in type_call (/lib64/libpython3.10.so.1.0+0x11ae22)

Thread T4 created by T3 here:
    #0 0x7ffa7bf00866 in pthread_create (/lib64/libasan.so.6+0x56866)
    #1 0x86d6a5 in sys_thread /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/posix/threading.c:181
    #2 0x91be02 in qdr_core /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_core/router_core.c:124
    #3 0x99f402 in qd_router_setup_late /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/src/router_node.c:2127
    #4 0x7ffa76707c03 in ffi_call_unix64 (/lib64/libffi.so.6+0x6c03)
    #5 0x7ffa760fc98f  (<unknown module>)

Thread T3 created by T0 here:
    #0 0x7ffa7bf00866 in pthread_create (/lib64/libasan.so.6+0x56866)
    #1 0x7ffa7b534698 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib64/libstdc++.so.6+0xd9698)
    #2 0x519c4d in doctest::Context::run() /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/tests/c_unittests/doctest.h:6656
    #3 0x45b88c in main /home/runner/work/qpid-dispatch/qpid-dispatch/qpid-dispatch/tests/c_unittests/doctest.h:6741
    #4 0x7ffa7a81755f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

SUMMARY: AddressSanitizer: double-free (/lib64/libasan.so.6+0xae627) in free
==3586==ABORTING

@jiridanek
Copy link
Contributor Author

One more, it looks like they are exceedingly rare, but quite devastating

36: E           Process 163489 error: exit code 1, expected 0
36: E           qdrouterd -c INT.B.conf -I /home/jdanek/repos/qpid/qpid-dispatch/python
36: E           /home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug-asan/tests/system_test.dir/system_tests_interior_sync_up/RouterTest/setUpClass/INT.B-2.cmd
36: E           >>>>
36: E           AddressSanitizer:DEADLYSIGNAL
36: E           =================================================================
36: E           ==163489==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f631709656b bp 0x7fff5541b8a0 sp 0x7fff5541b870 T0)
36: E           ==163489==The signal is caused by a READ memory access.
36: E           ==163489==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
36: E               #0 0x7f631709656b in PyVectorcall_Function.lto_priv.5 (/lib64/libpython3.10d.so.1.0+0x12f56b)
36: E               #1 0x7f63170966e1 in _PyObject_VectorcallTstate.lto_priv.5 (/lib64/libpython3.10d.so.1.0+0x12f6e1)
36: E               #2 0x7f6317097df8 in _PyObject_CallFunctionVa (/lib64/libpython3.10d.so.1.0+0x130df8)
36: E               #3 0x7f6317097f49 in PyObject_CallFunction (/lib64/libpython3.10d.so.1.0+0x130f49)
36: E               #4 0x578623 in qd_io_rx_handler /home/jdanek/repos/qpid/qpid-dispatch/src/python_embedded.c:669
36: E               #5 0x612162 in qdr_forward_on_message /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/forwarder.c:335
36: E               #6 0x64c6cb in qdr_general_handler /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:952
36: E               #7 0x6383fe in qdr_core_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:272
36: E               #8 0x6d28e2 in qd_router_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_node.c:2165
36: E               #9 0x4ffbdd in qd_dispatch_free /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:375
36: E               #10 0x7067a9 in main_process /home/jdanek/repos/qpid/qpid-dispatch/router/src/main.c:119
36: E               #11 0x708607 in main /home/jdanek/repos/qpid/qpid-dispatch/router/src/main.c:369
36: E               #12 0x7f631639255f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
36: E               #13 0x7f631639260b in __libc_start_main_impl (/lib64/libc.so.6+0x2d60b)
36: E               #14 0x42c2e4 in _start (/home/jdanek/repos/qpid/qpid-dispatch/cmake-build-debug-asan/router/qdrouterd+0x42c2e4)
36: E           
36: E           AddressSanitizer can not provide additional info.
36: E           SUMMARY: AddressSanitizer: SEGV (/lib64/libpython3.10d.so.1.0+0x12f56b) in PyVectorcall_Function.lto_priv.5
36: E           ==163489==ABORTING
36: E           <<<<
36: 
36: /home/jdanek/repos/qpid/qpid-dispatch/tests/system_test.py:788: RuntimeError
36: =========================== short test summary info ============================
36: ERROR ::RouterTest::test_interior_sync_up - RuntimeError: Errors during teard...
36: !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
36: ========================= 1 passed, 1 error in 39.23s ==========================
    Test #36: system_tests_interior_sync_up ....***Failed   40.18 sec

@jiridanek
Copy link
Contributor Author

This really is crazy. Consider what may happen when both the subscribe and unsubscribe actions get discarded:

==218518==ERROR: AddressSanitizer: attempting double-free on 0x6060000227c0 in thread T3:
    #0 0x14d6627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x6f32e6 in qdr_unsubscribe_CT /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/route_tables.c:691
    #2 0x6c8fe7 in qdr_core_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:270
    #3 0x7634f3 in qd_router_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_node.c:2165
    #4 0x5906bd in qd_dispatch_free /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:375
    #5 0x869b8d in QDR::deinitialize(bool) const /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/./helpers.hpp:265
    #6 0x858299 in check_amqp_listener_startup_log_message(qd_server_config_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:58
    #7 0x85cd25 in operator() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:129
    #8 0x863eef in __invoke_impl<void, DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:61
    #9 0x863cd7 in __invoke<DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:96
    #10 0x8639de in _M_invoke<0> /usr/include/c++/11/bits/std_thread.h:253
    #11 0x863817 in operator() /usr/include/c++/11/bits/std_thread.h:260
    #12 0x863200 in _M_run /usr/include/c++/11/bits/std_thread.h:211
    #13 0x7f0a01eac5c3 in execute_native_thread_routine (/lib64/libstdc++.so.6+0xd95c3)
    #14 0x440d10670a86 in start_thread (/lib64/libc.so.6+0x8da86)
    #15 0x440d106f48d3 in __GI___clone (/lib64/libc.so.6+0x1118d3)

0x6060000227c0 is located 0 bytes inside of 56-byte region [0x6060000227c0,0x6060000227f8)
freed by thread T3 here:
    #0 0x14d6627 in free (/lib64/libasan.so.6+0xae627)
    #1 0x6f26a9 in qdr_subscribe_CT /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/route_tables.c:675
    #2 0x6c8fe7 in qdr_core_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/router_core.c:270
    #3 0x7634f3 in qd_router_free /home/jdanek/repos/qpid/qpid-dispatch/src/router_node.c:2165
    #4 0x5906bd in qd_dispatch_free /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:375
    #5 0x869b8d in QDR::deinitialize(bool) const /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/./helpers.hpp:265
    #6 0x858299 in check_amqp_listener_startup_log_message(qd_server_config_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:58
    #7 0x85cd25 in operator() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:129
    #8 0x863eef in __invoke_impl<void, DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:61
    #9 0x863cd7 in __invoke<DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:96
    #10 0x8639de in _M_invoke<0> /usr/include/c++/11/bits/std_thread.h:253
    #11 0x863817 in operator() /usr/include/c++/11/bits/std_thread.h:260
    #12 0x863200 in _M_run /usr/include/c++/11/bits/std_thread.h:211
    #13 0x7f0a01eac5c3 in execute_native_thread_routine (/lib64/libstdc++.so.6+0xd95c3)

previously allocated by thread T3 here:
    #0 0x14d691f in __interceptor_malloc (/lib64/libasan.so.6+0xae91f)
    #1 0x6e68c3 in qd_malloc /home/jdanek/repos/qpid/qpid-dispatch/include/qpid/dispatch/ctools.h:234
    #2 0x6e752c in qdr_core_subscribe /home/jdanek/repos/qpid/qpid-dispatch/src/router_core/route_tables.c:147
    #3 0x609db3 in IoAdapter_init /home/jdanek/repos/qpid/qpid-dispatch/src/python_embedded.c:726
    #4 0x716250fdc15a in type_call (/lib64/libpython3.10d.so.1.0+0x1c415a)
    #5 0x716250f48139 in _PyObject_MakeTpCall (/lib64/libpython3.10d.so.1.0+0x130139)
    #6 0x7162510b3aed in _PyObject_VectorcallTstate.lto_priv.26 (/lib64/libpython3.10d.so.1.0+0x29baed)
    #7 0x7162510b3b69 in PyObject_Vectorcall.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29bb69)
    #8 0x7162510ceb04 in call_function (/lib64/libpython3.10d.so.1.0+0x2b6b04)
    #9 0x7162510c83dd in _PyEval_EvalFrameDefault (/lib64/libpython3.10d.so.1.0+0x2b03dd)
    #10 0x7162510b3e1d in _PyEval_EvalFrame.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29be1d)
    #11 0x7162510cc8b5 in _PyEval_Vector (/lib64/libpython3.10d.so.1.0+0x2b48b5)
    #12 0x716250f487df in _PyFunction_Vectorcall (/lib64/libpython3.10d.so.1.0+0x1307df)
    #13 0x7162510b3b09 in _PyObject_VectorcallTstate.lto_priv.26 (/lib64/libpython3.10d.so.1.0+0x29bb09)
    #14 0x7162510b3b69 in PyObject_Vectorcall.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29bb69)
    #15 0x7162510ceb04 in call_function (/lib64/libpython3.10d.so.1.0+0x2b6b04)
    #16 0x7162510c8208 in _PyEval_EvalFrameDefault (/lib64/libpython3.10d.so.1.0+0x2b0208)
    #17 0x7162510b3e1d in _PyEval_EvalFrame.lto_priv.1 (/lib64/libpython3.10d.so.1.0+0x29be1d)
    #18 0x7162510cc8b5 in _PyEval_Vector (/lib64/libpython3.10d.so.1.0+0x2b48b5)
    #19 0x716250f487df in _PyFunction_Vectorcall (/lib64/libpython3.10d.so.1.0+0x1307df)
    #20 0x716250f47737 in _PyObject_VectorcallTstate.lto_priv.5 (/lib64/libpython3.10d.so.1.0+0x12f737)
    #21 0x716250f48dd1 in _PyObject_CallFunctionVa (/lib64/libpython3.10d.so.1.0+0x130dd1)
    #22 0x716250f48f49 in PyObject_CallFunction (/lib64/libpython3.10d.so.1.0+0x130f49)
    #23 0x58deb7 in qd_dispatch_load_config /home/jdanek/repos/qpid/qpid-dispatch/src/dispatch.c:130
    #24 0x8689d4 in QDR::initialize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/./helpers.hpp:215
    #25 0x857d9d in check_amqp_listener_startup_log_message(qd_server_config_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:44
    #26 0x85cd25 in operator() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:129
    #27 0x863eef in __invoke_impl<void, DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:61
    #28 0x863cd7 in __invoke<DOCTEST_ANON_FUNC_28()::<lambda()> > /usr/include/c++/11/bits/invoke.h:96
    #29 0x8639de in _M_invoke<0> /usr/include/c++/11/bits/std_thread.h:253

Thread T3 created by T0 here:
    #0 0x147e866 in pthread_create (/lib64/libasan.so.6+0x56866)
    #1 0x7f0a01eac698 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib64/libstdc++.so.6+0xd9698)
    #2 0x85cfad in DOCTEST_ANON_FUNC_28 /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/test_listener_startup.cpp:122
    #3 0x7fe4a6 in doctest::Context::run() /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/doctest.h:6656
    #4 0x801ba8 in main /home/jdanek/repos/qpid/qpid-dispatch/tests/c_unittests/doctest.h:6741
    #5 0x440d1061055f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

SUMMARY: AddressSanitizer: double-free (/lib64/libasan.so.6+0xae627) in free
==218518==ABORTING

@jiridanek
Copy link
Contributor Author

Created spinoff PRs #1495 and #1496

@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch 4 times, most recently from 0cbef99 to 29fbc01 Compare January 31, 2022 11:05
@jiridanek jiridanek marked this pull request as draft February 17, 2022 01:24
@jiridanek jiridanek removed this from the 1.19.0 milestone Feb 17, 2022
@jiridanek
Copy link
Contributor Author

jiridanek commented Feb 17, 2022

Marking as draft and dropping it from 1.19 milestone. The PR fixes the leaks, but causes occasional shutdown crashes, which is not acceptable.

  • don't forget to put the following system_test.py addition to skupper-router
        super(Http2Server, self).__init__(self.args, name=name, expect=expect)
        if wait:
            try:
                self.wait_ready()
            except Exception:
                self.teardown()
                raise

@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 29fbc01 to 03c7fcd Compare February 17, 2022 01:29
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 03c7fcd to 5a49d4c Compare February 17, 2022 10:33
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 6bfeb73 to fd1a69e Compare April 12, 2022 13:11
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from fd1a69e to 3835b48 Compare April 30, 2022 17:55
@jiridanek jiridanek force-pushed the jd_2021_02_20_lsan_specific_IoAdapter_init branch from 3835b48 to 44dd9fa Compare December 2, 2023 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants