Verbs hangs in ibv_reg_mr() / fi mr caching issue #5687

Closed
frostedcmos opened this issue Feb 28, 2020 · 4 comments

Comments
@frostedcmos

On the CaRT project, we've tried updating to ba597c9 of OFI and are now hitting a problem where ofi+verbs;ofi_rxm seems to hang at init time.

This has been reproduced on a few systems with different server tests.

The same tests get past this particular hang when run with the "FI_MR_CACHE_MAX_COUNT=0" environment variable set.
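For reference, a minimal sketch of applying the same workaround programmatically; placing the setenv() call in main() before any fi_getinfo()/fabric setup is an assumption, not something from this report:

#include <stdlib.h>

int main(void)
{
    /* Disable libfabric's MR cache (FI_MR_CACHE_MAX_COUNT=0), which
     * sidesteps the cache-lock path involved in the hang below. */
    setenv("FI_MR_CACHE_MAX_COUNT", "0", 1);
    /* ... fi_getinfo() / fi_fabric() initialization would follow ... */
    return 0;
}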

Attaching to the server via gdb -p and running backtrace shows:
(gdb) bt
#0 0x00007f65f544951d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f65f5444e1b in _L_lock_812 () from /lib64/libpthread.so.0
#2 0x00007f65f5444ce8 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f65f305fd98 in ofi_intercept_handler () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#4 0x00007f65f305fea2 in ofi_intercept_madvise () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#5 0x00007f65f27f759c in ibv_madvise_range.part.5 () from /lib64/libibverbs.so.1
#6 0x00007f65f27f8fd2 in ibv_reg_mr () from /lib64/libibverbs.so.1
#7 0x00007f65f3087849 in vrb_mr_cache_add_region () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#8 0x00007f65f30606a9 in util_mr_cache_create.isra.5 () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#9 0x00007f65f3060a3f in ofi_mr_cache_search () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#10 0x00007f65f30875d6 in vrb_mr_cache_reg () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#11 0x00007f65f309512e in rxm_mr_regv () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#12 0x00007f65f309525a in rxm_mr_reg () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#13 0x00007f65f4d0cdd4 in fi_mr_reg (context=0x0, mr=0x7ffe95704398, flags=0, requested_key=0, offset=0, acs=16128, len=1050672, buf=0x3034000, domain=)
at /home/aaoganez/github/liwei/cart/install/Linux/include/rdma/fi_domain.h:328
#14 na_ofi_mem_alloc (na_class=, mr_hdl=0x7ffe95704398, size=1050672) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:2360
#15 na_ofi_mem_pool_create (block_size=4096, block_count=256, na_class=0x187c600) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:2311
#16 na_ofi_mem_pool_alloc (mr_hdl=, size=4096, na_class=0x187c600) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:2415
#17 na_ofi_msg_buf_alloc (na_class=0x187c600, size=4096, plugin_data=0x3033b60) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:3649
#18 0x00007f65f4f2c012 in hg_core_alloc_na (use_sm=, hg_core_handle=0x3033a50) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:1708
#19 hg_core_create (context=context@entry=0x302f6d0, use_sm=) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:1624
#20 0x00007f65f4f2d9d8 in hg_core_context_post (use_sm=, repost=, request_count=, context=)
at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:2747
#21 HG_Core_context_post (context=0x302f6d0, request_count=request_count@entry=256, repost=repost@entry=1 '\001')
at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:3832
#22 0x00007f65f4f24f3a in HG_Context_create_id (hg_class=, id=id@entry=0 '\000') at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury.c:1082
#23 0x00007f65f58c6217 in crt_hg_ctx_init (hg_ctx=hg_ctx@entry=0x302a518, idx=0) at src/cart/crt_hg.c:651
#24 0x00007f65f5892297 in crt_context_create (crt_ctx=crt_ctx@entry=0x7ffe95704750) at src/cart/crt_context.c:239
#25 0x000000000040147f in get_self_uri (h=0x2c0c010) at src/crt_launch/crt_launch.c:163
#26 main (argc=8, argv=0x7ffe95704968) at src/crt_launch/crt_launch.c:306
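The backtrace shows ofi_mr_cache_search() registering a new region via ibv_reg_mr(), whose internal madvise is intercepted and then blocks on a lock; per the commit message below, that is the cache lock the search path already holds. A minimal self-deadlock illustration in C, with hypothetical names standing in for the frames above (not the actual libfabric code):

#include <pthread.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for ofi_intercept_madvise()/ofi_intercept_handler():
 * invoked when an intercepted call touches monitored memory. */
static void intercept_handler(void)
{
    pthread_mutex_lock(&cache_lock);   /* blocks forever: already held */
    /* flush unusable cache entries ... */
    pthread_mutex_unlock(&cache_lock);
}

/* Stands in for ibv_reg_mr(), whose ibv_madvise_range() call the
 * memory monitor intercepts on the same thread. */
static void register_region(void)
{
    intercept_handler();
}

/* Stands in for the cache-miss path (ofi_mr_cache_search ->
 * util_mr_cache_create) that held the lock while registering. */
static void cache_search(void)
{
    pthread_mutex_lock(&cache_lock);
    register_region();                 /* deadlocks here */
    pthread_mutex_unlock(&cache_lock);
}

int main(void)
{
    cache_search();                    /* never returns */
    return 0;
}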

@shefty
Member

shefty commented Mar 7, 2020

I believe I see the issue, and I'll work on a fix.

shefty added a commit to shefty/libfabric that referenced this issue Mar 13, 2020
When we build a new cache entry (via util_mr_cache_create), we
allocate memory and register the region with the underlying
provider.  This can result in the generation of monitor notifications,
for example, intercepting the alloc calls.  Because the notifications
will acquire the cache lock in order to flush unusable entries, we
cannot hold that same lock while building the entry, or deadlock can
occur.

This has been seen by applications.  See issue ofiwg#5687.

To handle this, we build new cache entries outside of the lock, and
only acquire the lock when inserting them back into the cache. 
This opens a race condition where a conflicting entry can be inserted
into the cache between the first find() call and the insert() call.
We expect such occurrences to be rare, as it requires a multi-threaded
app to post transfers referencing the same region simultaneously from
multiple threads.

In order to handle the race, we need to duplicate the find() check
after building the new entry prior to inserting it.  If a conflict
is found, we abort the insertion and restart the entire higher-level
search operation.

Signed-off-by: Sean Hefty <[email protected]>
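A rough sketch of the scheme this commit message describes, using hypothetical names and a deliberately simplified list-based cache rather than the actual libfabric implementation:

#include <pthread.h>
#include <stdlib.h>

struct mr_entry {
    void            *addr;
    size_t          len;
    struct mr_entry *next;
};

struct mr_cache {
    pthread_mutex_t lock;
    struct mr_entry *head;
};

/* Caller must hold c->lock. */
static struct mr_entry *find_entry(struct mr_cache *c, void *addr, size_t len)
{
    for (struct mr_entry *e = c->head; e; e = e->next)
        if (e->addr == addr && e->len == len)
            return e;
    return NULL;
}

/* Stands in for util_mr_cache_create(): allocates and "registers"
 * the region.  Registration may fire monitor notifications that
 * themselves take c->lock, so this must run with the lock dropped. */
static struct mr_entry *create_entry(void *addr, size_t len)
{
    struct mr_entry *e = malloc(sizeof(*e));
    if (e) {
        e->addr = addr;
        e->len = len;
        e->next = NULL;
    }
    return e;
}

struct mr_entry *cache_search(struct mr_cache *c, void *addr, size_t len)
{
    struct mr_entry *e;

retry:
    pthread_mutex_lock(&c->lock);
    e = find_entry(c, addr, len);
    pthread_mutex_unlock(&c->lock);
    if (e)
        return e;

    /* Build the new entry outside the lock, as described above. */
    e = create_entry(addr, len);
    if (!e)
        return NULL;

    pthread_mutex_lock(&c->lock);
    if (find_entry(c, addr, len)) {
        /* A conflicting entry appeared between the first find() and
         * this insert: discard ours and restart the search. */
        pthread_mutex_unlock(&c->lock);
        free(e);
        goto retry;
    }
    e->next = c->head;
    c->head = e;
    pthread_mutex_unlock(&c->lock);
    return e;
}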
shefty added a commit to shefty/libfabric that referenced this issue Mar 13, 2020
shefty added a commit to shefty/libfabric that referenced this issue Mar 13, 2020
@shefty
Member

shefty commented Mar 13, 2020

PR #5729 is intended to address this issue. Testing has not been completed on those changes yet.

shefty added a commit to shefty/libfabric that referenced this issue Mar 16, 2020
@shefty
Member

shefty commented Mar 16, 2020

Can you see if the changes in PR #5729 fix the problem for you?

shefty added a commit to shefty/libfabric that referenced this issue Mar 16, 2020
@frostedcmos
Author

Tried this on the CaRT iv_test suite using verbs; all tests passed locally.

@shefty shefty closed this as completed Mar 17, 2020