Verbs hangs in ibv_reg_mr() / fi mr caching issue #5687

Closed
frostedcmos opened this issue Feb 28, 2020 · 4 comments

Comments
@frostedcmos

On the CaRT project, we've tried updating to ba597c9 of OFI and are now hitting a problem where ofi+verbs;ofi_rxm seems to hang at init time.

This has been reproduced on a few systems with different server tests.

The same tests get past this particular hang when run with the "FI_MR_CACHE_MAX_COUNT=0" environment variable set.
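For reference, a minimal sketch of applying the same workaround programmatically; placing the setenv() call in main() before any fi_getinfo()/fabric setup is an assumption, not something from this report:

#include <stdlib.h>

int main(void)
{
    /* Disable libfabric's MR cache (FI_MR_CACHE_MAX_COUNT=0), which
     * sidesteps the cache-lock path involved in the hang below. */
    setenv("FI_MR_CACHE_MAX_COUNT", "0", 1);
    /* ... fi_getinfo() / fi_fabric() initialization would follow ... */
    return 0;
}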

Attaching to the server via gdb -p and running backtrace shows:
(gdb) bt
#0 0x00007f65f544951d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f65f5444e1b in _L_lock_812 () from /lib64/libpthread.so.0
#2 0x00007f65f5444ce8 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f65f305fd98 in ofi_intercept_handler () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#4 0x00007f65f305fea2 in ofi_intercept_madvise () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#5 0x00007f65f27f759c in ibv_madvise_range.part.5 () from /lib64/libibverbs.so.1
#6 0x00007f65f27f8fd2 in ibv_reg_mr () from /lib64/libibverbs.so.1
#7 0x00007f65f3087849 in vrb_mr_cache_add_region () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#8 0x00007f65f30606a9 in util_mr_cache_create.isra.5 () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#9 0x00007f65f3060a3f in ofi_mr_cache_search () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#10 0x00007f65f30875d6 in vrb_mr_cache_reg () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#11 0x00007f65f309512e in rxm_mr_regv () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#12 0x00007f65f309525a in rxm_mr_reg () from /home/aaoganez/github/liwei/cart/install/Linux/lib/libfabric.so.1
#13 0x00007f65f4d0cdd4 in fi_mr_reg (context=0x0, mr=0x7ffe95704398, flags=0, requested_key=0, offset=0, acs=16128, len=1050672, buf=0x3034000, domain=)
at /home/aaoganez/github/liwei/cart/install/Linux/include/rdma/fi_domain.h:328
#14 na_ofi_mem_alloc (na_class=, mr_hdl=0x7ffe95704398, size=1050672) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:2360
#15 na_ofi_mem_pool_create (block_size=4096, block_count=256, na_class=0x187c600) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:2311
#16 na_ofi_mem_pool_alloc (mr_hdl=, size=4096, na_class=0x187c600) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:2415
#17 na_ofi_msg_buf_alloc (na_class=0x187c600, size=4096, plugin_data=0x3033b60) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/na/na_ofi.c:3649
#18 0x00007f65f4f2c012 in hg_core_alloc_na (use_sm=, hg_core_handle=0x3033a50) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:1708
#19 hg_core_create (context=context@entry=0x302f6d0, use_sm=) at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:1624
#20 0x00007f65f4f2d9d8 in hg_core_context_post (use_sm=, repost=, request_count=, context=)
at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:2747
#21 HG_Core_context_post (context=0x302f6d0, request_count=request_count@entry=256, repost=repost@entry=1 '\001')
at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury_core.c:3832
#22 0x00007f65f4f24f3a in HG_Context_create_id (hg_class=, id=id@entry=0 '\000') at /home/aaoganez/github/liwei/cart/_build.external-Linux/mercury/src/mercury.c:1082
#23 0x00007f65f58c6217 in crt_hg_ctx_init (hg_ctx=hg_ctx@entry=0x302a518, idx=0) at src/cart/crt_hg.c:651
#24 0x00007f65f5892297 in crt_context_create (crt_ctx=crt_ctx@entry=0x7ffe95704750) at src/cart/crt_context.c:239
#25 0x000000000040147f in get_self_uri (h=0x2c0c010) at src/crt_launch/crt_launch.c:163
#26 main (argc=8, argv=0x7ffe95704968) at src/crt_launch/crt_launch.c:306
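The backtrace shows ofi_mr_cache_search() registering a new region via ibv_reg_mr(), whose internal madvise is intercepted and then blocks on a lock; per the commit message below, that is the cache lock the search path already holds. A minimal self-deadlock illustration in C, with hypothetical names standing in for the frames above (not the actual libfabric code):

#include <pthread.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for ofi_intercept_madvise()/ofi_intercept_handler():
 * invoked when an intercepted call touches monitored memory. */
static void intercept_handler(void)
{
    pthread_mutex_lock(&cache_lock);   /* blocks forever: already held */
    /* flush unusable cache entries ... */
    pthread_mutex_unlock(&cache_lock);
}

/* Stands in for ibv_reg_mr(), whose ibv_madvise_range() call the
 * memory monitor intercepts on the same thread. */
static void register_region(void)
{
    intercept_handler();
}

/* Stands in for the cache-miss path (ofi_mr_cache_search ->
 * util_mr_cache_create) that held the lock while registering. */
static void cache_search(void)
{
    pthread_mutex_lock(&cache_lock);
    register_region();                 /* deadlocks here */
    pthread_mutex_unlock(&cache_lock);
}

int main(void)
{
    cache_search();                    /* never returns */
    return 0;
}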

@shefty
Member

shefty commented Mar 7, 2020

I believe I see the issue, and I'll work on a fix.

shefty added a commit to shefty/libfabric that referenced this issue Mar 13, 2020
When we build a new cache entry (via util_mr_cache_create), we
allocate memory and register the region with the underlying
provider.  This can result in the generation of monitor notifications,
for example, intercepting the alloc calls.  Because the notifications
will acquire the cache lock in order to flush unusable entries, we
cannot hold that same lock while building the entry, or deadlock can
occur.

This has been seen by applications.  See issue ofiwg#5687.

To handle this, we build new cache entries outside of the lock, and
only acquire the lock when inserting them back into the cache. 
This opens a race condition where a conflicting entry can be inserted
into the cache between the first find() call and the insert() call.
We expect such occurrences to be rare, as it requires a multi-threaded
app to post transfers referencing the same region simultaneously from
multiple threads.

In order to handle the race, we need to duplicate the find() check
after building the new entry prior to inserting it.  If a conflict
is found, we abort the insertion and restart the entire higher-level
search operation.

Signed-off-by: Sean Hefty <[email protected]>
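A rough sketch of the scheme this commit message describes, using hypothetical names and a deliberately simplified list-based cache rather than the actual libfabric implementation:

#include <pthread.h>
#include <stdlib.h>

struct mr_entry {
    void            *addr;
    size_t          len;
    struct mr_entry *next;
};

struct mr_cache {
    pthread_mutex_t lock;
    struct mr_entry *head;
};

/* Caller must hold c->lock. */
static struct mr_entry *find_entry(struct mr_cache *c, void *addr, size_t len)
{
    for (struct mr_entry *e = c->head; e; e = e->next)
        if (e->addr == addr && e->len == len)
            return e;
    return NULL;
}

/* Stands in for util_mr_cache_create(): allocates and "registers"
 * the region.  Registration may fire monitor notifications that
 * themselves take c->lock, so this must run with the lock dropped. */
static struct mr_entry *create_entry(void *addr, size_t len)
{
    struct mr_entry *e = malloc(sizeof(*e));
    if (e) {
        e->addr = addr;
        e->len = len;
        e->next = NULL;
    }
    return e;
}

struct mr_entry *cache_search(struct mr_cache *c, void *addr, size_t len)
{
    struct mr_entry *e;

retry:
    pthread_mutex_lock(&c->lock);
    e = find_entry(c, addr, len);
    pthread_mutex_unlock(&c->lock);
    if (e)
        return e;

    /* Build the new entry outside the lock, as described above. */
    e = create_entry(addr, len);
    if (!e)
        return NULL;

    pthread_mutex_lock(&c->lock);
    if (find_entry(c, addr, len)) {
        /* A conflicting entry appeared between the first find() and
         * this insert: discard ours and restart the search. */
        pthread_mutex_unlock(&c->lock);
        free(e);
        goto retry;
    }
    e->next = c->head;
    c->head = e;
    pthread_mutex_unlock(&c->lock);
    return e;
}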
shefty added a commit to shefty/libfabric that referenced this issue Mar 13, 2020
shefty added a commit to shefty/libfabric that referenced this issue Mar 13, 2020
@shefty
Member

shefty commented Mar 13, 2020

PR #5729 is intended to address this issue. Testing has not been completed on those changes yet.

shefty added a commit to shefty/libfabric that referenced this issue Mar 16, 2020
@shefty
Member

shefty commented Mar 16, 2020

Can you see if the changes in PR #5729 fix the problem for you?

shefty added a commit to shefty/libfabric that referenced this issue Mar 16, 2020
@frostedcmos
Author

Tried this on the CaRT iv_test suite using verbs; all tests passed locally.

@shefty shefty closed this as completed Mar 17, 2020