Revert "btl/openib: disable XRC in OpenIB BTL" #4082
This reverts commit c22a7c7.
This PR verifies the current XRC status on the v2.x branch.
@hppritcha @jsquyres - it seems like only v2.x is affected by this problem.
I've started playing with this and I found the following:
Ok, the reason why
@jladd-mlnx @vspetrov @bureddy @jsquyres @hppritcha @hjelmn
P.S. I was able to reproduce it manually once I fixed my config script.
Hi, the bug is definitely in the XRC support of the openib BTL. I've modified the hello_c.c test by adding
And now the issue reproduces without hcoll:
No XRC:
Every time the issue is observed, the MPI processes receive a SIGCONT signal that, for some reason, stops them (this can be checked with either GDB or strace).
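As an alternative to attaching GDB or running strace, a test program such as hello_c.c could record SIGCONT delivery itself. The following is a minimal sketch under that assumption; the helper names `install_sigcont_counter` and `sigcont_count` are hypothetical and not part of Open MPI. Installing a handler does not change the fact that SIGCONT always resumes a stopped process; it only makes each delivery observable from inside the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Count of SIGCONT deliveries seen so far. volatile sig_atomic_t is
   the type a signal handler may portably write to. */
static volatile sig_atomic_t sigcont_hits = 0;

/* The handler only bumps a counter: no stdio or malloc in a handler. */
static void on_sigcont(int signo)
{
    (void)signo;
    sigcont_hits++;
}

/* Hypothetical helper: install the SIGCONT counting handler.
   Returns 0 on success, -1 on failure (errno is set by sigaction). */
int install_sigcont_counter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigcont;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart syscalls such as read() if interrupted */
    return sigaction(SIGCONT, &sa, NULL);
}

/* Hypothetical helper: number of SIGCONT signals received so far. */
int sigcont_count(void)
{
    return (int)sigcont_hits;
}
```

In a modified test, one could call `install_sigcont_counter()` right after `MPI_Init` and print `sigcont_count()` on each rank before `MPI_Finalize`, then compare the counts between XRC and non-XRC runs.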
So is there an XRC problem in both hcoll and openib?
Please conduct all follow-up discussions on the umbrella issue for all the "revert the disable-XRC patch" PRs: #4087.