set $UCX_TLS to 'all' for impi installed on top of UCX #2253
Test report by @lexming Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (1 easyconfigs in total)
# since impi v2019.8, the MLX provider works without UCX_TLS, but setting it does not hurt
ucx_root = get_software_root('UCX')
if ucx_root:
    txt += self.module_generator.set_environment('UCX_TLS', 'all')
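For readers outside EasyBuild, the hunk above can be sketched as a standalone script. This is only an illustration, not the easyblock itself: the `*_sketch` helpers are hypothetical stand-ins, it assumes `get_software_root` resolves a dependency through the `$EBROOT<NAME>` variable set by its module, the Tcl line is a simplified form of what the module generator emits (exact whitespace is an assumption), and the UCX path is made up.

```python
import os


def get_software_root_sketch(name, environ=None):
    """Hypothetical stand-in for get_software_root: look up $EBROOT<NAME>."""
    environ = os.environ if environ is None else environ
    return environ.get('EBROOT' + name.upper())


def set_environment_sketch(key, value):
    """Hypothetical stand-in for module_generator.set_environment (simplified Tcl line)."""
    return 'setenv\t%s\t"%s"\n' % (key, value)


def make_module_extra_sketch(environ=None):
    """Return the extra module text that is added only when UCX is a dependency."""
    txt = ''
    # since impi v2019.8, the MLX provider works without UCX_TLS, but setting it does not hurt
    ucx_root = get_software_root_sketch('UCX', environ)
    if ucx_root:
        txt += set_environment_sketch('UCX_TLS', 'all')
    return txt


# Made-up installation path, only to simulate a loaded UCX module.
print(repr(make_module_extra_sketch({'EBROOTUCX': '/software/UCX/1.9.0'})))
# Without UCX among the dependencies, nothing is added to the module.
print(repr(make_module_extra_sketch({})))
```

The point of the guard is that impi installations without a UCX dependency get an unchanged module file.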
Are you sure that all versions of UCX up to 1.9.0 know about "all"?
also what is the difference between UCX_TLS=all and not setting UCX_TLS? Does UCX_TLS=all include more transports?
yes, added ages ago with openucx/ucx#104
ok
also what is the difference between UCX_TLS=all and not setting UCX_TLS? Does UCX_TLS=all include more transports?
There is no difference: in both cases UCX will consider all available transports (TLS) and choose the best one. So explicitly setting UCX_TLS=all will not change the behaviour of systems that were already working well without it.
Test report by @lexming Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (1 easyconfigs in total)
For reference, the previous test was run on a machine without a Mellanox HCA (so no MLX provider). The same test with current
LGTM
Going in, thanks @lexming!
Test report by @lexming Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (1 easyconfigs in total)
Fixes issue easybuilders/easybuild-easyconfigs#10899
Intel added the MLX provider in impi v2019.5, which supports Mellanox HCAs and requires UCX. See: https://software.intel.com/en-us/articles/improve-performance-and-stability-with-intel-mpi-library-on-infiniband

In v2019.6, Intel MPI seems to internally use something similar to UCX_TLS=dc,ud,rc,sm,self. This means that impi will only work by default (without setting UCX_TLS) on Mellanox ConnectX-5 and newer HCAs. Everything else fails due to the lack of the dc TLS and requires explicitly setting UCX_TLS to the available transports. This is the workaround that Intel recommends in the aforementioned link.

In v2019.8, impi got some improvements in this regard, and with it the MLX provider can determine the available transports in UCX on its own. So it works by default (without setting UCX_TLS) in all hardware configurations as long as the communication is done with MLX, i.e. with Mellanox cards. However, it still fails with everything else.

The most effective solution is to explicitly set UCX_TLS=all. This leaves it to UCX to choose the best available transport on its own and avoids errors with impi on all those systems that do not work by default. Explicitly setting UCX_TLS=all will not change the behaviour of systems that were already working well without it.

The only alternative to this solution would be to not install UCX with impi on systems without Mellanox HCAs, but that option cannot be provided from EB.
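The behaviour described above can be summarised as a small lookup, which may help when checking a given site against it. This is a rough sketch that only encodes the statements in this description: the function name and the hardware labels ('connectx5+', 'older-mellanox', 'none') are made up for illustration, only the two versions discussed here are modelled, and explicit transport lists other than 'all' are not covered.

```python
def works_by_default(impi_version, hca, ucx_tls=None):
    """Return True if impi on top of UCX is expected to work, per the PR description.

    impi_version: '2019.6' or '2019.8' (the only versions modelled here)
    hca: hypothetical label, one of 'connectx5+', 'older-mellanox', 'none'
    ucx_tls: None (unset) or 'all'
    """
    if ucx_tls == 'all':
        # UCX picks the best available transport on its own
        return True
    if impi_version == '2019.6':
        # internal default is similar to dc,ud,rc,sm,self, so the dc transport is required
        return hca == 'connectx5+'
    if impi_version == '2019.8':
        # the MLX provider detects available transports, but only with Mellanox cards
        return hca in ('connectx5+', 'older-mellanox')
    raise ValueError('version not covered by this sketch: %s' % impi_version)


for version in ('2019.6', '2019.8'):
    for hca in ('connectx5+', 'older-mellanox', 'none'):
        unset = works_by_default(version, hca)
        with_all = works_by_default(version, hca, ucx_tls='all')
        print(version, hca, 'unset:', unset, 'UCX_TLS=all:', with_all)
```

Every combination that works with UCX_TLS unset also works with UCX_TLS=all, which is why setting it unconditionally in the module is safe.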