Problem with multinode running with iimpi-2020a #10899
Comments
@jhein32 I have similar hardware. I'll see if I can repeat this when I'm back from vacation (during July).
@jhein32 We added UCX because it was recommended by Intel, see #10280 and https://software.intel.com/content/www/us/en/develop/articles/improve-performance-and-stability-with-intel-mpi-library-on-infiniband.html (where UCX is even listed as required). Have you reported this to Intel support?
@boegel Thanks for getting involved. As written above, the performance of Intel MPI 2019.7 without UCX is poor, so the decision to include UCX is correct. If you don't want UCX, in my current view Intel MPI 18.5 is the choice (which can't be your choice forever). From the error messages I am wondering whether the issue sits in UCX rather than in Intel MPI. OFI is also mentioned; who provides that? Intel MPI, UCX, CentOS? Does anyone here have any clues?
We haven't engaged with Intel yet.
Hi, we (LUNAC team members) had a virtual six-hands-one-keyboard session (via Zoom, thanks to COVID-19) and went over the error messages and the information available in the docs shared here and in the easyconfigs. Our hardware is a bit old (2015 or 2016), so we get: …
Intel writes that the output should include the dc, rc and ud transports. Our hardware lacks dc, which according to Intel is a common issue with older hardware. They recommend setting: …
Intel calls it a workaround. Assuming that goes well, here are two questions/tasks for EB: …
Yes, "dc" is a little complex. It used to work only if you use MOFED, but dc support is now upstream and backported in newer CentOS (7.7 has it, not sure about 7.6; 7.5 and older definitely do not). We have a cluster without dc as well, running CentOS 7.8; I will check it there later today (lspci | grep Mellanox reports … there). Setting UCX_TLS would be appropriate in the UCX module (as Open MPI uses UCX too and can't use dc either); ideally it should auto-detect the lack of dc, though, and not need that env var.
Setting … AFAIK disabling … For instance, we have nodes with ConnectX-4 and ConnectX-5. We only disable …
Hi, the performance test was OK. We noticed the Intel MPI library is happy with … When it runs, we get a warning: … I also looked at the foss/2020a Linpack, using the same UCX for OpenMPI; foss runs with … Based on this, I feel UCX_TLS should be set in the Intel MPI module.
@jhein32 The warning message regarding …
I can confirm that we see the same issues on our older cluster. Putting something like this … into the UCX config sounds about right to me.
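The exact snippet proposed here is not preserved in the thread. As a rough illustration only, a change of this kind in the UCX 1.8.0 easyconfig would typically use EasyBuild's `modextravars`; the transport list below is an assumption modelled on Intel's advice for hardware without dc support, not the value actually posted:

```python
# Hypothetical fragment for UCX-1.8.0-GCCcore-9.3.0.eb (illustrative sketch only).
# Restrict UCX to transports that exist on hardware lacking dc; the exact list is
# an assumption and was not preserved in the comment above.
modextravars = {
    'UCX_TLS': 'rc,ud,sm,self',
}
```

With this in the UCX module, every MPI stack that loads UCX (Open MPI and Intel MPI alike) would inherit the restriction.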
This is still open; I didn't get round to finishing this off before the summer. Based on my current understanding of the issue, I would like to add the setting proposed by @Micket to a relevant config. However, I feel the UCX module is not the correct place; to me this looks like an Intel MPI issue. With the standard UCX module, as reported, the OpenMPI in foss seems to work well; it is only Intel MPI that needs this kind of help. My proposal would be to amend the Intel MPI module. In addition, when issues are encountered with Intel MPI, a user would look there first for hints, whereas a setting in the UCX config would take some poking around to find. Any opinions on the above?
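A sketch of that alternative, under the same assumptions as above: the variable would instead be exported from the Intel MPI module, leaving the UCX module untouched for Open MPI.

```python
# Hypothetical fragment for impi-2019.7.217-iccifort-2020.1.217.eb instead of the
# UCX easyconfig; same assumed transport list as in the previous sketch.
modextravars = {
    'UCX_TLS': 'rc,ud,sm,self',
}
```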
@jhein32 How should we proceed with this?
I correct my previous statement. After further investigation, the …
Hi, I installed HPL and prerequisites from PR #11337. I have massaged the … So I am happy to proceed.
@jhein32 The …
As mentioned in Slack already, we have issues with MPI executables built against iimpi-2020a starting multinode. Within a node I am not aware of issues.
The problem seems to be associated with the UCX/1.8.0 dependency. Executables using iimpi/2020.00, which uses Intel MPI 19.6 without a UCX dependency, work multinode. Also, if I "massage" the easyconfig impi-2019.7.217-iccifort-2020.1.217.eb and comment out the UCX line in the dependencies list, basic hello-world codes or the HPL for intel/2020a will run, though performance, compared to an HPL built with intel/2017b, is 10% poorer. Using the HPL from PR #10864, the performance is within 1% of the one from intel/2017b.
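For illustration, the "massage" described above amounts to something like the following edit in impi-2019.7.217-iccifort-2020.1.217.eb (a sketch; any other entries in the dependencies list are omitted here):

```python
# Sketch of the workaround described above: comment out the UCX dependency so the
# resulting Intel MPI module no longer pulls in UCX/1.8.0.
dependencies = [
    # ('UCX', '1.8.0'),  # disabled to test multinode startup without UCX
]
```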
A few details on our cluster: the system uses Intel Xeon E5-2650 v3 (Haswell) CPUs and 4x FDR InfiniBand. We run CentOS 7 (currently 7.6 or 7.8), Linux kernel 3.10, and the InfiniBand stack from CentOS. Slurm is set up with cgroups for process control and accounting (TaskPlugin=task/cgroup, ProctrackType=proctrack/cgroup). The Slurm installation is quite old (17.02).
To get Intel MPI started I add … (in an editor) to the impi modules (we have versions as far back as iimpi/7.3.5, predating iimpi/2016b). I tested multiple times, but libpmi2.so does not work for us. Of the methods to start an Intel MPI job described in the Slurm guide, only srun works for us; we never got Hydra or MPD to work. I also tested setting 'I_MPI_HYDRA_TOPOLIB': 'ipl', which does not help at all.
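The exact line added to the impi modules is not quoted above. A common way to make srun the launcher for Intel MPI under Slurm is to point it at Slurm's PMI-1 library via `modextravars` in the impi easyconfig; the path below is an assumption and depends on the local Slurm installation:

```python
# Hypothetical addition to the impi easyconfig (illustrative only); the reporter's
# actual edit is not preserved in this issue.
modextravars = {
    # PMI-1 library shipped with Slurm; libpmi2.so reportedly does not work here.
    'I_MPI_PMI_LIBRARY': '/usr/lib64/libpmi.so',
}
```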
When running I load: …
The modules are built with unmodified easyconfigs from EB 4.2.1. When compiling and running a simple MPI hello-world code, I get the following in stdout:
and this in stderr:
Ok, that went long. Any suggestions would be highly appreciated.