icb_parpack_cpp test is failing on ppc64el and others archs #128

Open
sylvestre opened this issue Jun 23, 2018 · 15 comments

Comments

@sylvestre
Contributor

https://buildd.debian.org/status/fetch.php?pkg=arpack&arch=ppc64el&ver=3.6.1-1&stamp=1529756139&raw=0

rank 0 - 1000.15 1000.15
Correct eigenvalues not computed
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
FAIL icb_parpack_cpp (exit status: 1)

@sylvestre
Contributor Author

@fghoussen Does it ring a bell?
We could just disable MPI on this arch...

@fghoussen
Collaborator

Not really: soundless bell! In the build dependencies, I see libopenmpi3: could you try with openmpi2?

@fghoussen
Collaborator

If switching from openmpi3 to openmpi2 is OK on ppc64el (?), try it also on armel.

@sylvestre changed the title from "icb_parpack_cpp test is failing on ppc64el" to "icb_parpack_cpp test is failing on ppc64el and others archs" on Jun 24, 2018
@sylvestre
Contributor Author

Other archs are indeed failing:
https://buildd.debian.org/status/package.php?p=arpack

@fghoussen
Collaborator

If libopenmpi3 is not to blame (?), then C++11 could be...

I've just seen this at the end of the armel log:

libtool: link: mpic++ -std=gnu++11 ...
E: Build killed with signal TERM after 150 minutes of inactivity

The build seems to hang. It looks like the compiler gets stuck because it doesn't know what to do with the (hard-coded) gnu++11?!... I'm really not sure (which compiler is behind mpic++?). I have no good answer here. I've opened a PR with a few lines: drop them if they don't go the way you like.

@10110111
Contributor

Looks more like a test hanging than the compiler disliking gnu++11. The compiler seems to have finished by the time the timeout message is printed.

@fghoussen
Collaborator

Yeah, maybe. Can you get more verbose logs? Or compile with make VERBOSE=1?

@10110111
Contributor

I guess not. But from the log, it seems that the first test in the MPI directory hangs. I suppose only adding some prints to the test would help with debugging.

@dbeurle
Contributor

dbeurle commented Jun 25, 2018

I'm not entirely sure about this, but the CMake configuration links against both the C++ and the C MPI libraries. Is this likely to be a problem?

Another possibility is a deadlock :/

@fghoussen
Collaborator

Just realized the hang happens after the issue46 test... which is not an icb one?! No idea what's going on.

@fghoussen
Collaborator

@sylvestre: I guess the problem on armel/ppc64 may be fixed by 9742da7!...

@fghoussen
Collaborator

@sylvestre: if you're about to publish a new release, check whether this issue is (should be!) fixed.

@fghoussen
Collaborator

May be fixed by f36eb6c

@fghoussen
Collaborator

May be fixed by f36eb6c

@sylvestre: once #397 is released, it would be good to check whether this problem is solved.

@sylvestre
Contributor Author

Yeah, I will upload it once 3.9.1 is tagged :)
