ci: Set openmpi and openmp parameters to make an efficient use of the CI #3908
Conversation
force-pushed from f920f4e to 5401f7c
Is this going to make the testing (and thus the CI) slower? Do we really want to keep the BP3 testing? Could we maybe separate it so that BP3 is only tested in the nightly builds?

I'm curious if this really fixes it, and why. This isn't big data. If one of the compressors is taking more than 2 minutes to do something with it, that seems like a non-viable compressor.
force-pushed from 5401f7c to 653606e
It is true: some instances of those tests take <10s and others about 3 minutes. I was able to replicate this on my machine, and I found that it only happens when we use as many or more MPI tasks as there are cores available. Also, here is the backtrace, which seems to show that the MPI tasks are not synchronized: some of them are waiting in MPI_GATHER and others in MPI_BARRIER. I wonder whether MGARD might be calling these MPI routines on its own? @eisenhauer
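The oversubscription hypothesis above suggests a CI tweak along these lines — a minimal sketch using OpenMPI's standard MCA environment variables and the usual OpenMP thread control; the exact values chosen by this PR are not shown here and are assumptions:

```shell
# Pin OpenMP to one thread per MPI rank, so that
# (MPI ranks) x (OpenMP threads) does not exceed the cores
# available on a small CI runner.
export OMP_NUM_THREADS=1

# Let OpenMPI launch more ranks than detected cores without
# failing, which is common on constrained CI machines.
export OMPI_MCA_rmaps_base_oversubscribe=true

echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
echo "oversubscribe=${OMPI_MCA_rmaps_base_oversubscribe}"
```

With oversubscription, OpenMPI also tends to use a yielding progress loop rather than aggressive busy-waiting, which can reduce the kind of stalls described above when ranks are stuck in collectives.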
force-pushed from e6b61f5 to eba481e
force-pushed from 61e2022 to 452e541
I found that this problem occurs on my laptop when connected to an Ethernet network, but not when connected to the Wi-Fi network. I figured out that this is due to OpenMPI picking certain network interfaces; an easy workaround is to use the lo interface.
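The lo workaround above can be sketched as follows — `btl_tcp_if_include` is a real OpenMPI MCA parameter, but the exact invocation used here is an assumption:

```shell
# Restrict OpenMPI's TCP transport to the loopback interface so
# rank-to-rank traffic never touches an external (possibly flaky
# or firewalled) interface. Fine for single-node CI runs.
export OMPI_MCA_btl_tcp_if_include=lo

# Equivalent one-off form on the command line (test binary name
# is hypothetical):
#   mpirun --mca btl_tcp_if_include lo -np 4 ./mpi_test

echo "btl_tcp_if_include=${OMPI_MCA_btl_tcp_if_include}"
```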
Well, maybe... But why only this test? MGard doesn't use MPI, but it does use OpenMP. I build MGard manually on my OSX laptop and these tests have no problems. I'm wondering if, on whatever platform this is running, OpenMP isn't playing well with MPI, resulting in the issues that you're seeing. (I could use the docker image and dig further, but I'm off at a conference, so little spare time for the moment.)
force-pushed from e6f9790 to 77729e7
Hmm, so two things happened here:
force-pushed from 24f4610 to 4cbf92e
force-pushed from 4cbf92e to 954ecca
@scottwittenburg if this makes sense to you, please approve so that it gets merged and we get a green dashboard 😎
force-pushed from 954ecca to 92ebfed
Yay, it's all green again :)
Looks good.
I've rebased on this change, and now my CI reproducer builds are happening in serial, which seems like an unintended consequence.
fixes: #3897