Check for zero objects on a MPI rank/node case after LB runs #1217
cz4rs added commits that referenced this issue on Jan 18, 2021; Jan 25, 2021; Jan 26, 2021 (seven commits); and Jan 27, 2021.
What Needs to be Done?
After the load balancer runs, it may generate a distribution in which a rank/node has zero objects. That configuration might cause VT to hang (which is in the process of being fixed), and it may also be unsupported by some applications.
We are seeing a hang in EMPIRE after LB that might be caused by this case, but there is currently no way to tell.
For now, let's check for that case after migrations and print a message when it occurs. We could also reject migrations that would cause it.
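As a rough illustration of the two options above (detect and report, or reject), here is a minimal standalone sketch. It does not use any VT API: `Migration`, `findEmptyRanks`, `applyMigrations`, and `wouldEmptyARank` are hypothetical names, and the per-rank object counts are assumed to be already available in one place (in a real run they would have to be gathered across ranks first, for example with a reduction).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one object migration chosen by the load balancer.
struct Migration {
  std::size_t from_rank;
  std::size_t to_rank;
};

// Return the ranks that hold zero objects in the given per-rank counts.
std::vector<std::size_t> findEmptyRanks(std::vector<std::size_t> const& counts) {
  std::vector<std::size_t> empty;
  for (std::size_t r = 0; r < counts.size(); ++r) {
    if (counts[r] == 0) {
      empty.push_back(r);
    }
  }
  return empty;
}

// Apply the proposed migrations to a copy of the counts and return the
// resulting distribution. Each migration moves exactly one object.
std::vector<std::size_t> applyMigrations(
  std::vector<std::size_t> counts, std::vector<Migration> const& migrations
) {
  for (auto const& m : migrations) {
    assert(m.from_rank < counts.size() && m.to_rank < counts.size());
    assert(counts[m.from_rank] > 0);  // an object must exist to be moved
    --counts[m.from_rank];
    ++counts[m.to_rank];
  }
  return counts;
}

// True if applying the migrations would leave at least one rank empty.
// A caller could print a warning in that case, or reject the migrations.
bool wouldEmptyARank(
  std::vector<std::size_t> const& counts,
  std::vector<Migration> const& migrations
) {
  return !findEmptyRanks(applyMigrations(counts, migrations)).empty();
}
```

With counts `{2, 1, 1}`, a single migration from rank 1 to rank 0 leaves rank 1 empty, so `wouldEmptyARank` returns `true`; a migration from rank 0 to rank 1 does not. Whether the right response is a warning or a rejection is left open, as in the issue text.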