add support for using Intel MPI(2019.7) and MVAPICH2 #283
Conversation
+ local minikube test pass + add new Spec "mpiDistribution" @ 2020/7/27
* change email address
Hi @milkybird98. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'll leave this open for others to comment as well. It might be better to think about how to make this framework-agnostic rather than having to maintain the logic for various frameworks.
pkg/apis/kubeflow/v1alpha2/types.go
// MPIDistribution specifies name of the mpi framwork which is used
// Deafults to "OpenMPI"
// Option includes "OpenMPI", "IntelMPI" and "MPICH"
Option -> Options
Perhaps make them all lowercase and separate words with underscores?
pkg/apis/kubeflow/v1alpha2/types.go
@@ -72,6 +72,11 @@ type MPIJobSpec struct {
	// active. The policies specified in `RunPolicy` take precedence over
	// the following fields: `BackoffLimit` and `ActiveDeadlineSeconds`.
	RunPolicy *common.RunPolicy `json:"runPolicy,omitempty"`

	// MPIDistribution specifies name of the mpi framwork which is used
mpi -> MPI
pkg/apis/kubeflow/v1alpha2/types.go
@@ -72,6 +72,11 @@ type MPIJobSpec struct {
	// active. The policies specified in `RunPolicy` take precedence over
	// the following fields: `BackoffLimit` and `ActiveDeadlineSeconds`.
	RunPolicy *common.RunPolicy `json:"runPolicy,omitempty"`

	// MPIDistribution specifies name of the mpi framwork which is used
	// Deafults to "OpenMPI"
Defaults
kubexec = fmt.Sprintf(`#!/bin/sh
set -x
POD_NAME=$3
shift 3
Could you add some notes in the code on the differences?
* update notes about hostfile generating
Thanks for your contribution! 🎉 👍
pkg/apis/kubeflow/v1alpha2/types.go
// MPIDistribution specifies name of the MPI framwork which is used
// Defaults to "open_mpi"
// Options includes "open_mpi", "intel_mpi" and "mpich"
MPIDistribution string `json:"mpiDistribution,omitempty"`
Can we define it as a type instead of a string? (PS, personally, prefer OpenMPI, IntelMPI and MPICH here)
type MPIDistribution string

const (
    MPIDistributionOpenMPI MPIDistribution = "openMPI"
    ...
)
I'm sorry, but for the convenience of generating Docker images, I used two separate directories to generate the mpi-operator and kubectl-delivery images.
+ move hosts sending and merging here * use special type instead of string
@gaocegege: GitHub didn't allow me to assign the following users: zw0610. Note that only kubeflow members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@milkybird98 Could you please add some unit tests to make coverage/coveralls happy?
No problem. I'll take care of it this weekend.
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: terrytangyuan
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
The major differences in the launching process between OpenMPI, Intel MPI and MVAPICH are:
For Intel MPI and MVAPICH, the hydra process manager is used instead of orted. However, there's no major difference in the ssh/rsh command that launches the remote process manager; the only thing that needs care is the position of the remote pod name argument.
(Unfortunately, it seems that in the intel-mpi@2020 version, the ssh/rsh command has changed again and the prefix arguments "-q -x" have been abandoned. It's such a mess.)
As for the environment variable name, both Intel MPI and MVAPICH have a similarly functional one.
And since I have not found an effective way to auto-detect the MPI distribution, a new field is added to the API.
Here are some Docker images that I built and used to test the code:
https://hub.docker.com/repository/docker/milkybird98/ompi-osu
https://hub.docker.com/repository/docker/milkybird98/intel-mpi-osu
https://hub.docker.com/repository/docker/milkybird98/mvapich2-osu