add support for using Intel MPI(2019.7) and MVAPICH2 #283

Merged on Aug 3, 2020 (15 commits)

Conversation

milkybird98
Contributor

The major differences in the launching process between ompi, intel-mpi and mvapich are:

  • the environment variable names
  • the ssh/rsh launching command and hostfile format

For intel-mpi and mvapich, the hydra process manager is used instead of orted. However, there's no major difference in the ssh/rsh command that launches the remote process manager; the only thing that needs care is the position of the remote pod name argument.
(Unfortunately, it seems that in intel-mpi@2020 the ssh/rsh command has changed again and the prefix arguments "-q -x" have been dropped. It's such a mess.)
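
To illustrate the point about the pod name position, here is a minimal Go sketch. It is not the code in this PR: `splitLauncherArgs` and its `prefixArgs` parameter are hypothetical names, and the sample arguments are made up for the example. The only thing taken from the discussion is that some launchers put option arguments such as "-q -x" before the pod name.

```go
package main

import (
	"errors"
	"fmt"
)

// splitLauncherArgs separates the remote pod name from the command that the
// process manager wants to run on that pod. prefixArgs is the number of
// option arguments (for example "-q -x") that come before the pod name.
// Both the function and the parameter are assumptions made for this sketch.
func splitLauncherArgs(args []string, prefixArgs int) (string, []string, error) {
	if prefixArgs < 0 || len(args) <= prefixArgs {
		return "", nil, errors.New("launcher arguments do not contain a pod name")
	}
	return args[prefixArgs], args[prefixArgs+1:], nil
}

func main() {
	// A launcher that passes the pod name as the first argument.
	pod, cmd, err := splitLauncherArgs([]string{"worker-0", "orted"}, 0)
	fmt.Println(pod, cmd, err)

	// A launcher that passes "-q -x" before the pod name.
	pod, cmd, err = splitLauncherArgs([]string{"-q", "-x", "worker-1", "hydra_pmi_proxy"}, 2)
	fmt.Println(pod, cmd, err)
}
```

With this shape, a change such as the one noted for intel-mpi@2020 would only change the value of `prefixArgs`, not the rest of the wrapper.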

As for the environment variable names, both intel-mpi and mvapich have functionally similar ones.
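
A rough sketch of how such a per-distribution lookup could look in Go. The variable names below are my recollection of each distribution's documentation and should be checked against the version in use; `launcherEnv` and `envFor` are hypothetical helpers, and the keys are the lowercase option names discussed in this review, not necessarily the final API values.

```go
package main

import "fmt"

// launcherEnv holds the names of the two environment variables that point
// the MPI launcher at a custom remote-shell agent and at a hostfile.
type launcherEnv struct {
	AgentVar    string
	HostfileVar string
}

// envFor returns the variable names for a distribution key.
// ASSUMPTION: the names are taken from memory of the Open MPI, Intel MPI and
// MPICH/Hydra documentation; verify them before relying on this table.
func envFor(dist string) (launcherEnv, bool) {
	table := map[string]launcherEnv{
		"open_mpi":  {AgentVar: "OMPI_MCA_plm_rsh_agent", HostfileVar: "OMPI_MCA_orte_default_hostfile"},
		"intel_mpi": {AgentVar: "I_MPI_HYDRA_BOOTSTRAP_EXEC", HostfileVar: "I_MPI_HYDRA_HOST_FILE"},
		"mpich":     {AgentVar: "HYDRA_LAUNCHER_EXEC", HostfileVar: "HYDRA_HOST_FILE"},
	}
	env, ok := table[dist]
	return env, ok
}

func main() {
	for _, dist := range []string{"open_mpi", "intel_mpi", "mpich"} {
		if env, ok := envFor(dist); ok {
			fmt.Printf("%s: agent=%s hostfile=%s\n", dist, env.AgentVar, env.HostfileVar)
		}
	}
}
```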

Since I have not found an effective way to auto-detect the MPI distribution, a new argument is added to the API.

Here are some Docker images that I built and used to test the code:
https://hub.docker.com/repository/docker/milkybird98/ompi-osu
https://hub.docker.com/repository/docker/milkybird98/intel-mpi-osu
https://hub.docker.com/repository/docker/milkybird98/mvapich2-osu

milkybird98 and others added 3 commits July 27, 2020 12:25
+ local minikube test pass
+ add new Spec "mpiDistribution"
@ 2020/7/27
* change email address
@kubeflow-bot

This change is Reviewable

@k8s-ci-robot

Hi @milkybird98. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@terrytangyuan terrytangyuan left a comment

I'll leave this open for others to comment as well. It might be better to think about how to make this framework-agnostic rather than having to maintain the logic for various frameworks.


// MPIDistribution specifies name of the mpi framwork which is used
// Deafults to "OpenMPI"
// Option includes "OpenMPI", "IntelMPI" and "MPICH"
Member

Option -> Options

Perhaps make them all lowercase and separate words with underscores?

@@ -72,6 +72,11 @@ type MPIJobSpec struct {
// active. The policies specified in `RunPolicy` take precedence over
// the following fields: `BackoffLimit` and `ActiveDeadlineSeconds`.
RunPolicy *common.RunPolicy `json:"runPolicy,omitempty"`

// MPIDistribution specifies name of the mpi framwork which is used
Member

mpi -> MPI

@@ -72,6 +72,11 @@ type MPIJobSpec struct {
// active. The policies specified in `RunPolicy` take precedence over
// the following fields: `BackoffLimit` and `ActiveDeadlineSeconds`.
RunPolicy *common.RunPolicy `json:"runPolicy,omitempty"`

// MPIDistribution specifies name of the mpi framwork which is used
// Deafults to "OpenMPI"
Member

Defaults

kubexec = fmt.Sprintf(`#!/bin/sh
set -x
POD_NAME=$3
shift 3
Member

Could you add some notes in the code on the differences?

* update notes about hostfile generating
Member

@gaocegege gaocegege left a comment

Thanks for your contribution! 🎉 👍

// MPIDistribution specifies name of the MPI framwork which is used
// Defaults to "open_mpi"
// Options includes "open_mpi", "intel_mpi" and "mpich"
MPIDistribution string `json:"mpiDistribution,omitempty"`
Member

Can we define it as a type instead of a string? (PS, personally, prefer OpenMPI, IntelMPI and MPICH here)

type MPIDistribution string

const (
    MPIDistributionOpenMPI = "openMPI"
...
)
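
To make the suggestion concrete, here is a minimal, self-contained sketch of a typed distribution field with defaulting and validation. It is an illustration only: the constant values follow the spelling preferred in this comment, and `normalize` is a hypothetical helper, not something that exists in the operator.

```go
package main

import "fmt"

// MPIDistribution names the MPI implementation used by a job.
type MPIDistribution string

// ASSUMPTION: these values are illustrative; the values in the merged API
// may be spelled differently.
const (
	MPIDistributionOpenMPI  MPIDistribution = "OpenMPI"
	MPIDistributionIntelMPI MPIDistribution = "IntelMPI"
	MPIDistributionMPICH    MPIDistribution = "MPICH"
)

// normalize applies the documented default (Open MPI) to an empty value and
// rejects anything that is not one of the known constants.
func normalize(d MPIDistribution) (MPIDistribution, error) {
	switch d {
	case "":
		return MPIDistributionOpenMPI, nil
	case MPIDistributionOpenMPI, MPIDistributionIntelMPI, MPIDistributionMPICH:
		return d, nil
	default:
		return "", fmt.Errorf("unsupported MPI distribution %q", d)
	}
}

func main() {
	for _, in := range []MPIDistribution{"", MPIDistributionIntelMPI, "LAM"} {
		out, err := normalize(in)
		fmt.Printf("input=%q output=%q err=%v\n", in, out, err)
	}
}
```

Giving each constant the `MPIDistribution` type lets the compiler catch a plain string passed where a distribution is expected, which is the main benefit over an untyped `string` field.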

@milkybird98
Contributor Author

I'm sorry: for convenience when building the Docker images, I used two separate directories to build the mpi-operator and kubectl-delivery images.
I found that I had not committed the changed code used to build the kubectl-delivery image.
Also, some functions have been moved into the mpi-operator controller code so that they can be implemented more efficiently.

+ move hosts sending and merging here
* use special type instaed of string
Member

@gaocegege gaocegege left a comment

/assign @carmark @zw0610

pkg/apis/kubeflow/v1alpha2/types.go (outdated, resolved)
pkg/apis/kubeflow/v1alpha2/types.go (resolved)
@k8s-ci-robot

@gaocegege: GitHub didn't allow me to assign the following users: zw0610.

Note that only kubeflow members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @carmark @zw0610

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gaocegege
Member

/ok-to-test
/lgtm

@carmark
Member

carmark commented Jul 31, 2020

@milkybird98 Could you please add some unit tests to make coverage/coveralls happy?

coverage/coveralls — Coverage decreased (-4.04%) to 21.788%

@milkybird98
Contributor Author

No problem. I'll take care of it this weekend.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Aug 2, 2020
@gaocegege
Member

/lgtm
/assign @terrytangyuan @carmark

@carmark
Member

carmark commented Aug 3, 2020

/lgtm

@terrytangyuan
Member

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

6 participants