Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Added mpi backend plugin #196

Merged
merged 14 commits into from
Oct 18, 2021
Merged

Added mpi backend plugin #196

merged 14 commits into from
Oct 18, 2021

Conversation

yindia
Copy link
Contributor

@yindia yindia commented Aug 12, 2021

TL;DR

Implements native k8s backend plugin for Kubeflow MPI operator.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

MPI plugin implemented using kubeflow mpi operator . Flyte will create the MPI job and mpi operator will take care of execution.

MPI plugin implemented using latest MPI CRD i.e v2beta1. The user should have v2beta1 CRD installed in cluster.

Future plan is to replace MPI operator with a training operator.

Tracking Issue

Remove the 'fixes' keyword if there will be multiple PRs to fix the linked issue

fixes flyteorg/flyte#1543

Follow-up issue

NA
OR
https://github.com/flyteorg/flyte/issues/

@codecov
Copy link

codecov bot commented Aug 12, 2021

Codecov Report

Merging #196 (f2d820b) into master (e26fa93) will increase coverage by 0.62%.
The diff coverage is 79.31%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #196      +/-   ##
==========================================
+ Coverage   61.46%   62.08%   +0.62%     
==========================================
  Files         140      141       +1     
  Lines        8620     8736     +116     
==========================================
+ Hits         5298     5424     +126     
+ Misses       2837     2813      -24     
- Partials      485      499      +14     
Flag Coverage Δ
unittests 61.75% <78.70%> (+0.63%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
go/tasks/plugins/k8s/kfoperators/mpi/mpi.go 74.72% <74.72%> (ø)
.../plugins/k8s/kfoperators/common/common_operator.go 79.38% <96.00%> (+52.99%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e26fa93...f2d820b. Read the comment docs.

Signed-off-by: Yuvraj <[email protected]>
go.mod Outdated Show resolved Hide resolved
@yindia yindia marked this pull request as ready for review August 30, 2021 12:33
@yindia yindia force-pushed the feature/mpi branch 3 times, most recently from 7492373 to 45cea17 Compare August 30, 2021 17:15
Signed-off-by: Yuvraj <[email protected]>
Signed-off-by: Yuvraj <[email protected]>
Signed-off-by: Yuvraj <[email protected]>
Signed-off-by: Yuvraj <[email protected]>
Signed-off-by: Yuvraj <[email protected]>
@yindia yindia requested a review from katrogan October 10, 2021 16:08
Signed-off-by: Yuvraj <[email protected]>
Copy link
Contributor

@katrogan katrogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment, will LGTM after end to end is working - thank you!

Signed-off-by: Yuvraj <[email protected]>
@yindia yindia requested review from katrogan and kumare3 October 15, 2021 09:27
Signed-off-by: Yuvraj <[email protected]>
@kumare3
Copy link
Contributor

kumare3 commented Oct 18, 2021

@evalsocket lets improve the description for this PR. This is an important PR

taskPhaseInfo pluginsCore.TaskInfo) (pluginsCore.PhaseInfo, error) {
switch currentCondition.Type {
case commonKf.JobCreated:
return pluginsCore.PhaseInfoQueued(occurredAt, pluginsCore.DefaultPhaseVersion, "JobCreated"), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of JobCreated can we add a better message as this can be seen in the UI. For example
`New job name: %s submitted to KF Training operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a job name here that's why added a comment without name

@kumare3
Copy link
Contributor

kumare3 commented Oct 18, 2021

I have a few comments then we can merge

Signed-off-by: Yuvraj <[email protected]>
@yindia yindia requested a review from kumare3 October 18, 2021 16:42
Copy link
Contributor

@katrogan katrogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind waiting on @kumare3's review too?

@kumare3 kumare3 merged commit 7d00aea into master Oct 18, 2021
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* Added mpi plugin

Signed-off-by: Yuvraj <[email protected]>

* Remove launcher log

Signed-off-by: Yuvraj <[email protected]>

* More changes

Signed-off-by: Yuvraj <[email protected]>

* Fix test and lint

Signed-off-by: Yuvraj <[email protected]>

* fix gomod issue

Signed-off-by: Yuvraj <[email protected]>

* More changes

Signed-off-by: Yuvraj <[email protected]>

* make lint

Signed-off-by: Yuvraj <[email protected]>

* Fix unit test

Signed-off-by: Yuvraj <[email protected]>

* Added more unit test

Signed-off-by: Yuvraj <[email protected]>

* Fix unit test

Signed-off-by: Yuvraj <[email protected]>

* Added documentation

Signed-off-by: Yuvraj <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Plugin] Kubeflow MPI Operator
4 participants