Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

tensorflow plugin implementation #103

Merged
merged 8 commits into from
Feb 13, 2021
Merged

Conversation

pingsutw
Copy link
Member

@pingsutw pingsutw commented Jul 11, 2020

TL;DR

TensorFlow plugin for Flyte that leverages https://github.com/kubeflow/tf-operator to run distributed tensorflow jobs on k8s.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

related to flyteorg/flyteidl#71

Tracking Issue

flyteorg/flyte#115

Follow-up issue

NA

@pingsutw
Copy link
Member Author

The test failure seems not related to PR.

@kumare3
Copy link
Contributor

kumare3 commented Jul 11, 2020

I will take a look at the test

@pingsutw
Copy link
Member Author

Thanks :)

@swiftdiaries
Copy link

Otherwise, LGTM :)

@codecov-commenter
Copy link

codecov-commenter commented Aug 13, 2020

Codecov Report

Merging #103 into master will increase coverage by 0.46%.
The diff coverage is 79.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #103      +/-   ##
==========================================
+ Coverage   58.70%   59.17%   +0.46%     
==========================================
  Files          99      100       +1     
  Lines        5936     6075     +139     
==========================================
+ Hits         3485     3595     +110     
- Misses       2103     2122      +19     
- Partials      348      358      +10     
Impacted Files Coverage Δ
...s/plugins/k8s/kfoperators/tensorflow/tensorflow.go 79.13% <79.13%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc4e8b5...c6f7763. Read the comment docs.

@pingsutw
Copy link
Member Author

@swiftdiaries sorry for the late reply.
I've updated and rebased the PR

swiftdiaries
swiftdiaries previously approved these changes Aug 13, 2020
@swiftdiaries
Copy link

LGTM
I don't have write access to this repo, I think @kumare3 might want to do a round of reviews before merging it

@swiftdiaries
Copy link

This is really cool 😬 Great to see this come along

@kumare3
Copy link
Contributor

kumare3 commented Aug 13, 2020

Hey I thought this was merged. One check seems
To be failing?

@pingsutw
Copy link
Member Author

Test failure seems unrelated.
btw, I'm going to write an example in flytekit.

@kumare3
Copy link
Contributor

kumare3 commented Aug 14, 2020

Cc @wild-endeavor @EngHabu @katrogan

return nil, flyteerr.Errorf(flyteerr.BadTaskSpecification, "Unable to create pod spec: [%v]", err.Error())
}

overrideDefaultContainerName(taskCtx, podSpec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tensorflow operator will also force pod to have a container named tensorflow, I've updated the comment.

go/tasks/plugins/k8s/kfoperators/tensorflow/tensorflow.go Outdated Show resolved Hide resolved
go/tasks/plugins/k8s/kfoperators/tensorflow/tensorflow.go Outdated Show resolved Hide resolved
@kumare3
Copy link
Contributor

kumare3 commented Aug 18, 2020

hmm ya the test is not related, it is interesting to me that your branch has this error, but master does not?

@pingsutw
Copy link
Member Author

I've moved some common code between PyTorch and TensorFlow to common_opeartor

@pingsutw
Copy link
Member Author

hmm ya the test is not related, it is interesting to me that your branch has this error, but master does not?

I got the same error in master
https://github.com/pingsutw/flyteplugins/actions/runs/215382594

@kumare3
Copy link
Contributor

kumare3 commented Aug 20, 2020

Let me review today and we can merge soon

@kumare3 kumare3 self-requested a review September 14, 2020 16:48
kumare3
kumare3 previously approved these changes Sep 14, 2020
Copy link
Contributor

@kumare3 kumare3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kumare3
Copy link
Contributor

kumare3 commented Sep 14, 2020

Some files are not go-imported. Once they are done, the linter should be green and let us merge it afterwards. I do not have permissions to your fork.

@codecov-io
Copy link

codecov-io commented Feb 4, 2021

Codecov Report

Merging #103 (91cc3c8) into master (79bcea8) will decrease coverage by 1.97%.
The diff coverage is 46.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #103      +/-   ##
==========================================
- Coverage   61.98%   60.01%   -1.98%     
==========================================
  Files         109      129      +20     
  Lines        6256     6955     +699     
==========================================
+ Hits         3878     4174     +296     
- Misses       1984     2361     +377     
- Partials      394      420      +26     
Flag Coverage Δ
unittests 60.01% <46.55%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
go/tasks/aws/config.go 9.09% <0.00%> (-40.91%) ⬇️
go/tasks/aws/config_flags.go 29.41% <ø> (-7.44%) ⬇️
...ks/pluginmachinery/core/allocationstatus_enumer.go 0.00% <0.00%> (ø)
go/tasks/pluginmachinery/core/resource_manager.go 0.00% <ø> (ø)
...o/tasks/pluginmachinery/internal/webapi/monitor.go 0.00% <0.00%> (ø)
go/tasks/plugins/hive/executor.go 10.29% <0.00%> (ø)
go/tasks/plugins/k8s/sagemaker/custom_training.go 69.60% <0.00%> (ø)
...sks/plugins/k8s/sagemaker/hyperparameter_tuning.go 61.49% <0.00%> (ø)
go/tasks/plugins/webapi/athena/config.go 0.00% <0.00%> (ø)
go/tasks/plugins/webapi/athena/plugin.go 3.00% <3.00%> (ø)
... and 41 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a49ee2...3518953. Read the comment docs.

@pingsutw
Copy link
Member Author

pingsutw commented Feb 4, 2021

cc @EngHabu @swiftdiaries @kumare3 Can you help me review this, Thanks.
I've fixed the conflict and the PR tests and lint already passed.

However, I don't why Build Co-Pilot Docker image failed.
Do you know how to fix it?

Removing intermediate container b9a18bc4bc63
 ---> 943a353f299a
Successfully built 943a353f299a
Successfully tagged my_awesome_image:latest
+ set +x

[Action Step] Tagging image...
Tag: ghcr.io/flyteorg/flytecopilot:1a52a8f80c80711ab2175ce3b58db715ad13af59

[Action Step] Pushing image...
ERROR: Can't push when not logged in to registry. Set "push_image_and_stages: false" if you don't want to push

Copy link
Contributor

@EngHabu EngHabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, @pingsutw. Just have a few comments and we can merge!

@pingsutw
Copy link
Member Author

pingsutw commented Feb 4, 2021

@EngHabu Thanks for your review. I will update ASAP.

@pingsutw
Copy link
Member Author

@EngHabu and @kumare3 I've updated it. I think it's ready to merge.

@kumare3
Copy link
Contributor

kumare3 commented Feb 12, 2021

I think I am +1 on the issue, will wait for @EngHabu to approve.
On the other hand @pingsutw we also need to add an example here - https://github.com/flyteorg/flytesnacks/tree/push-images/cookbook/plugins

@EngHabu
Copy link
Contributor

EngHabu commented Feb 12, 2021

Let's address this: #103 (comment) then let's get it merged!

@pingsutw
Copy link
Member Author

Let's address this: #103 (comment) then let's get it merged!

okay, I'm working on it.

@EngHabu EngHabu merged commit e84585e into flyteorg:master Feb 13, 2021
katrogan pushed a commit that referenced this pull request Feb 26, 2021
* Tensorflow plugin implementaion

* Fix checkstyle

* Update go/tasks/plugins/k8s/kfoperators/common/common_operator.go

Co-authored-by: Haytham Abuelfutuh <[email protected]>

* Address comments

* Fix lint error

* Address comment

* Fix lint

* Update to running

Co-authored-by: Haytham Abuelfutuh <[email protected]>
Signed-off-by: Katrina Rogan <[email protected]>
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* Tensorflow plugin implementaion

* Fix checkstyle

* Update go/tasks/plugins/k8s/kfoperators/common/common_operator.go

Co-authored-by: Haytham Abuelfutuh <[email protected]>

* Address comments

* Fix lint error

* Address comment

* Fix lint

* Update to running

Co-authored-by: Haytham Abuelfutuh <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants