Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tensorflow/k8s to kubeflow/tf-operator #350

Closed
jlewi opened this issue Jan 29, 2018 · 36 comments
Closed

Move tensorflow/k8s to kubeflow/tf-operator #350

jlewi opened this issue Jan 29, 2018 · 36 comments

Comments

@jlewi
Copy link
Contributor

jlewi commented Jan 29, 2018

Now that we have the Kubeflow GitHub organization should we

  • Move tensorflow/k8s into the Kubeflow GitHub org?
  • Rename and rescope tensorflow/k8s to a TFJob CRD?

Reasons for moving the repo

  • tensorflow/k8s is owned/adminstered by the Kubeflow community
    • Putting the repo under Kubeflow should facilitate this (e.g. we can use Kubeflow GitHub teams to administer access to the tensorflow/k8s repo)
  • Avoid confusion about how tensorflow/k8s relates to Kubeflow
  • The original reason for not scoping tensorflow/k8s to the TFJobs CRD was so we would have a home for other code not related to the TFJobs CRD.
    • Now that we have our own GitHub org we can create new repositories as necessary

Reasons for not moving

  • Churn
@ConnorDoyle
Copy link
Contributor

ConnorDoyle commented Jan 30, 2018

+1 to move

Propose naming the repo kubeflow/tf-operator. The rationale for basing the name on the operator rather than the CRD it controls is that you could imagine more TF-related resources in the future.

@yupbank
Copy link
Member

yupbank commented Jan 30, 2018

+1 to move

@ScorpioCPH
Copy link
Member

+1
And how about naming it kubeflow/tf-controller :)

@gaocegege
Copy link
Member

+1, and +1 for the name kubeflow/tf-controller

@k82cn
Copy link
Collaborator

k82cn commented Jan 30, 2018

+1 to move :).

kubeflow/tf-controller seems better for k8s's style :).

@jlewi
Copy link
Contributor Author

jlewi commented Jan 30, 2018

Per #300
kubeflow/tf-operator seems better than kubeflow/tf-controller

@gaocegege
Copy link
Member

tf-operator SGTM, too.

@pineking
Copy link
Member

pineking commented Jan 30, 2018

+1

@jlewi what's the status of this repo (tensorflow/k8s) after moving to kubeflow/kubeflow? freezing? deleted?

@gaocegege
Copy link
Member

@pineking This repo would be moved to kubeflow/tf-operator, not kubeflow/kubeflow, I think. It will be transferred so when you visit github.com/tensorflow/k8s it will be redirected to kubeflow/tf-operator.

@gaocegege
Copy link
Member

gaocegege commented Jan 31, 2018

I think there are some tasks we should do:

  • Rename the import path in go files, python scripts, markdown files, YAML, Dockerfile, and other files.
  • Regenerate the client
  • Update the name in kubernetes/test-infra
  • Update kubeflow/kubeflow submodule and docs

And, there will be some conflicts with most of PRs, so I think it may be better to squash the mergeable PRs ASAP.

@jlewi
Copy link
Contributor Author

jlewi commented Jan 31, 2018

Does it matter whether we do the rename before or after we move the repo?

I suggest we don't update the name in kubernetes/test-infra until after the move so that our tests continue to work.

@gaocegege
Copy link
Member

@jlewi SGTM, I think it does no matter 😄

@jimexist
Copy link
Member

🚀 no churn no gain, early churn << later churn.

@jlewi
Copy link
Contributor Author

jlewi commented Jan 31, 2018

I've started working on #205 (replacing Airflow with Argo for CI). My suggestion is we wait until we've migrated off Airflow to Argo for testing to do the move. Debugging tests is so painful with Airflow that I'd rather migrate to Argo first as I think this will make dealing with the churn and breakages much easier.

@jlewi
Copy link
Contributor Author

jlewi commented Feb 4, 2018

I submitted #205.

I'll plan on moving this repo to kubeflow/tf-operator early this week

@wbuchwalter @DjangoPeng @gaocegege @ScorpioCPH @jimexist @zjj2wry as committers to tensorflow/k8s can you explicitly LGTM/+1 this change?

@DjangoPeng
Copy link
Member

+1 for moving it to kubeflow/tf-controller.

@gaocegege
Copy link
Member

gaocegege commented Feb 5, 2018

+1 Thanks for your work 😄

I'll plan on moving this repo to kubeflow/tf-operator early this week

There is one thing: we do not have the role committer in our governance proposal.

wbuchwalter DjangoPeng gaocegege ScorpioCPH Jimexist zjj2wry as committers to tensorflow/k8s can you explicitly LGTM/+1 this change?

@jlewi
Copy link
Contributor Author

jlewi commented Feb 6, 2018

@DjangoPeng @gaocegege Seems to be some disagreement over the name.

Based on the feedback in this issue kubeflow/tf-controller seems more popular but based on #300 kubeflow/tf-operator seems like the more accurate terminology.

Specifically, this comment says that an operator is a controller that is application specific; whereas controllers are not application specific.

In our case, we are very much application specific so operator seems like the better terminology (albeit its just a convention).

Technically, all operators are controllers but I think we should follow CoreOS and use operator to indicate we are application(TF) specific.

@DjangoPeng @gaocegege what do you think?

@gaocegege
Copy link
Member

I agree with the name tf-operator

@DjangoPeng
Copy link
Member

@jlewi @gaocegege
Operator is more powerful than controller in my mind. If we move it to kubeflow/tf-operator, we'd better clearly define the scope of tf-operator. For example, it can manage the whole lifecycle of distributed tensorflow jobs and provide many termination policies such as restart, restore, stop, etc.

BTW: Considering the development of kubeflow, there are mxnet-operator, caffe-operator and pytorch-opperator in the future. So, we'd better manage the corresponding CRDs in an unified repo such like kubeflow/clientset. I'm not sure it's the time to do that or not.

@gaocegege
Copy link
Member

gaocegege commented Feb 6, 2018

@DjangoPeng

Yeah, I think this repo is an operator, since it does manage the whole lifecycle of distributed TF jobs. I agree that it is better to place all clientset and CRD in one repo, but it is not in high priority, IMO.

@jlewi
Copy link
Contributor Author

jlewi commented Feb 6, 2018

It seems like we're ok with kubeflow/tf-operator for now. I'll add this to the community meeting agenda just to ensure there's no final objections.

I agree with @DjangoPeng that if we add more operators in the future we might need to rethink code organization but I say we cross that bridge when we come to it.

I opened #179 to gauge interest in PyTorch. Anecdotally PyTorch is the deep learning framework that seems most popular after TF.

@gaocegege
Copy link
Member

gaocegege commented Feb 6, 2018

@jlewi The issue should be kubeflow/kubeflow#179 😄

And FYI, there is a third party MXNet operator inspired by this repo: https://github.com/deepinsight/mxnet-operator. But it follows our old architecture, and I think we should refactor it to be event-driven or reimplement it if we want to support MXNet.

/cc @loadwiki

@DjangoPeng
Copy link
Member

Cool!

@jlewi
Copy link
Contributor Author

jlewi commented Feb 6, 2018

Decision at the morning meeting was to move the repo to kubeflow/tf-operator
I think I'd like to get an E2E test that verifies kubeflow is working (kubeflow/kubeflow#207) before we move things to try to avoid getting in the state where things are broken at head.

@jlewi jlewi changed the title Move tensorflow/k8s to kubeflow/tfjobs Move tensorflow/k8s to kubeflow/tf-operator Feb 11, 2018
@jlewi
Copy link
Contributor Author

jlewi commented Feb 12, 2018

@gaocegege I moved the repository it is now kubeflow/tf-operator.
Can you take on putting together a PR to update all the locations in the code?

@gaocegege
Copy link
Member

@jlewi I will do it.

@gaocegege
Copy link
Member

PTAL #386 And I think we should setup Travis and Prow for this repo.

jlewi pushed a commit that referenced this issue Feb 12, 2018
We have migrated the repo and we should replace the repo name with the new one.

* Related to #350
Signed-off-by: Ce Gao [email protected]
@jlewi
Copy link
Contributor Author

jlewi commented Feb 12, 2018

kubernetes/test-infra#6779 defines the prow jobs for the new repo. That should hopefully get merged at which point we can figure out what's broken and needs to be fixed.

@jlewi
Copy link
Contributor Author

jlewi commented Feb 12, 2018

https://prow.k8s.io/?repo=kubeflow%2Ftf-operator

It looks like the prow job was triggered correctly for kubeflow//tf-operator but the results weren't written back to the PR.

@jlewi
Copy link
Contributor Author

jlewi commented Feb 12, 2018

The k8s-ci-robot has write access on the repo which is what we have for kubeflow/kubeflow

@jlewi
Copy link
Contributor Author

jlewi commented Feb 12, 2018

Looks like the prow job can't pull the image

Log not found: response has status "400 Bad Request" and body "{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"4fea41a7-1041-11e8-b419-0a580a6c0122-0\" in pod \"4fea41a7-1041-11e8-b419-0a580a6c0122\" is waiting to start: image can't be pulled","reason":"BadRequest","code":400}

@gaocegege
Copy link
Member

It seems that there is only one thing to do:

  • Update the name in kubernetes/test-infra

@jlewi
Copy link
Contributor Author

jlewi commented Feb 23, 2018

@gaocegege What name are you referring to?

@gaocegege
Copy link
Member

@jlewi
Copy link
Contributor Author

jlewi commented Feb 26, 2018

We have already enabled it for the org
https://github.com/kubernetes/test-infra/blob/master/prow/plugins.yaml#L23

So I'm going to mark this bug as closed. It would still be good to clean up the reference to tensorflow/k8s
https://github.com/kubernetes/test-infra/blob/master/prow/plugins.yaml#L13

@jlewi jlewi closed this as completed Feb 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants