
Update links for Kubeflow Training Operator #3002

Merged

Conversation

@andreyvelich (Member) commented Oct 5, 2021

After this PR: kubeflow/training-operator#1348, we renamed the repo to training-operator.
I tried to update all the corresponding links and some legacy docs for Training.

Please take a look.

I didn't update the Reference docs; do we want to keep them?
We have some scripts to generate them: https://github.com/kubeflow/website/tree/master/gen-api-reference.

/assign @kubeflow/wg-training-leads @shannonbradshaw

@shannonbradshaw (Contributor) commented

Thanks for submitting this PR, @andreyvelich! I will review it first thing tomorrow morning.

@terrytangyuan (Member) left a comment

/lgtm

@shannonbradshaw (Contributor) left a comment

I suggested corrections in a few files; otherwise:

/lgtm

> With using volcano scheduler to apply gang-scheduling, a job can run only if there are enough resources for all the pods of the job. Otherwise, all the pods will be in pending state waiting for enough resources. For example, if a job requiring N pods is created and there are only enough resources to schedule N-2 pods, then N pods of the job will stay pending.
>
> **Note:** when in a high workload, if a pod of the job dies when the job is still running, it might give other pods chance to occupied the resources and cause deadlock.

Contributor left a comment

There are a couple of typos here. Fixes in bold below:
...when in a high workload, if a pod of the job dies when the job is still running, it might give other pods **a chance to occupy** the resources and cause deadlock.
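To make the gang-scheduling behaviour in the quoted passage concrete, here is a minimal, hypothetical sketch of a TFJob whose pods are handed to the volcano scheduler. It assumes the training operator is running with gang scheduling enabled; the image name and resource requests are placeholders, not values from this PR.

```yaml
# Hypothetical example only: a TFJob that asks volcano to gang-schedule its pods.
# Assumes the training operator was started with gang scheduling enabled; exact
# flags and field names can differ between operator versions.
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: gang-scheduled-example
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 4                    # volcano binds the pods only when all 4 can be placed
      restartPolicy: OnFailure
      template:
        spec:
          schedulerName: volcano     # hand the pods to the volcano scheduler
          containers:
            - name: tensorflow       # default container name expected by TFJob
              image: registry.example.com/train:latest   # placeholder image
              resources:
                requests:
                  cpu: "4"
                  memory: 8Gi
```

If the cluster can only place, say, two of the four workers, none of them are bound, which is the pending behaviour described in the quoted doc text.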


> Before you use the auto-tuning example, there is some preparatory work need to be finished in advance.
> To let TVM tune your network, you should create a docker image which has TVM module.
> Then, you need a auto-tuning script to specify which network will be tuned and set the auto-tuning parameters.
> For more details, please see [tutorials](https://docs.tvm.ai/tutorials/autotvm/tune_relay_mobile_gpu.html#sphx-glr-tutorials-autotvm-tune-relay-mobile-gpu-py).
> Finally, you need a startup script to start the auto-tuning program. In fact, MXJob will set all the parameters as environment variables and the startup script need to reed these variable and then transmit them to auto-tuning script.

Contributor left a comment

Fixes in bold below.
Finally, you need a startup script to start the auto-tuning program. In fact, MXJob will set all the parameters as environment variables and the startup script **needs to read** these **variables** and then transmit them to **the** auto-tuning script.
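As a rough illustration of that last sentence, a tuner container could use a small startup command that reads the injected environment variables and forwards them to the auto-tuning script. The variable names (NETWORK, TUNING_TRIALS), script path, and image below are made up for this sketch and are not fields defined by the operator.

```yaml
# Hypothetical pod-template fragment; env var names, paths, and image are placeholders.
containers:
  - name: mxnet-autotune
    image: registry.example.com/mxnet-tvm-autotune:latest   # image built with the TVM module
    command: ["sh", "-c"]
    args:
      - |
        # startup script: read the variables set by the operator and pass them on
        python /opt/autotune/tune_network.py \
          --network "${NETWORK}" \
          --num-trials "${TUNING_TRIALS}"
```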

> You can create a training job by defining a `XGboostJob` config file. See the manifests for the [IRIS example](https://github.com/kubeflow/training-operator/blob/master/examples/xgboost/xgboostjob.yaml). You may change the config file based on your requirements. eg: add `CleanPodPolicy` in Spec to `None` to retain pods after job termination.

Contributor left a comment

eg: -> E.g.,
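For readers who want to see what such a config looks like, here is a minimal, hypothetical XGBoostJob sketch with the pod-retention setting mentioned above. It loosely follows the training-operator v1 API, but field names (for example, whether `cleanPodPolicy` sits under `runPolicy` or directly under `spec`) and the image vary by operator version, so treat it as an illustration rather than the linked manifest.

```yaml
# Hypothetical sketch, not the manifest linked above; field names may differ by version.
apiVersion: kubeflow.org/v1
kind: XGBoostJob
metadata:
  name: xgboost-iris-example
spec:
  runPolicy:
    cleanPodPolicy: None            # keep the pods around after the job terminates
  xgbReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: xgboost         # default container name expected by XGBoostJob
              image: registry.example.com/xgboost-iris:latest   # placeholder image
    Worker:
      replicas: 2
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: xgboost
              image: registry.example.com/xgboost-iris:latest   # placeholder image
```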

@andreyvelich (Member, Author) commented

Thank you for the review @shannonbradshaw!
I made these changes.

@thesuperzapper (Member) commented

FYI, the PR #3014 restructures the Components / Training Operators section to allow for moving References from the top level.

I don't mind who merges first, but someone will have to rebase.

@shannonbradshaw (Contributor) commented

/lgtm

@kimwnasptd (Member) commented

/lgtm

@andreyvelich (Member, Author) commented Oct 7, 2021

> FYI, the PR #3014 restructures the Components / Training Operators section to allow for moving References from the top level.
>
> I don't mind who merges first, but someone will have to rebase.

I think we can remove the reference docs for Training Operators in the future, since we keep them here: https://github.com/kubeflow/training-operator/tree/master/docs/api.

WDYT @kubeflow/wg-training-leads?

@terrytangyuan (Member) commented

> I think we can remove the reference docs for Training Operators in the future, since we keep them here: https://github.com/kubeflow/training-operator/tree/master/docs/api.
>
> WDYT @kubeflow/wg-training-leads?

Yes. They easily get outdated on the website. Also, there's an issue to autogenerate the docs: #1924.

@andreyvelich force-pushed the fix-training-operator-links branch from 3d46599 to 4162db0 on October 8, 2021 at 11:40
@andreyvelich (Member, Author) commented

> > I think we can remove the reference docs for Training Operators in the future, since we keep them here: https://github.com/kubeflow/training-operator/tree/master/docs/api.
> > WDYT @kubeflow/wg-training-leads?
>
> Yes. They easily get outdated on the website. Also, there's an issue to autogenerate the docs: #1924.

Sounds good.
@kubeflow/wg-training-leads If you are fine with these changes, I think we can merge this PR.

@andreyvelich (Member, Author) commented

@Bobgy @james-jwu @zijianjoy Can you please help with the approval of this PR?

@james-jwu commented

/lgtm
/approve

@google-oss-robot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: andreyvelich, james-jwu, shannonbradshaw, terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

@google-oss-robot merged commit 63399c1 into kubeflow:master on Oct 9, 2021
@andreyvelich deleted the fix-training-operator-links branch on October 9, 2021 at 01:19