From 965601f9ab20b9b5ead76d9a48cc286175f2de32 Mon Sep 17 00:00:00 2001
From: oshima
Date: Fri, 16 Nov 2018 10:17:42 +0900
Subject: [PATCH 1/7] update katib doc use katib-ui (#295)

Signed-off-by: YujiOshima
---
 .../docs/guides/components/hyperparameter.md | 29 +++++--------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/content/docs/guides/components/hyperparameter.md b/content/docs/guides/components/hyperparameter.md
index 92e291cf3e..26e44b611c 100644
--- a/content/docs/guides/components/hyperparameter.md
+++ b/content/docs/guides/components/hyperparameter.md
@@ -9,30 +9,17 @@ toc = true
 weight = 5
 +++
-## Deploying Katib
-
-[Katib](https://github.com/kubeflow/katib) is a hyperparameter tuning framework, inspired by
-[Google Vizier](https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/bcb15507f4b52991a0783013df4222240e942381.pdf).
-
-To deploy katib,
-```shell
-ks pkg install kubeflow/katib@master
-ks generate katib katib
-ks apply ${ENV} -c katib
-```
-
 ## Using Katib
-Create namespace `katib` as the service launches jobs in this namespace.
+Currently we are using port-forwarding to access the katib services.
+kubernetes version 1.9~
 ```
-kubectl create namespace katib
+kubectl -n kubeflow port-forward svc/katib-ui 8000:80
 ```
-
-Currently we are using port-forwarding to access the katib services.
+~1.8
 ```
-kubectl get pod -n kubeflow # Find your vizier-core and modedb-frontend pods
-kubectl port-forward -n kubeflow [vizier-core pod] 6789:6789 &
-kubectl port-forward -n kubeflow [modeldb-frontend pod] 3000:3000 &
+kubectl get pod -n kubeflow # Find your katib-ui pods
+kubectl port-forward -n kubeflow [katib-ui pod] 8000:80 &
 ```
 ## Creating a Study Job
 You can create Study Job for Katib by defining a StudyJob config file.
@@ -63,7 +50,7 @@ Check the study status.
 ```
 $ kubectl -n katib describe studyjobs random-example
 Name: random-example
-Namespace: katib
+Namespace: kubeflow
 Labels: controller-tools.k8s.io=1.0
 Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"kubeflow.org/v1alpha1","kind":"StudyJob","metadata":{"annotations":{},"labels":{"controller-tools.k8s.io":"1.0"},"name":"random-example"...
 API Version: kubeflow.org/v1alpha1
@@ -140,4 +127,4 @@ Events:
 It should start a study and run two jobs with different parameters.
-Go to http://localhost:3000/katib to see the result.
+Go to http://localhost:8000/katib to see the result.

From 11931cdf6b0cac0e139b5629e9925c2c63abc19d Mon Sep 17 00:00:00 2001
From: Michelle Casbon
Date: Thu, 15 Nov 2018 18:38:06 -0800
Subject: [PATCH 2/7] Add Kubecon Seattle detail (#296)

Add video from Data@Scale
---
 content/docs/about/events.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/content/docs/about/events.md b/content/docs/about/events.md
index 7885726e4e..580a47c2f4 100644
--- a/content/docs/about/events.md
+++ b/content/docs/about/events.md
@@ -16,6 +16,11 @@ Please raise a [GitHub issue](https://github.com/kubeflow/website/issues/new) if
 * [Data Day Texas, Austin](https://datadaytexas.com/), 26 January, 2019
   - Kubeflow: Portable Machine Learning on Kubernetes: Michelle Casbon
 * [KubeCon, Seattle](https://events.linuxfoundation.org/events/kubecon-cloudnativecon-north-america-2018/), 11-13 December, 2018
+  - [Workshop: Kubeflow End-to-End: GitHub Issue Summarization](https://sched.co/GrWE): Amy Unruh, Michelle Casbon
+  - [Natural Language Code Search for GitHub Using Kubeflow](https://sched.co/GrVn): Jeremy Lewi, Hamel Husain
+  - [Eco-Friendly ML: How the Kubeflow Ecosystem Bootstrapped Itself](https://sched.co/GrTc): Peter McKinnon
+  - [Deep Dive: Kubeflow BoF](https://sched.co/Ha1X): Jeremy Lewi, David Aronchick
+  - [Machine Learning as Code](https://sched.co/GrVh): Jay Smith
 * Women in ML & Data Science, Melbourne, 5 December, 2018
   - Panel: Juliet Hougland, Michelle Casbon
 * [YOW!, Melbourne](https://melbourne.yowconference.com.au/), 4-7 December, 2018
@@ -33,7 +38,8 @@ Please raise a [GitHub issue](https://github.com/kubeflow/website/issues/new) if
   - Kubeflow End to End: Amy Unruh
 * [Data@Scale, Boston](https://dataatscale2018.splashthat.com/), 25 October, 2018
   - [Women in Engineering Panel](https://datascalewomensbreakfast.splashthat.com/): Michelle Casbon
-  - Kubeflow: Portable Machine Learning on Kubernetes: Michelle Casbon
+  - [Kubeflow: Portable Machine Learning on Kubernetes](https://code.fb.com/core-data/data-scale-boston/): Michelle Casbon
+  - [Video](https://www.facebook.com/atscaleevents/videos/114311602829170/)
 * [Kafka Summit, San Francisco](https://kafka-summit.org/), 16-17 October, 2018
 * [O’Reilly AI Conference, London](https://conferences.oreilly.com/artificial-intelligence/ai-eu), 08-11 October, 2018
   - [Machine Learning at Scale with Kubernetes](https://conferences.oreilly.com/artificial-intelligence/ai-eu/public/schedule/detail/69194): Chris Cho

From 10790ca2b068e8a734f03309d5e0ed0db2529d80 Mon Sep 17 00:00:00 2001
From: Jeremy Lewi
Date: Tue, 20 Nov 2018 13:22:24 -0800
Subject: [PATCH 3/7] Add click to deploy as an option to the website. (#298)

* Add click to deploy as an option to the website.
Fixes: #278
* Add note about permissions.
---
 content/docs/started/getting-started-gke.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/content/docs/started/getting-started-gke.md b/content/docs/started/getting-started-gke.md
index e6926000db..0849cbe4d5 100644
--- a/content/docs/started/getting-started-gke.md
+++ b/content/docs/started/getting-started-gke.md
@@ -72,8 +72,17 @@ Create an OAuth client ID to be used to identify Cloud IAP when requesting acces
 export CLIENT_ID=
 export CLIENT_SECRET=
 ```
+## Deploy Kubeflow on Kubernetes Engine Using The Ui
-## Deploy Kubeflow on Kubernetes Engine
+1. Open [https://deploy.kubeflow.cloud/](https://deploy.kubeflow.cloud/#/deploy) in your web browser
+
+    * You will need to login in using a GCP account with admin privileges for your GCP project
+
+1. Fill out the form
+
+1. Click Create Deployment
+
+## Deploy Kubeflow on Kubernetes Engine Using The Command Line
 Run the following steps to deploy Kubeflow:

From df164c4bbae68b6eb06aefca6a902774101c4f15 Mon Sep 17 00:00:00 2001
From: Michelle Casbon
Date: Tue, 20 Nov 2018 21:40:33 -0800
Subject: [PATCH 4/7] Change namespace katib -> kubeflow (#294)

---
 content/docs/guides/components/hyperparameter.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/docs/guides/components/hyperparameter.md b/content/docs/guides/components/hyperparameter.md
index 26e44b611c..58171d433d 100644
--- a/content/docs/guides/components/hyperparameter.md
+++ b/content/docs/guides/components/hyperparameter.md
@@ -42,13 +42,13 @@ In this demo, 3 hyper parameters are randomly generated.
 ```
-$ kubectl -n katib get studyjob
+$ kubectl -n kubeflow get studyjob
 ```
 Check the study status.
 ```
-$ kubectl -n katib describe studyjobs random-example
+$ kubectl -n kubeflow describe studyjobs random-example
 Name: random-example
 Namespace: kubeflow
 Labels: controller-tools.k8s.io=1.0
@@ -60,7 +60,7 @@ Metadata:
   Creation Timestamp: 2018-08-15T01:29:13Z
   Generation: 0
   Resource Version: 173289
-  Self Link: /apis/kubeflow.org/v1alpha1/namespaces/katib/studyjobs/random-example
+  Self Link: /apis/kubeflow.org/v1alpha1/namespaces/kubeflow/studyjobs/random-example
   UID: 9e136400-a02a-11e8-b88c-42010af0008b
 Spec:
   Study Spec:

From fca980567d112b6f58343767db319f1b1f36c942 Mon Sep 17 00:00:00 2001
From: silenceshell
Date: Thu, 22 Nov 2018 02:01:55 +0800
Subject: [PATCH 5/7] should be CPUs (#301)

Signed-off-by: silenceshell
---
 themes/kf/layouts/index.html | 2 +-
 themes/kf/sass/src/index.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/themes/kf/layouts/index.html b/themes/kf/layouts/index.html
index 05f9409756..bf8dd83dfa 100644
--- a/themes/kf/layouts/index.html
+++ b/themes/kf/layouts/index.html
@@ -42,7 +42,7 @@

 Notebooks
 TensorFlow model training
-A TensorFlow Training Controller that can be configured to use either CPU’s or GPUs and be dynamically adjusted to the size of a cluster with a single setting. We also provide a TensorFlow job operator.
+A TensorFlow Training Controller that can be configured to use either CPUs or GPUs and be dynamically adjusted to the size of a cluster with a single setting. We also provide a TensorFlow job operator.

diff --git a/themes/kf/sass/src/index.html b/themes/kf/sass/src/index.html
index fdacd80c86..6d9f2d400c 100755
--- a/themes/kf/sass/src/index.html
+++ b/themes/kf/sass/src/index.html
@@ -149,7 +149,7 @@

 Notebooks
 TensorFlow model training
-A TensorFlow Training Controller that can be configured to use either CPU’s or GPUs and be dynamically adjusted to the size of a cluster with a single setting. We also provide a TensorFlow job operator.
+A TensorFlow Training Controller that can be configured to use either CPUs or GPUs and be dynamically adjusted to the size of a cluster with a single setting. We also provide a TensorFlow job operator.

From c477a3084b57e4bacd95ec954a5f7ef86a589b6c Mon Sep 17 00:00:00 2001
From: Sarah Maddox
Date: Thu, 22 Nov 2018 05:09:07 +1100
Subject: [PATCH 6/7] Fixes click-to-deploy instructions. (#299)

---
 content/docs/started/getting-started-gke.md | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/content/docs/started/getting-started-gke.md b/content/docs/started/getting-started-gke.md
index 0849cbe4d5..ba3519f5c6 100644
--- a/content/docs/started/getting-started-gke.md
+++ b/content/docs/started/getting-started-gke.md
@@ -72,17 +72,15 @@ Create an OAuth client ID to be used to identify Cloud IAP when requesting acces
 export CLIENT_ID=
 export CLIENT_SECRET=
 ```
-## Deploy Kubeflow on Kubernetes Engine Using The Ui
-1. Open [https://deploy.kubeflow.cloud/](https://deploy.kubeflow.cloud/#/deploy) in your web browser
+## Deploy Kubeflow on GKE using the UI
-    * You will need to login in using a GCP account with admin privileges for your GCP project
+1. Open [https://deploy.kubeflow.cloud/](https://deploy.kubeflow.cloud/#/deploy) in your web browser.
+1. Sign in using a GCP account with admin privileges for your GCP project.
+1. Complete the form.
+1. Click **Create Deployment**.
-1. Fill out the form
-
-1. Click Create Deployment
-
-## Deploy Kubeflow on Kubernetes Engine Using The Command Line
+## Deploy Kubeflow on GKE using the command line
 Run the following steps to deploy Kubeflow:

From aecdccbfdc598b6b57093637298526633d259fad Mon Sep 17 00:00:00 2001
From: Sarah Maddox
Date: Thu, 22 Nov 2018 08:04:01 +1100
Subject: [PATCH 7/7] Adds TFJob UI details. (#300)

* Adds TFJob UI details.
* Addressed review comments.
---
 content/docs/guides/components/tftraining.md | 32 ++++++++++++++++++--
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/content/docs/guides/components/tftraining.md b/content/docs/guides/components/tftraining.md
index 783f86a964..9635f77acf 100644
--- a/content/docs/guides/components/tftraining.md
+++ b/content/docs/guides/components/tftraining.md
@@ -1,5 +1,6 @@
 +++
-title = "TensorFlow Training"
+title = "TensorFlow Training (TFJob)"
+linkTitle = "TensorFlow Training"
 description = ""
 weight = 10
 toc = true
@@ -147,7 +148,7 @@ the [`TFJob` custom resource](https://github.com/kubeflow/tf-operator) is availa
 We treat each TensorFlow job as a [component](https://ksonnet.io/docs/tutorial#2-generate-and-deploy-an-app-component) in your APP.
-### Run the TfCnn example
+### Running the TfCnn example
 Kubeflow ships with a [ksonnet prototype](https://ksonnet.io/docs/concepts#prototype) suitable for running the [TensorFlow CNN Benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks).
@@ -203,8 +204,33 @@ Typically you will want to change the following values
     * For example, you might need to configure various environment variables to talk to datastores like GCS or S3
-1. Attach PV's if you want to use PVs for storage.
+1. Attach PVs if you want to use PVs for storage.
+### Accessing the TFJob dashboard
+
+The TFJob dashboard is available at `/tfjobs/ui/`. Specifically:
+
+* If you're using the central Kubeflow UI, you can access the TFJob dashboard
+  by clicking **TFJOB DASHBOARD**:
+
+  ![Central UI](/docs/images/central-ui.png)
+
+* If you followed the
+  [guide for GKE](/docs/started/getting-started-gke), you can
+  access the TFJob dashboard at the following URL:
+
+  ```
+  https://.endpoints..cloud.goog/tfjobs/ui/
+  ```
+
+* If you're using portforwarding, you can access the TFJob dashboard at the
+  following URL:
+
+  ```
+  http://localhost:8080/tfjobs/ui/
+  ```
+
+See more details about [accessing the Kubeflow UIs](/docs/guides/accessing-uis).
 ## Using GPUs