Dashboard V1 #125

Merged
merged 35 commits into from
Dec 23, 2017
Changes from 20 commits
dc9116c
basic dashboard structure
wbuchwalter Oct 31, 2017
f3fa5f7
move to flexbox, add routing
wbuchwalter Nov 2, 2017
8d9b88e
move to materialUI
wbuchwalter Nov 3, 2017
8b2190d
add build scripts
wbuchwalter Nov 4, 2017
37c7655
add tensorboard && attached pods
wbuchwalter Nov 5, 2017
f9e5ef5
Update dialogs titles
wbuchwalter Nov 5, 2017
be85cb3
Merge branch 'master' into dashboard
wbuchwalter Nov 6, 2017
95a36ea
kubeflow -> tensorflow/k8s
wbuchwalter Nov 7, 2017
37cf983
Add basic field validation for creation
wbuchwalter Nov 14, 2017
f5a1ab9
Add volume spec for tensorboard
wbuchwalter Nov 15, 2017
1933e2f
add envVars, cmd and args
wbuchwalter Nov 16, 2017
7056b3f
Single docker image for operator and dashboard
wbuchwalter Nov 16, 2017
3a0b35d
fix build scripts for dashboard and update chart
wbuchwalter Nov 16, 2017
a3d3efb
Linting, reformating
wbuchwalter Nov 19, 2017
6f3f2e9
remove dashboard deploy.yaml
wbuchwalter Nov 19, 2017
f89bbde
revert some changes to examples
wbuchwalter Nov 19, 2017
8d09f08
Merge branch 'master' into dashboard
wbuchwalter Nov 22, 2017
2c547fd
update chart and k8sutil usage
wbuchwalter Nov 22, 2017
7ac5407
add dashboard instructions to README and don't install by default
wbuchwalter Nov 22, 2017
7661d2f
tfjobs -> TfJobs
wbuchwalter Nov 22, 2017
79bf0ef
implement reviewers comments
wbuchwalter Nov 28, 2017
fbc2bf3
update release script
wbuchwalter Nov 28, 2017
18aee39
/opt/mlkube -> /opt/tensorflow_k8s
wbuchwalter Nov 29, 2017
48eaa0e
Fix typo in the comment.
jlewi Nov 30, 2017
eeaa478
add yarn to airflow dockerfile
wbuchwalter Nov 30, 2017
bd1cb24
Merge branch 'dashboard' of github.com:wbuchwalter/k8s into dashboard
wbuchwalter Nov 30, 2017
5fd7d9a
add nodejs dependency
wbuchwalter Nov 30, 2017
e8aca7e
update deployment
wbuchwalter Nov 30, 2017
7ef434b
Merge branch 'master' into dashboard
wbuchwalter Dec 20, 2017
5fb6564
add vendor/
wbuchwalter Dec 22, 2017
fc63c29
Merge branch 'master' into dashboard
wbuchwalter Dec 22, 2017
38c6de1
fix dependencies
wbuchwalter Dec 22, 2017
c2f8b39
client-go/pkg/api -> api/core
wbuchwalter Dec 22, 2017
62965c7
Merge branch 'master' into dashboard
jlewi Dec 23, 2017
89aec3f
Merge branch 'master' into dashboard
jlewi Dec 23, 2017
3 changes: 2 additions & 1 deletion .gitignore
@@ -2,7 +2,8 @@
# only so we exclude them.
bin/
vendor/
Contributor

Can you remove this? Now that we check in vendor/, we don't want it in .gitignore.


node_modules/
build/
.vscode/

# Compiled python files.
19 changes: 19 additions & 0 deletions README.md
@@ -95,6 +95,25 @@ TfJob requires Kubernetes >= 1.8
PASSED: tf-job-tfjob-test-pqxkwk
```

### Installing `tensorflow/k8s`'s Dashboard

> **Caution: the dashboard is at a very early stage of development!**

`tensorflow/k8s` also includes a dashboard allowing you to monitor and create `TfJobs` through a web UI.
To deploy the dashboard, set `dashboard.install` to `true`.
Note that by default the dashboard will only be accessible from within the cluster or by proxying, as the default `ServiceType` is `ClusterIP`.
If you wish to expose the dashboard through an external IP, set `dashboard.serviceType` to `LoadBalancer`.

So, for example, if you want to enable the dashboard, and also want to expose it externally, you would do:

```
CHART=https://storage.googleapis.com/tf-on-k8s-dogfood-releases/latest/tf-job-operator-chart-latest.tgz
helm install ${CHART} -n tf-job --wait --replace --set cloud=<gke or azure>,dashboard.install=true,dashboard.serviceType=LoadBalancer
```

This should create a service named `tf-job-dashboard` as well as an additional deployment named `tf-job-dashboard`.


### Configuring the CRD

The CRD must be configured properly to work with your specific Kubernetes cluster.
Empty file added dashboard/README.md
Empty file.
45 changes: 45 additions & 0 deletions dashboard/backend/client/manager.go
@@ -0,0 +1,45 @@
package client
Contributor

Is there a package comment somewhere?

Contributor

Resolved


import (
"github.com/tensorflow/k8s/pkg/util/k8sutil"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
)

const (
CRDGroup = "mlkube.io"
Contributor

This isn't right. We renamed it to tensorflow.org and v1alpha1. Is there a reason you are redefining these constants here rather than using the existing constants defined in pkg/spec?

Contributor Author

@wbuchwalter Nov 28, 2017

> Is there a reason you are redefining these constants here

No, these constants aren't used anymore; I forgot to delete them earlier.

CRDVersion = "v1beta1"
)

type ClientManager struct {
Contributor

Public variables, functions should have a doc string.

restCfg *rest.Config
ClientSet *kubernetes.Clientset
TfJobClient *k8sutil.TfJobRestClient
}

func (c *ClientManager) init() {
restCfg, err := k8sutil.GetClusterConfig()
if err != nil {
panic(err)
}
c.restCfg = restCfg

clientset, err := kubernetes.NewForConfig(c.restCfg)
if err != nil {
panic(err.Error())
}
c.ClientSet = clientset

tfJobClient, err := k8sutil.NewTfJobClient()
if err != nil {
panic(err)
}
c.TfJobClient = tfJobClient
}

func NewClientManager() (ClientManager, error) {
cm := ClientManager{}
cm.init()

return cm, nil
}
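Note that `NewClientManager` above declares an `error` return, but `init` panics on every failure, so the caller's `if err != nil` check in `main.go` can never fire. A minimal sketch of returning the error instead follows. The `restConfig` and `clientset` types, the injected `loadConfig` function, and `errNoCluster` are hypothetical stand-ins (not part of the PR) so that the example builds without client-go.

```go
package main

import (
	"errors"
	"fmt"
)

// restConfig and clientset are hypothetical stand-ins for client-go's
// *rest.Config and *kubernetes.Clientset, so the sketch builds on its own.
type restConfig struct{ host string }

type clientset struct{ cfg *restConfig }

// errNoCluster is a hypothetical error used only to exercise the failure path.
var errNoCluster = errors.New("not running in a cluster")

// ClientManager bundles the clients the dashboard backend needs.
type ClientManager struct {
	restCfg   *restConfig
	ClientSet *clientset
}

// NewClientManager builds a ClientManager and returns the first error it
// hits instead of panicking. loadConfig is injected for the sketch; in the
// diff this role is played by k8sutil.GetClusterConfig.
func NewClientManager(loadConfig func() (*restConfig, error)) (*ClientManager, error) {
	cfg, err := loadConfig()
	if err != nil {
		return nil, fmt.Errorf("loading cluster config: %v", err)
	}
	return &ClientManager{restCfg: cfg, ClientSet: &clientset{cfg: cfg}}, nil
}

func main() {
	_, err := NewClientManager(func() (*restConfig, error) { return nil, errNoCluster })
	fmt.Println("error:", err)
}
```

With this shape, the `log.Fatalf` branch in `main.go` becomes reachable and reports the underlying cause rather than a stack trace.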
170 changes: 170 additions & 0 deletions dashboard/backend/handler/apihandler.go
@@ -0,0 +1,170 @@
package handler
Contributor

Package comment?


Contributor

Nit: I think we've been using underscores in file names, so api_handler.go.

import (
"fmt"
"net/http"

restful "github.com/emicklei/go-restful"
Contributor

Any particular reason you picked this package?

Contributor Author

When I started the UI, my initial thought was that it might eventually be integrated with the Kubernetes dashboard once they have a better plan to support third-party integrations, so the backend architecture is pretty close to https://github.com/kubernetes/dashboard/tree/master/src/app/backend. go-restful is what is used there.
Since it seems unlikely that we will go this route anyway, I can use something else if you like.

Contributor

SGTM thanks.

Contributor

Resolved.

"github.com/tensorflow/k8s/dashboard/backend/client"
"github.com/tensorflow/k8s/pkg/spec"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/pkg/api/v1"
)

type APIHandler struct {
Contributor

Needs a doc string.

cManager client.ClientManager
}

//add tensorboard ips, add azure file / gs links
Contributor

space after "//"

Needs a doc string; see https://golang.org/doc/effective_go.html#commentary

The first sentence should begin with the name of the thing being described; e.g.

"TfJobDetail is ...."
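As an illustration of the commentary convention referenced above, here is a minimal sketch of doc comments that start with the name of the thing they describe. The `Name` and `Pods` fields and the `Summary` method are simplified, hypothetical stand-ins for the real `TfJobDetail` fields in the diff.

```go
package main

import "fmt"

// TfJobDetail is the response body for a single TfJob. It combines the job
// itself with the pods that belong to it.
type TfJobDetail struct {
	// Name is a hypothetical stand-in for the *spec.TfJob field in the diff.
	Name string
	// Pods holds the names of the pods that belong to the job.
	Pods []string
}

// Summary returns a one-line, human-readable description of the job.
func (d TfJobDetail) Summary() string {
	return fmt.Sprintf("%s (%d pods)", d.Name, len(d.Pods))
}

func main() {
	d := TfJobDetail{Name: "mnist", Pods: []string{"master-0", "worker-0"}}
	fmt.Println(d.Summary())
}
```

Comments written this way are what `go doc` shows for each exported identifier.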

type TfJobDetail struct {
TfJob *spec.TfJob `json:"tfJob"`
TbService *v1.Service `json:"tbService"`
Pods []v1.Pod `json:"pods"`
}

type TfJobList struct {
Contributor

Comment please.

tfJobs []spec.TfJob `json:"TfJobs"`
}
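One thing worth checking in `TfJobList`: the `tfJobs` field is unexported, and `encoding/json` only encodes exported fields, so a JSON tag on it has no effect. The sketch below shows the difference using simplified, hypothetical types (`unexportedList`, `exportedList`) with `[]string` in place of `[]spec.TfJob`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unexportedList mirrors the shape in the diff: a lowercase field.
// encoding/json never sees it, with or without a struct tag.
type unexportedList struct {
	tfJobs []string
}

// exportedList exports the field and keeps the "TfJobs" key via the tag.
type exportedList struct {
	TfJobs []string `json:"TfJobs"`
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(unexportedList{tfJobs: []string{"mnist"}})) // {}
	fmt.Println(encode(exportedList{TfJobs: []string{"mnist"}}))   // {"TfJobs":["mnist"]}
}
```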

func CreateHTTPAPIHandler(client client.ClientManager) (http.Handler, error) {
Contributor

Comment please.

apiHandler := APIHandler{
cManager: client,
}

wsContainer := restful.NewContainer()
wsContainer.EnableContentEncoding(true)

cors := restful.CrossOriginResourceSharing{
ExposeHeaders: []string{"X-My-Header"},
AllowedHeaders: []string{"Content-Type", "Accept"},
AllowedMethods: []string{"GET", "POST", "DELETE"},
CookiesAllowed: false,
Container: wsContainer,
}
wsContainer.Filter(cors.Filter)
wsContainer.Filter(wsContainer.OPTIONSFilter)

apiV1Ws := new(restful.WebService)

apiV1Ws.Path("/api").
Contributor

How do these API methods differ from the API methods that the K8s APIServer provides for our TfJob CRD?

Contributor Author

Some of them (such as DELETE /tfjob/{namespace}/{tfjob}) are basically just forwarded to the APIServer as is.
Others, such as GET /tfjob/{namespace}/{tfjob}, are different and return more data than the APIServer would (such as related pods, service details for TensorBoard, etc.).

Contributor

Is there any particular reason why you need to define methods that just forward to the APIServer rather than having callers of those methods just call the APIServer directly?

I think this has the potential to create confusion down the line. For example, if someone is writing a program to create TfJobs programmatically, should they be using the APIServer or the backend server?

Should the extra data (e.g. service details for TensorBoard) be included in the TfJob status?

Contributor Author

This API is called by the dashboard's frontend only.
Since the frontend is just a piece of JS running on the client's end (out of cluster), there is no way to access the APIServer from there securely.
So the frontend makes a call to the dashboard's backend, which runs in-cluster and is authenticated with the APIServer.
This is similar to what is happening in k8s dashboard: https://github.com/kubernetes/dashboard/blob/master/src/app/backend/handler/apihandler.go#L248

Contributor

The APIServer speaks OAuth. So couldn't the JS app running on the client just obtain an OAuth token for the user and then include that in the HTTPS request to the APIServer?

That's probably beyond the scope of this PR. However, if that seems like a promising approach then it might be worth opening an issue or TODO to track.
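To make the direct-call alternative discussed above concrete, here is a rough sketch of a request that carries a bearer token straight to the APIServer. It is written in Go for consistency with the rest of the PR, although the thread is about a JS client. The server URL, the token, and the `tfjobs` resource name are assumptions for illustration, and the group and version follow the `tensorflow.org` / `v1alpha1` naming mentioned earlier in this review. The request is built but never sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// newTfJobListRequest builds (but does not send) a GET for the TfJobs in a
// namespace. The path follows the usual /apis/<group>/<version>/... layout
// for custom resources; the "tfjobs" plural is an assumption.
func newTfJobListRequest(apiServer, namespace, token string) (*http.Request, error) {
	url := fmt.Sprintf("%s/apis/tensorflow.org/v1alpha1/namespaces/%s/tfjobs", apiServer, namespace)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// The token is sent as a standard bearer credential.
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	// Hypothetical server address and placeholder token.
	req, err := newTfJobListRequest("https://kubernetes.example", "default", "TOKEN")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

Whether a browser can do this safely depends on how the cluster issues and scopes tokens and on its CORS settings, which is the part that would need tracking in a follow-up issue.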

Consumes(restful.MIME_JSON).
Produces(restful.MIME_JSON)

apiV1Ws.Route(
apiV1Ws.GET("/tfjob").
To(apiHandler.handleGetTfJobs).
Writes(TfJobList{}))

apiV1Ws.Route(
apiV1Ws.GET("/tfjob/{namespace}/{tfjob}").
To(apiHandler.handleGetTfJobDetail).
Writes(TfJobDetail{}))

apiV1Ws.Route(
apiV1Ws.POST("/tfjob").
To(apiHandler.handleDeploy).
Reads(spec.TfJob{}).
Writes(spec.TfJob{}))

apiV1Ws.Route(
apiV1Ws.DELETE("/tfjob/{namespace}/{tfjob}").
To(apiHandler.handleDeleteTfJob))

apiV1Ws.Route(
apiV1Ws.GET("/logs/{namespace}/{podname}").
To(apiHandler.handleGetPodLogs).
Writes([]byte{}))

wsContainer.Add(apiV1Ws)
return wsContainer, nil
}

func (apiHandler *APIHandler) handleGetTfJobs(request *restful.Request, response *restful.Response) {

//TODO: namespace handling
jobs, err := apiHandler.cManager.TfJobClient.List("default")

if err != nil {
panic(err)
}

response.WriteHeaderAndEntity(http.StatusOK, jobs)
}

func (apiHandler *APIHandler) handleGetTfJobDetail(request *restful.Request, response *restful.Response) {
namespace := request.PathParameter("namespace")
name := request.PathParameter("tfjob")

job, err := apiHandler.cManager.TfJobClient.Get(namespace, name)
if err != nil {
panic(err)
}

tfJobDetail := TfJobDetail{
TfJob: job,
}

if job.Spec.TensorBoard != nil {
tbSpec, err := apiHandler.cManager.ClientSet.CoreV1().Services(namespace).List(metav1.ListOptions{
LabelSelector: fmt.Sprintf("tensorflow.org=,app=tensorboard,runtime_id=%s", job.Spec.RuntimeId),
})
if err != nil {
panic(err)
}

if len(tbSpec.Items) > 0 {
// Should never be more than 1 service that matched, handle error
// Handle case where no tensorboard is found
tfJobDetail.TbService = &tbSpec.Items[0]
} else {
fmt.Println(fmt.Sprintf("Couldn't find a TensorBoard service for TfJob %s", job.Metadata.Name))
}
}

// Get associated pods
pods, err := apiHandler.cManager.ClientSet.CoreV1().Pods(namespace).List(metav1.ListOptions{
LabelSelector: fmt.Sprintf("tensorflow.org=,runtime_id=%s", job.Spec.RuntimeId),
})
if err != nil {
panic(err)
}
tfJobDetail.Pods = pods.Items

response.WriteHeaderAndEntity(http.StatusOK, tfJobDetail)
}

func (apiHandler *APIHandler) handleDeploy(request *restful.Request, response *restful.Response) {
client := apiHandler.cManager.TfJobClient
spec := new(spec.TfJob)
if err := request.ReadEntity(spec); err != nil {
panic(err)
}
j, err := client.Create(spec.Metadata.Namespace, spec)
if err != nil {
panic(err)
}
response.WriteHeaderAndEntity(http.StatusCreated, j)
}

func (apiHandler *APIHandler) handleDeleteTfJob(request *restful.Request, response *restful.Response) {
namespace := request.PathParameter("namespace")
name := request.PathParameter("tfjob")
client := apiHandler.cManager.TfJobClient
err := client.Delete(namespace, name)
if err != nil {
panic(err)
}
response.WriteHeader(http.StatusOK)
}

func (apiHandler *APIHandler) handleGetPodLogs(request *restful.Request, response *restful.Response) {
namespace := request.PathParameter("namespace")
name := request.PathParameter("podname")

logs, err := apiHandler.cManager.ClientSet.CoreV1().Pods(namespace).GetLogs(name, &v1.PodLogOptions{}).Do().Raw()
if err != nil {
panic(err)
}

response.WriteHeaderAndEntity(http.StatusOK, string(logs))
}
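Every handler above calls `panic(err)` on failure, so the caller gets no meaningful status code or error body. A minimal sketch of mapping errors to HTTP status codes follows. It uses only the standard library (`net/http`, `net/http/httptest`) rather than go-restful so that it runs on its own. `lookup`, `errNotFound`, and the `name` query parameter are hypothetical stand-ins for `TfJobClient.Get` and the path parameters in the diff.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errNotFound is a hypothetical sentinel standing in for a "TfJob not found" error.
var errNotFound = errors.New("tfjob not found")

// lookup is a hypothetical stand-in for TfJobClient.Get.
func lookup(name string) (string, error) {
	if name != "mnist" {
		return "", errNotFound
	}
	return `{"name":"mnist"}`, nil
}

// handleGet reports failures as HTTP errors instead of panicking.
func handleGet(w http.ResponseWriter, r *http.Request) {
	job, err := lookup(r.URL.Query().Get("name"))
	switch {
	case err == errNotFound:
		http.Error(w, err.Error(), http.StatusNotFound)
	case err != nil:
		http.Error(w, err.Error(), http.StatusInternalServerError)
	default:
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, job)
	}
}

// statusFor runs the handler against an in-memory recorder and returns the status code.
func statusFor(name string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/api/tfjob?name="+name, nil)
	handleGet(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("mnist"), statusFor("missing"))
}
```

The same idea should carry over to the go-restful `Response` type used in the PR: write an error status and return, rather than panic.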
32 changes: 32 additions & 0 deletions dashboard/backend/main.go
@@ -0,0 +1,32 @@
package main

import (
"fmt"
"log"
"net/http"
"os"

"github.com/tensorflow/k8s/dashboard/backend/client"
"github.com/tensorflow/k8s/dashboard/backend/handler"
)

func main() {
log.SetOutput(os.Stdout)
cm, err := client.NewClientManager()
if err != nil {
log.Fatalf("Error while initializing connection to Kubernetes apiserver: %v", err)
}
apiHandler, err := handler.CreateHTTPAPIHandler(cm)
if err != nil {
log.Fatalf("Error while creating the API Handler: %v", err)
}

http.Handle("/api/", apiHandler)
http.Handle("/", http.StripPrefix("/", http.FileServer(http.Dir("/opt/mlkube/dashboard/frontend/build/"))))
Contributor

Don't use /opt/mlkube since we are trying to remove references to mlkube.
I'd suggest
/opt/tensorflow_k8s


p := ":8080"
fmt.Println("Listening on", p)
Contributor

Why fmt and not log?


http.ListenAndServe(p, nil)

}
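On the `fmt` versus `log` question raised above, a small sketch of routing the startup message through a `log.Logger` is shown below. The `dashboard: ` prefix, the zero flags (no timestamp, to keep the output deterministic), and the helper names are choices made for this example, not something taken from the PR.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"os"
)

// newLogger returns a logger with a fixed prefix and no timestamp, so the
// output is deterministic in this sketch.
func newLogger(w io.Writer) *log.Logger {
	return log.New(w, "dashboard: ", 0)
}

// announce logs the listen address through the logger rather than fmt.
func announce(l *log.Logger, addr string) {
	l.Printf("Listening on %s", addr)
}

// announceString captures what announce writes, for inspection.
func announceString(addr string) string {
	var buf bytes.Buffer
	announce(newLogger(&buf), addr)
	return buf.String()
}

func main() {
	announce(newLogger(os.Stdout), ":8080")
	// A real server would follow with something like:
	//   log.Fatal(http.ListenAndServe(":8080", nil))
	// so that a failed bind is reported instead of silently dropped.
}
```

Using one logger for every message keeps the output destination and format consistent with the `log.SetOutput(os.Stdout)` call already present in `main`. `http.ListenAndServe` also returns an error, which the diff currently discards.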
Empty file added dashboard/frontend/README.md
Empty file.
22 changes: 22 additions & 0 deletions dashboard/frontend/package.json
@@ -0,0 +1,22 @@
{
"name": "tensorflowk8s-dashboard",
"version": "0.1.0",
"description": "Dashboard for tensorflow/k8s.",
"private": true,
"dependencies": {
"lodash": "^4.17.4",
"material-ui": "^0.19.4",
"react": "^16.0.0",
Member

add prop-types if you are feeling good :-)

Member

and license?

Member

@jimexist Nov 7, 2017

personal suggestion: prettier.js and standard.js

"react-dom": "^16.0.0",
"react-router-dom": "^4.2.2",
"react-scripts": "^1.0.17"
},
"scripts": {
"start": "react-scripts start",
"build": "react-scripts build",
"test": "react-scripts test --env=jsdom",
"eject": "react-scripts eject",
"toolbox": "react-toolbox-themr"
},
"license": "Apache-2.0"
}
Binary file added dashboard/frontend/public/favicon.ico
Binary file not shown.
39 changes: 39 additions & 0 deletions dashboard/frontend/public/index.html
@@ -0,0 +1,39 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="theme-color" content="#000000">
<!--
manifest.json provides metadata used when your web app is added to the
homescreen on Android. See https://developers.google.com/web/fundamentals/engage-and-retain/web-app-manifest/
-->
<link rel="shortcut icon" href="favicon.ico">
<!--
Notice the use of %PUBLIC_URL% in the tags above.
It will be replaced with the URL of the `public` folder during the build.
Only files inside the `public` folder can be referenced from the HTML.

Unlike "/favicon.ico" or "favicon.ico", "%PUBLIC_URL%/favicon.ico" will
work correctly both with client-side routing and a non-root public URL.
Learn how to configure a non-root public URL by running `npm run build`.
-->
<title>TensorFlow/k8s</title>
</head>
<body>
<noscript>
You need to enable JavaScript to run this app.
</noscript>
<div id="root"></div>
<!--
This HTML file is a template.
If you open it directly in the browser, you will see an empty page.

You can add webfonts, meta tags, or analytics to this file.
The build step will place the bundled scripts into the <body> tag.

To begin the development, run `npm start` or `yarn start`.
To create a production bundle, use `npm run build` or `yarn build`.
-->
</body>
</html>
8 changes: 8 additions & 0 deletions dashboard/frontend/src/components/App.css
@@ -0,0 +1,8 @@
html {
font-family: roboto, sans-serif;
}

body {
font-family: 'Roboto', sans-serif;
background-color: #eee !important;
}