Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TfJobs dashboard doesn't work with K8s API server proxy or envoy proxy #323

Closed
jlewi opened this issue Jan 17, 2018 · 18 comments
Closed

TfJobs dashboard doesn't work with K8s API server proxy or envoy proxy #323

jlewi opened this issue Jan 17, 2018 · 18 comments

Comments

@jlewi
Copy link
Contributor

jlewi commented Jan 17, 2018

I have the dashboard running and tried accessing it over the K8s API server proxy. For a brief second I see the dashboard but then I just get a grey screen.

When I connect via kubectl port-forward to the pod it works.

is this expected?

@wbuchwalter wbuchwalter changed the title TfJobs dashboard doesn't with K8s API server proxy TfJobs dashboard doesn't work with K8s API server proxy Jan 18, 2018
@wbuchwalter
Copy link
Contributor

This is because using kubectl proxy adds a prefix to the path, but the front-end try making request to the root.

see: https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/

Some web apps may not work, particularly those with client side javascript that construct urls in a way that is unaware of the proxy path prefix.

Prometheus and Kashti for example only work with kubectl port-forward. Iirc airflow also has issues with proxy but not port-forward?
Would you be ok with explicitly directing toward port-forward in the documentation?
If not I can investigate how kubernetes/dashboard solves this.

@jimexist
Copy link
Member

FYI i work with Docker for Mac locally, with support for Kubernetes. port-forwarding works for me, haven't tried the other way.

@jlewi
Copy link
Contributor Author

jlewi commented Jan 25, 2018

I think we should try to design our UIs to support having a path prefix.
The motivation is that to allow easy secure access from outside the cluster we want to have a single ingress point with a reverseproxy see kubeflow/kubeflow#11.

I think kubectl port-forward will be a source of friction so I'd like to avoid it. We will have many UIs (Argo, TFJobs, TensorBoard) etc...

I think an alternative solution would be to give services their own hostname e.g.
http://tfjobs-ui.mydomain.com
htttp://argo-ui.mydomain.com
etc...

I think we can use External DNS but a lot of questions with that approach still remain for me e.g
* How do we generate SSL certs dynamically for each hostname? kube-lego?
* Can we setup a secure proxy (e.g. IAP on GKE, or oauth-proxy) automatically for each?

@jlewi jlewi added this to the Kubecon Europe milestone Jan 25, 2018
@jlewi jlewi changed the title TfJobs dashboard doesn't work with K8s API server proxy TfJobs dashboard doesn't work with K8s API server proxy or envoy proxy Jan 26, 2018
@jlewi
Copy link
Contributor Author

jlewi commented Jan 26, 2018

So as part of creating a generic reverse proxy solution (kubeflow/kubeflow#11) I tried out TB and the dashboard UI behind an Envoy based reverse proxy.

TensorBoard works but the jobs UI doesn't.

My reverse proxy is rooting the app at

https://kubeflow.kubeflow-rl.io/tfjobs/ui/

When I navigate to that page chrome developer console reports a 200 for the main page

Request URL:https://kubeflow.kubeflow-rl.io/tfjobs/ui/
Request Method:GET
Status Code:200 
Remote Address:130.211.28.133:443
Referrer Policy:no-referrer-when-downgrade

But then there are failed requests to some resources because the request path isn't rooted correctly

Request URL:https://kubeflow.kubeflow-rl.io/static/css/main.e1b6790e.css
Request Method:GET
Status Code:404 
Remote Address:130.211.28.133:443
Referrer Policy:no-referrer-when-downgrade
Request URL:https://kubeflow.kubeflow-rl.io/static/js/main.e7655508.js
Request Method:GET
Status Code:404 
Remote Address:130.211.28.133:443
Referrer Policy:no-referrer-when-downgrade

The correct paths should be

https://kubeflow.kubeflow-rl.io/tfjobs/ui/static/css/main.e1b6790e.css
https://kubeflow.kubeflow-rl.io/tfjobs/ui/static/js/main.e7655508.js

Both of which return 200.

I'm guessing this is an issue in the app/html using something like an absolute path as opposed to relative paths.

@kflynn
Copy link

kflynn commented Feb 22, 2018

What does the Ambassador annotation added for this service look like?

@swiftdiaries
Copy link
Member

This is the generated TFJob service YAML:

apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: tfjobs-ui-mapping
      prefix: /tfjobs/ui/
      rewrite: /
      service: tf-job-dashboard.kubeflow
  name: tf-job-dashboard
  namespace: kubeflow
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    name: tf-job-dashboard
  type: ClusterIP

@kflynn
Copy link

kflynn commented Feb 22, 2018

If I'm reading things correctly in the report above, then switching the annotation to use

rewrite: /tfjobs/ui

seems like it should probably fix the issue.

@swiftdiaries
Copy link
Member

Let me try it out. Thank you for your comments @kflynn.

@kflynn
Copy link

kflynn commented Feb 22, 2018

Mea culpa -- left out a /:

rewrite: /tfjobs/ui/

@swiftdiaries
Copy link
Member

Ah I made the same mistake, last time as well.
This is the corrected service YAML:

apiVersion: v1
kind: Service
metadata:
  annotations:
    getambassador.io/config: |-
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name: tfjobs-ui-mapping
      prefix: /tfjobs/ui/
      rewrite: /tfjobs/ui/
      service: tf-job-dashboard.kubeflow
  creationTimestamp: 2018-02-22T21:05:18Z
  name: tf-job-dashboard
  namespace: kubeflow
  resourceVersion: "3670"
  selfLink: /api/v1/namespaces/kubeflow/services/tf-job-dashboard
  uid: 163eda24-1814-11e8-9a3c-025000000001
spec:
  clusterIP: 10.107.237.136
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    name: tf-job-dashboard
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

This is the curl output:

$ curl -v http://localhost:8080/tfjobs/ui/
$ curl -v http://localhost:8080/tfjobs/ui/
*   Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /tfjobs/ui/ HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< date: Thu, 22 Feb 2018 21:36:29 GMT
< content-length: 19
< x-envoy-upstream-service-time: 5
< server: envoy
< 
404 page not found
* Connection #0 to host localhost left intact

@kflynn
Copy link

kflynn commented Feb 22, 2018

This is where I suggest dropping into the Ambassador Gitter channel so we can get this sorted out. :)

@swiftdiaries
Copy link
Member

Sure thing 👍

@kflynn
Copy link

kflynn commented Feb 23, 2018

To summarize the conversation from the Ambassador Gitter channel: I misunderstood what @jlewi said a month ago about where the application was expecting to be rooted vs where the Ambassador Mapping was trying to root it. Oops.

Given that the application thinks it's rooted at /, and that it uses absolute paths without any capability for configuration, pretty much the only way to do this is with an Ambassador Mapping that specifies

prefix: /
rewrite: /

If this is the only such application that needs to live behind this Ambassador, that should be enough -- Ambassador knows how to sort Mappings such that very broad matches come later for exactly this reason.

If this is not the only such application, you'll need another constraint in the Mapping. The simplest thing to do is probably to add host, so that only requests to a particular domain name will match (we use this a lot at Datawire, actually).

@swiftdiaries
Copy link
Member

Just a thought. We could change the dashboard service slightly.
This is from https://github.com/kubeflow/tf-operator/blob/master/dashboard/backend/main.go
Current:

p := ":8080"
log.Println("Listening on", p)
http.ListenAndServe(p, nil)

Proposed change:

p := ":8080"
log.Println("Listening on", p+"/tfjobs/ui/")
http.ListenAndServe(p+"/tfjobs/ui/", nil)

And then the ambassador service annotation to be,

prefix: /tfjobs/ui/
rewrite: /tfjobs/ui/

Would this be a viable solution? I don't know about the ramifications of this change.
cc/ @jimexist @wbuchwalter @jlewi

@wbuchwalter
Copy link
Contributor

@swiftdiaries I don't see any issue with your approach, however this would need a bit of testing.

@kflynn

Given that the application thinks it's rooted at /, and that it uses absolute paths without any capability for configuration

I haven't been very active as I am finishing a round of non-stop travel for work, but if you think this issue can be more cleanly solved by modifying the front-end let me know and I will try to find some time to do that.

@jlewi
Copy link
Contributor Author

jlewi commented Feb 23, 2018

@swiftdiaries per @kflynn's comment above

Given that the application thinks it's rooted at /

Can we fix the APP (its our code :)) so it will use the correct root?

I believe we want to configure Ambassador using

prefix: /tf-jobs/ui/
rewrite: /tf-jobs/ui/

If we do this then I'd expect the request that hits the UI has in it somewhere the URL

https://my-domain/tf-jobs/ui/

So we'd like the UI to use https://my-domain/tf-jobs/ui/ as the root for the application and then all paths like "main.e7655508.js" should be computed in terms of that root.

So can we update our REACT app to correctly compute the ROOT and then use that as a prefix for URLs?

@swiftdiaries
Copy link
Member

I'll take a look at this.

@kkasravi
Copy link
Contributor

kkasravi commented Feb 24, 2018

try adding homepage to package.json in tf-operator/dashboard/frontend

{
  "name": "tensorflowk8s-dashboard",
  "version": "0.1.0",
  "description": "Dashboard for kubeflow/tf-operator.",
  "private": true,
  "homepage": ".",
  "dependencies": {

See facebook/create-react-app#527 and https://github.com/facebook/create-react-app/blob/master/packages/react-scripts/template/README.md#building-for-relative-paths

I wanted to test this out myself - i'm a bit unclear how to build/release - it looks like you prep a docker image to do the build and then build within that docker image? The developer instructions seem to assume you're on a linux machine or vm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants