Eclipse Che Installation in AWS EC2, failure to start a workspace #13690

Closed
svkr2k opened this issue Jul 4, 2019 · 27 comments
Labels
kind/bug Outline of a bug - must adhere to the bug report template. status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering.

Comments

@svkr2k

svkr2k commented Jul 4, 2019

Description

I am trying to run che-server:7.0.0-RC-2.0. The Che server installs and runs fine, and I can view the dashboard and the stacks/devfiles.

But when I try to create a workspace, it fails. The logs are given below.

Some experts in a forum think that the Che master pod is unable to communicate with the Theia pod.

On my local PC everything works fine and I can create workspaces.
On AWS EC2, however, the dashboard comes up but creating a workspace fails.

Reproduction Steps

minikube start --cpus 2 --memory 4096 --extra-config=apiserver.authorization-mode=RBAC --vm-driver=none
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
kubectl create serviceaccount tiller --namespace kube-system
kubectl apply -f ./tiller-rbac.yaml
helm init --service-account tiller --wait
minikube addons enable ingress
helm dependencies update --skip-refresh ./
minikube ip
export CHE_DOMAIN=$(minikube ip)
helm upgrade --install che --force --namespace che --set global.ingressDomain=${CHE_DOMAIN}.nip.io --set cheImage=eclipse/che-server:7.0.0-RC-2.0 ./

OS and version:
Ubuntu 18.04
AWS EC2 instance

Diagnostics:
Here is the log:

2019-07-02 13:34:48,148[nio-8080-exec-5]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 433]   - Starting workspace 'che/wksp-0m6t' with id 'workspaceh47e7hjup6tqzgqd' by user 'che'
2019-07-02 13:35:22,846[nio-8080-exec-2]  [WARN ] [a.c.w.i.BasicWebSocketEndpoint 145]  - Web socket session error
2019-07-02 13:35:22,847[nio-8080-exec-2]  [WARN ] [a.c.w.i.BasicWebSocketEndpoint 129]  - Closing unidentified session
2019-07-02 13:39:18,077[aceSharedPool-1]  [WARN ] [.i.k.KubernetesInternalRuntime 245]  - Failed to start Kubernetes runtime of workspace workspaceh47e7hjup6tqzgqd. Cause: Server 'theia' in machine 'theia-ide3dy' not available.
2019-07-02 13:39:18,834[aceSharedPool-1]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 856]   - Workspace 'che:wksp-0m6t' with id 'workspaceh47e7hjup6tqzgqd' start failed
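
The only line in this log that names a cause is the KubernetesInternalRuntime warning. A minimal sketch for pulling that cause out of a longer Che server log follows. The helper name failure_cause is hypothetical, and the deployment name che in the usage comment is an assumption based on the namespace used in the Helm command.

```shell
# Print the text after "Cause: " for every runtime start failure on stdin.
# failure_cause is a hypothetical helper, not part of Che or chectl.
failure_cause() {
  grep 'Failed to start Kubernetes runtime' | sed -n 's/.*Cause: //p'
}

# Usage on the cluster (assumes the server runs as deployment "che"
# in namespace "che"):
#   kubectl logs --namespace che deployment/che | failure_cause
```

For the log above this would print the "Server 'theia' in machine 'theia-ide3dy' not available." message, which is the part worth quoting when asking for help.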
@tsmaeder tsmaeder added the status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. label Jul 4, 2019
@tsmaeder
Contributor

tsmaeder commented Jul 4, 2019

@svkr2k how are you creating and starting the workspace?

@benoitf
Contributor

benoitf commented Jul 4, 2019

I think it's a duplicate of #13435

@tsmaeder
Contributor

tsmaeder commented Jul 5, 2019

@benoitf Looks likely, yes. The logs here and on the other bug look uninteresting. What additional info could we request that would help us pin down the issue?

@tsmaeder tsmaeder added this to the 7.0.0 milestone Jul 5, 2019
@tsmaeder
Contributor

tsmaeder commented Jul 5, 2019

Tentatively marking this as 7.0.0; a workspace not starting is not good.

@l0rd
Contributor

l0rd commented Jul 8, 2019

I suspect that minikube ip doesn't return the right IP address on EC2: it returns the internal IP, not the public one. See the EC2 user guide for more details.
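
One way to test this suspicion is to compare the address minikube reports with the public address from the EC2 instance metadata service. This is only a sketch: nip_domain and compare_ips are hypothetical helper names, and the metadata URL in the usage comment assumes the unauthenticated (IMDSv1) endpoint is enabled on the instance.

```shell
# Build the nip.io ingress domain for a given IP address.
nip_domain() {
  printf '%s.nip.io\n' "$1"
}

# Report whether the IP minikube advertises equals the EC2 public IP.
# Both helpers are hypothetical; they are not part of minikube or Che.
compare_ips() {
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: minikube=$1 public=$2"
  fi
}

# Usage on the EC2 instance (assumes IMDSv1 is reachable):
#   cluster_ip="$(minikube ip)"
#   public_ip="$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)"
#   compare_ips "$cluster_ip" "$public_ip"
#   nip_domain "$cluster_ip"
```

A mismatch would mean that the domain passed as global.ingressDomain points at the private address, which is reachable only from inside the VPC.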

@skabashnyuk @sleshchenko do you think that this may be the problem? Do you have other ideas about why wsmaster and theia cannot communicate?

@l0rd l0rd added status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach team/platform and removed status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. labels Jul 8, 2019
@l0rd l0rd removed this from the 7.0.0 milestone Jul 8, 2019
@l0rd l0rd added the severity/P1 Has a major impact to usage or development of the system. label Jul 8, 2019
@svkr2k
Author

svkr2k commented Jul 12, 2019

@svkr2k how are you creating and starting the workspace?

@tsmaeder , thank you for your question.
I was able to open http://che-che.ip-address.nip.io and see the che dashboard.
In the dashboard, I selected a stack from the list and clicked the Create workspace button to create the workspace.

Hope this helps. Thank you for looking into this. I would be really happy to get a solution for this issue.
I see the same issue described above using the version 7.0.0-rc-3.0 too.
(Also, kindly add this to 7.0.0 milestone)

@skabashnyuk
Contributor

@svkr2k just to clarify your environment. Are you running Che locally with minikube or on AWS?

@andy316x

I experience this issue when running in multi-user mode. Interestingly, if I deploy Che in single-user mode, Theia loads successfully, but it is then unable to create terminals. I am also using Che in AWS.

@benoitf
Contributor

benoitf commented Jul 12, 2019

@andy316x which browser are you using for terminals? There is an issue with Firefox.

@andy316x

@benoitf I am using Chrome so not sure if that is the problem

@svkr2k
Author

svkr2k commented Jul 16, 2019

@skabashnyuk ,

@svkr2k just to clarify your environment. Are you running Che locally with minikube or on AWS?

I'm running Che on AWS (Ubuntu 16.04) in single-user mode, and I created the workspace using one of the stacks/devfiles already available in the dashboard.

(Kindly note that when I run Che locally on an Ubuntu or Windows PC, it works fine.)

@skabashnyuk
Contributor

@svkr2k I have no AWS EC2 environment available to test. Can you try to install che with https://github.com/che-incubator/chectl ?

@svkr2k
Author

svkr2k commented Jul 16, 2019

@svkr2k just to clarify your environment. Are you running Che locally with minikube or on AWS?

Hi @skabashnyuk , to clarify further: I'm running Che on AWS "with minikube".

@svkr2k
Author

svkr2k commented Jul 16, 2019

Hi all,
I have a question regarding the above issue:
While creating a workspace from the dashboard, will the Che host try to communicate with the workspace container using the .nip.io URL?

@slemeur slemeur added this to the 7.0.0 milestone Jul 16, 2019
@slemeur slemeur added the kind/bug Outline of a bug - must adhere to the bug report template. label Jul 16, 2019
@skabashnyuk
Contributor

@svkr2k can you try installing Che the way @l0rd suggested here? #13838 (comment)

@svkr2k
Author

svkr2k commented Jul 16, 2019

@skabashnyuk , sure, I shall try that: #13838 (comment)

Also, please note that on the AWS EC2 instance, I start minikube using the following:
sudo minikube start --memory=4096 --vm-driver=none

@svkr2k
Author

svkr2k commented Jul 16, 2019

@svkr2k can you try the way to install che how @l0rd suggested here? #13838 (comment)

Thank you very much for providing inputs.

  • Tried to run che using the following commands:

sudo minikube start --memory=4096 --vm-driver=none

sudo kubectl create secret generic che-tls [email protected]

sudo chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert --domain=10.9.2.247.nip.io --cheimage=eclipse/che-server:7.0.0-rc-3.0

  • Here is the error I got:

~$ sudo chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert --domain=10.9.2.247.nip.io --cheimage=eclipse/che-server:7.0.0-rc-3.0
sudo: unable to resolve host ip-10-9-2-247
✈️ Kubernetes preflight checklist
✔ Verify if kubectl is installed
✔ Verify remote kubernetes status...done.
✔ Verify domain is set...set to 10.9.2.247.nip.io.
❯ 🏃‍ Running Helm to install Che
✔ Verify if helm is installed
✔ Check for TLS prerequisites che-tls secret exist.
✔ Create Tiller Role Binding...it already exist.
✔ Create Tiller Service Account...it already exist.
✔ Create Tiller RBAC
✔ Create Tiller Service...it already exist.
✔ Preparing Che Helm Chart...done.
✔ Updating Helm Chart dependencies...done.
✖ Deploying Che Helm Chart
→ Unable to execute helm command helm upgrade --install che --force --namespace che --set global.ingressDomain=10.9.2.247.nip.io --set global.cheDomain=10.9.2.247.nip.io --set cheImage=eclipse/che-server:7.0.0-rc-3.0 --set global.c

Error: Unable to execute helm command helm upgrade --install che --force --namespace che --set global.ingressDomain=10.9.2.247.nip.io --set global.cheDomain=10.9.2.247.nip.io --set cheImage=eclipse/che-server:7.0.0-rc-3.0 --set global.cheWorkspacesNamespace=che -f /home/ubuntu/.cache/chectl/templates/kubernetes/helm/che/values/multi-user.yaml -f /home/ubuntu/.cache/chectl/templates/kubernetes/helm/che/values/tls.yaml /home/ubuntu/.cache/chectl/templates/kubernetes/helm/che/ / Error: validation failed: [unable to recognize "": no matches for kind "Certificate" in version "certmanager.k8s.io/v1alpha1", unable to recognize "": no matches for kind "ClusterIssuer" in version "certmanager.k8s.io/v1alpha1"]
at HelmHelper.<anonymous> (/snapshot/chectl/lib/installers/helm.js:0:0)
at Generator.next (<anonymous>)
at fulfilled (/snapshot/chectl/node_modules/tslib/tslib.js:107:62)

  • Additional information:
    sudo helm version
    Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
    Server: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}

@l0rd l0rd mentioned this issue Jul 16, 2019
@benoitf
Contributor

benoitf commented Jul 16, 2019

$ kubectl create namespace cert-manager
$ kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.8.1/cert-manager.yaml --validate=false
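
These commands install cert-manager v0.8.1, which should register the Certificate and ClusterIssuer kinds that the Helm validation error above reported as missing. Before re-running chectl, it may be worth confirming that the CRDs are present. The sketch below assumes the CRD names follow the certmanager.k8s.io API group shown in the error message; missing_crds is a hypothetical helper.

```shell
# CRDs the Che TLS chart appears to need (assumed from the error message).
REQUIRED_CRDS="certificates.certmanager.k8s.io clusterissuers.certmanager.k8s.io"

# Read `kubectl get crd -o name` output on stdin and print every
# required CRD that is not listed. Prints nothing when all are present.
missing_crds() {
  installed="$(cat)"
  for crd in $REQUIRED_CRDS; do
    printf '%s\n' "$installed" | grep -q "/${crd}\$" || printf '%s\n' "$crd"
  done
}

# Usage on the cluster:
#   kubectl get crd -o name | missing_crds
```

An empty result suggests the "no matches for kind" error should not recur, although the cert-manager pods may still need some time to become ready.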

@svkr2k
Author

svkr2k commented Jul 17, 2019

Greetings, @benoitf , @skabashnyuk . Thank you very much !

Here are the steps I followed:

$ sudo minikube start --memory=4096 --vm-driver=none
$ sudo kubectl create secret generic che-tls [email protected]

$ kubectl create namespace cert-manager
$ kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.8.1/cert-manager.yaml --validate=false

$ sudo chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert --domain=10.9.2.247.nip.io --cheimage=eclipse/che-server:7.0.0-rc-3.0

Here are the other errors that I got:

~$ sudo chectl server:start --installer=helm --multiuser --platform=k8s --tls --self-signed-cert --domain=10.9.2.247.nip.io --cheimage=eclipse/che-server:7.0.0-rc-3.0
sudo: unable to resolve host ip-10-9-2-247
  ✔ ✈️  Kubernetes preflight checklist
    ✔ Verify if kubectl is installed
    ✔ Verify remote kubernetes status...done.
    ✔ Verify domain is set...set to 10.9.2.247.nip.io.
  ✔ 🏃‍  Running Helm to install Che
    ✔ Verify if helm is installed
    ✔ Check for TLS prerequisites che-tls secret exist.
    ✔ Create Tiller Role Binding...it already exist.
    ✔ Create Tiller Service Account...it already exist.
    ✔ Create Tiller RBAC
    ✔ Create Tiller Service...it already exist.
    ✔ Preparing Che Helm Chart...done.
    ✔ Updating Helm Chart dependencies...done.
    ✔ Deploying Che Helm Chart...done.
  ❯ ✅  Post installation checklist
    ❯ PostgreSQL pod bootstrap
      ✖ scheduling
        → ERR_TIMEOUT: Timeout set to pod wait timeout 300000. podExist: false, currentPhase: undefined
        downloading images
        starting
      Keycloak pod bootstrap
      Che pod bootstrap
      Retrieving Che Server URL
      Che status check
Error: ERR_TIMEOUT: Timeout set to pod wait timeout 300000. podExist: false, currentPhase: undefined
    at KubeHelper.<anonymous> (/snapshot/chectl/lib/api/kube.js:0:0)
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/chectl/node_modules/tslib/tslib.js:107:62)
~$ sudo kubectl get pod --namespace che
sudo: unable to resolve host ip-10-9-2-247
NAME                        READY   STATUS    RESTARTS   AGE
che-67777bbb5b-r6r4h        0/1     Running   21         102m
keycloak-5cf455c44d-lx4hl   1/1     Running   0          102m
postgres-6c4d6c764c-5vqw2   1/1     Running   0          102m
~$ sudo kubectl get events --namespace che  -o custom-columns=TIMESTAMP:lastTimestamp,TYPE:type,MESSAGE:message -w
sudo: unable to resolve host ip-10-9-2-247
TIMESTAMP              TYPE      MESSAGE
2019-07-17T04:58:26Z   Normal    Pulling image "eclipse/che-server:7.0.0-rc-3.0"
2019-07-17T05:13:56Z   Warning   Readiness probe failed: HTTP probe failed with statuscode: 500
2019-07-17T04:38:25Z   Warning   Liveness probe failed: HTTP probe failed with statuscode: 500
2019-07-17T05:08:31Z   Warning   Back-off restarting failed container
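
The pod listing shows the che pod at 0/1 ready with 21 restarts, and the events show its probes failing with HTTP 500, so the next useful data point is probably that pod's own log. A small sketch for picking out the pods that are not ready; not_ready_pods is a hypothetical helper that only parses the table kubectl prints.

```shell
# Read `kubectl get pod` table output on stdin and print the names of
# pods whose READY column (ready/total) shows fewer ready containers
# than the total. The header row is skipped.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage on the cluster:
#   kubectl get pod --namespace che | not_ready_pods
#   kubectl logs --namespace che <pod-name> --previous
```

The --previous flag asks kubectl for the log of the last terminated container, which is usually the relevant one when a pod is in a restart loop.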

@svkr2k
Author

svkr2k commented Jul 17, 2019

Hi @benoitf , @skabashnyuk .
I believe that the URLs used for communicating with the containers may be based on .nip.io.
As an alternative, could you kindly send me a link to the documentation or the steps to set up Che with domain-name-based URLs instead of .nip.io?
Something similar to "http://che-che.myapp.mydomain.com"?
Kindly let me know how to set it up like that, so that I can test that too. Thank you for your help.

@skabashnyuk
Contributor

@svkr2k I believe this is the -b parameter of chectl https://www.eclipse.org/che/docs/che-7/che-quick-starts.html

@skabashnyuk
Contributor

@benoitf when you ran your test on AWS, did you use minikube as well?

@l0rd
Contributor

l0rd commented Jul 23, 2019

@svkr2k when you first opened this issue you tried to install single user Che with TLS disabled. At some point though you tested with TLS and multiuser. Have you been able to solve your original problem (Che single user without tls on AWS)?

I am asking because we have some open issues about deploying Che with TLS using helm charts (hence this would be a duplicate) and I would close this one if your original issue was fixed.

@l0rd l0rd added the status/info-needed More information is needed before the issue can move into the “analyzing” state for engineering. label Jul 23, 2019
@l0rd l0rd removed this from the 7.0.0 milestone Jul 23, 2019
@l0rd
Contributor

l0rd commented Jul 23, 2019

@rhopp @slemeur @tsmaeder @nickboldt I have removed the priority and the milestone here because we don't know yet if that's a duplicate, an issue that has been solved or a brand new one.

@l0rd l0rd removed the severity/P1 Has a major impact to usage or development of the system. label Jul 23, 2019
@svkr2k
Author

svkr2k commented Jul 24, 2019

Hi @l0rd , Thank you.
The original issue was caused by .nip.io URLs being blocked in my domain. Now that nip.io-based URLs have been unblocked, I am able to create a workspace.
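
For anyone hitting the same symptom, a quick check of whether nip.io names resolve from the host may save time. This is a sketch under assumptions: expected_ip and nip_resolves are hypothetical helpers, and the lookup relies on getent, which is available on Ubuntu.

```shell
# Extract the IPv4 address embedded in a nip.io hostname,
# e.g. che-che.10.9.2.247.nip.io yields 10.9.2.247.
expected_ip() {
  printf '%s\n' "$1" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# Succeed only if the hostname resolves to the IP embedded in it.
# A failure here points at DNS filtering rather than at Che itself.
nip_resolves() {
  want_ip="$(expected_ip "$1")"
  got_ip="$(getent hosts "$1" | awk '{ print $1; exit }')"
  [ -n "$want_ip" ] && [ "$got_ip" = "$want_ip" ]
}

# Usage on the EC2 instance:
#   nip_resolves che-che.10.9.2.247.nip.io && echo "nip.io OK" || echo "nip.io blocked?"
```

The same check would have to pass from wherever the workspace pods resolve names, so running it inside a pod as well is a reasonable follow-up.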

But my goal is to use a hostname-based URL for Che. We could achieve this using default-host.yaml while running the helm upgrade:
sudo helm upgrade --install che --namespace che -f ./values/default-host.yaml --set global.ingressDomain=myapp.mydomain.com ./

It would be nice to get the recommended practice for hosting Che in single-user and multi-user modes on AWS. When we used a host-based URL, I could not create a workspace because communication with the created workspace container failed.

@l0rd
Contributor

l0rd commented Jul 24, 2019

@svkr2k what you are describing is an existing issue about default-host/single-host: #12971. Unfortunately that's something that we haven't solved yet. We will need a few weeks to fix it properly.

But, if multi-host is an option (i.e. you are ok to use a wildcard SSL certificate), it's possible to configure it with a workaround. We have an ongoing PR to avoid the workaround and make it easier to deploy Che on AWS and GCP: #12971.

And in the meantime we are writing the documentation as well but we are kind of blocked by the issues above.

@mshaposhnik
Contributor

So I'm closing the current issue, since it will be fixed by #12971.

8 participants