[ws-manager] Wait for workspace pod to be ready #8258

Closed · wants to merge 4 commits

Conversation

@sagor999 (Contributor) commented on Feb 16, 2022

Description

Make sure to wait for the container to start up, in case of out-of-memory or other errors.
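
For readers skimming the PR, here is a minimal, hedged sketch of the general pattern (polling the workspace pod with client-go and k8s.io/apimachinery's wait package until it reports Ready). This is not the PR's actual code; the function name, namespace, and pod name are illustrative assumptions.

```go
// Illustrative sketch only, not the ws-manager implementation: poll the
// workspace pod with an exponential backoff until it reports Ready, so that
// transient startup failures (e.g. an OOM-killed container) turn into
// retries instead of an immediate error.
package example

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodReady is a hypothetical helper; namespace and podName are placeholders.
func waitForPodReady(ctx context.Context, client kubernetes.Interface, namespace, podName string) error {
	backoff := wait.Backoff{
		Steps:    10,
		Duration: 100 * time.Millisecond,
		Factor:   2.5,
		Jitter:   0.1,
	}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		pod, err := client.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
		if err != nil {
			// Treat transient API errors as "not ready yet" and keep retrying.
			return false, nil
		}
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				return true, nil
			}
		}
		return false, nil
	})
}
```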

Related Issue(s)

Fixes #8253

How to test

Spin up a new cluster in the workspace preview environment:
./new-vm.sh -v aledbf-wait.10 -z us-west1-c
then try to start workspaces in it.

Release Notes

Improve handling of "Out of Memory" error when starting up workspaces

Documentation

@sagor999 (Contributor, Author) commented on Feb 17, 2022

/werft with-clean-slate-deployment

👎 unknown command: with-clean-slate-deployment
Use /werft help to list the available commands

@sagor999 (Contributor, Author) commented on Feb 17, 2022

/werft help

👍 You can interact with werft using: /werft command <args>.
Available commands are:

  • /werft run [annotation=value] which starts a new werft job from this context.
    You can optionally pass multiple whitespace-separated annotations.
  • /werft help displays this help

@sagor999 (Contributor, Author) commented on Feb 17, 2022

/werft run

👍 started the job as gitpod-build-aledbf-wait.10

sagor999 marked this pull request as ready for review on February 17, 2022 01:53
sagor999 requested a review from a team on February 17, 2022 01:53
The github-actions bot added the label team: workspace (Issue belongs to the Workspace team) on Feb 17, 2022
@csweichel (Contributor) commented:

This would help with the problem we're trying to solve, but it would most likely degrade the feedback mechanism towards the user by all but removing the Pending phase. I.e. users would now see a very long "preparing workspace" phase until they finally get redirected.

How much of a concern that is, we'd see in practice.
Certainly this is another bit of motivation to move towards a CRD-based solution.

@sagor999 (Contributor, Author) commented:

I reduced the exponential backoff factor, as otherwise, in the worst-case scenario, it would have been waiting there for hours to retry.

@@ -199,7 +199,7 @@ func (m *Manager) StartWorkspace(ctx context.Context, req *api.StartWorkspaceReq
     backoff := wait.Backoff{
         Steps:    10,
         Duration: 100 * time.Millisecond,
-        Factor:   5.0,
+        Factor:   2.5,
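
For a sense of the worst case mentioned above, here is a small stand-alone sketch (a back-of-the-envelope assumption: each retry delay is simply the previous one multiplied by Factor, with no jitter and no cap). Under that assumption, with Factor 5.0 the ten steps add up to roughly 68 hours of waiting, while 2.5 brings the total down to around ten and a half minutes.

```go
// Rough worst-case arithmetic only; assumes each delay is the previous one
// multiplied by Factor, ignoring jitter and any cap.
package main

import (
	"fmt"
	"time"
)

func totalWait(steps int, initial time.Duration, factor float64) time.Duration {
	total, d := time.Duration(0), float64(initial)
	for i := 0; i < steps; i++ {
		total += time.Duration(d)
		d *= factor
	}
	return total
}

func main() {
	fmt.Println("Factor 5.0:", totalWait(10, 100*time.Millisecond, 5.0)) // ≈ 67h49m
	fmt.Println("Factor 2.5:", totalWait(10, 100*time.Millisecond, 2.5)) // ≈ 10m36s
}
```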
A reviewer (Contributor) commented:

Would it make sense to leverage the Cap property for backoff? I believe its intent is to ensure the duration doesn't exceed a certain time limit.

@sagor999 (Contributor, Author) replied:

Yeah, we could do that as well. What would be a reasonable cap? 5 min?

The reviewer (Contributor) replied:

I think 5 minutes will do.

We can always bring it down if we need to, but this way it's a predictable experience.

Without accounting for sleep (and jitter), that would get users to iteration 8 (but not beyond) here: https://go.dev/play/p/dtUxS6TwTrw
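
As a rough sketch of that suggestion (not the eventual implementation; the 5-minute figure is the one discussed above), setting Cap on the same wait.Backoff bounds how large any single delay can grow. The loop below just prints the successive delays Step() would produce:

```go
// Sketch only: the backoff from the diff above, with a 5-minute Cap added.
// Step() returns the next delay; Cap keeps any single delay from exceeding
// 5 minutes.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	backoff := wait.Backoff{
		Steps:    10,
		Duration: 100 * time.Millisecond,
		Factor:   2.5,
		Cap:      5 * time.Minute,
	}
	total := time.Duration(0)
	for i := 1; backoff.Steps > 0; i++ {
		d := backoff.Step()
		total += d
		fmt.Printf("iteration %d: delay %v (cumulative %v)\n", i, d, total)
	}
}
```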

@sagor999 (Contributor, Author) commented:

Continuing the work in a new PR (due to werft issues in this one): #8289

sagor999 closed this on Feb 17, 2022
aledbf deleted the aledbf/wait branch on August 13, 2022 13:03
Labels: release-note, size/M, team: workspace (Issue belongs to the Workspace team)
Development: successfully merging this pull request may close the following issue:
Starting new workspace causes it to fail with OOM error from kubelet
7 participants