Calling StopWorkspace on a workspace that is not fully up yet #11453
Comments
@sagor999 is this issue for prebuilds, or are you considering this to also include regular workspaces and image builds? I'm going to assume prebuilds (at least for now). It looks like we'd want the phase to be stopped, and failed to be considered false [1][2]. But that might be naive. For example, if we indicate the prebuild stopped and did not fail, that could signal that there is a valid backup to create a new workspace from (given the prebuild indicates it was successful) - and there may not be, depending on how far the prebuild got. We should get together later this week (after the deploy of gen55) to sort out a couple of options/ideas.
@kylos101 yes, for prebuilds. Good point about phase. 🤔
@geropl Would it make sense to add a Cancelled phase? For example, to handle prebuilds which are interrupted because a newer commit triggered a webhook and subsequent prebuild.
@kylos101 I think closest to your current model - where we have About the "has valid backup" signal: Not sure if it's out of scope here, or irrelevant with the PVC effort, but it would be awesome to have a clear, separate signal for that. A condition
@geropl we've updated the expected behavior and scheduled this, thank you for your input! 👍
@geropl I would like to pick your brain on the implementation details of this from the webapp perspective.
gitpod/components/ws-manager-api/core.proto Lines 112 to 119 in e437e18
I would suggest adding an additional StopWorkspacePolicy:
This policy would be similar to
You would then need to specify this policy when cancelling an already-running prebuild because a newer prebuild is already running. I will also add a condition to the workspace showing that the workspace was aborted/cancelled, so that the UI can signal this to the user. So the workspace would be marked as failed, and would have the wasAborted condition set to true as well. Why failed? To ensure that any existing code filters that workspace out and does not attempt to restore anything from it.
gitpod/components/ws-manager-api/core.proto Line 389 in e437e18
Will add this into that struct:
WDYT?
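A minimal sketch of how the proposal above might look from the consuming side, assuming a workspace that was stopped early is marked as failed and additionally carries a `wasAborted` condition. The `Conditions` shape, the `Outcome` type, and both function names are hypothetical illustrations and are not taken from `core.proto` or the webapp code.

```typescript
// Hypothetical, simplified view of the workspace conditions.
// The real ws-manager status has more fields and may use different types.
interface Conditions {
    failed?: string;      // non-empty message when the workspace is marked as failed
    wasAborted?: boolean; // proposed condition: the workspace was stopped before it was fully up
}

type Outcome = "succeeded" | "failed" | "aborted";

// New code (UI and similar) can tell an aborted workspace apart from a real failure.
function deriveOutcome(c: Conditions): Outcome {
    if (c.wasAborted) {
        return "aborted";
    }
    return c.failed ? "failed" : "succeeded";
}

// Existing code that only looks at `failed` keeps working unchanged:
// an aborted workspace is also marked as failed, so it is filtered out here.
function isUsableForRestore(c: Conditions): boolean {
    return !c.failed;
}
```

Because the aborted workspace still counts as failed, a filter like `isUsableForRestore` does not need to know about the new condition, which is the backwards-compatibility argument made in the comment above.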
@sagor999 That sounds perfect, thank you! 💯 From that we can derive all states clearly, plus we are backwards compatible. 🧘 ☁️ One small nit/some 🚲 shedding: from the previous values I'd expect the condition to be called
@sagor999 moving from scheduled to in-progress as this has a draft PR 🙏
@sagor999 As far as I understood, your PR is independent and backwards compatible. Do you agree?
@geropl yes, it is. Sounds good, I will mark that PR as ready for review then.
Re-opening, as this requires a fix from the webapp team now.
@sagor999 I assume the thought is to leave this in Awaiting Deployment until the webapp change ships, and then move this to in-validation?
@kylos101 yes.
Bug description
Currently, calling StopWorkspace on a workspace that has not fully started up yet causes that workspace's state to become failed.
We need to handle this case better. The webapp currently uses this call to stop an already-running prebuild when a newer prebuild has started (the commit is obsolete, etc.).
It is negatively impacting our workspace success rate (~5-10% depending on prebuild volume).
Steps to reproduce
https://gitpod.slack.com/archives/C02EN94AEPL/p1657828853695599
Workspace affected
No response
Expected behavior
The `conditions` object is a child of the `status` column in the `d_b_workspace_instance` table.
On the workspace side, for `ws-manager`: Add a `wasCancelled` attribute to the `conditions` object, and inspect the field when setting the phase on stop workspace.
On the webapp side, for `server`: If `wasCancelled` is true, set the phase to `stopped` (instead of `failed`). For workspace start, we then have to make sure we do not try to start a new workspace instance from a prior workspace instance whose conditions have `wasCancelled` set to true.
Example repository
No response
Anything else?
No response
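The behavior described under "Expected behavior" could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Instance` and `InstanceConditions` shapes and the function names `phaseOnEarlyStop` and `pickRestoreSource` are hypothetical, and the real `server` code and database types will differ.

```typescript
type Phase = "stopped" | "failed";

// Hypothetical, reduced view of the `conditions` object stored with a workspace instance.
interface InstanceConditions {
    wasCancelled?: boolean;
}

// Hypothetical, reduced view of a workspace instance row.
interface Instance {
    id: string;
    phase: Phase;
    conditions: InstanceConditions;
}

// Phase to record for a workspace that was stopped before it was fully up:
// a cancelled instance is reported as stopped instead of failed.
function phaseOnEarlyStop(conditions: InstanceConditions): Phase {
    return conditions.wasCancelled ? "stopped" : "failed";
}

// On workspace start, skip prior instances that were cancelled,
// because they may not have a valid backup to start from.
// Assumes `priorNewestFirst` is ordered from newest to oldest.
function pickRestoreSource(priorNewestFirst: Instance[]): Instance | undefined {
    return priorNewestFirst.find(
        (i) => i.phase === "stopped" && !i.conditions.wasCancelled,
    );
}
```

The second function is the important guard: once a cancelled instance is recorded as `stopped`, the phase alone no longer tells a complete run apart from an interrupted one, so the start path has to check `wasCancelled` explicitly.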