-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[server][dashboard] Allow cancelling Prebuilds #5865
Conversation
8bc80a1
to
b034f03
Compare
b034f03
to
79fd61c
Compare
@@ -107,7 +110,8 @@ export class PrebuildManager { | |||
|
|||
span.setTag("starting", true); | |||
await this.workspaceStarter.startWorkspace({ span }, workspace, user, [], {excludeFeatureFlags: ['full_workspace_backup']}); | |||
return { wsid: workspace.id, done: false }; | |||
const prebuild = (await prebuildPromise)!; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just slightly worried about this code:
- Logically,
findPrebuildByWorkspaceID(workspace.id)
should always find aprebuild
, given that the workspace was created from aStartPrebuildContext
context - However, what if there is an edge case, or a new bug in the future 😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, it seems to lack a contract here, i.e. it seems easy to break because it relies on assumptions, though it's not changed frequently.
perhaps a dedicated workspaceFactory.createPrebuild(..., context): { ws: Workspace, pb: Prebuild }
would document the requirements more maintainable? just thinking out loud...
if (!workspace) { | ||
console.error('Unknown workspace id.', { workspaceId }); | ||
throw new Error('Unknown workspace ' + workspaceId); | ||
} | ||
const prebuild = (await prebuildPromise)!; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here (see previous comment)
@@ -265,6 +265,7 @@ export class WorkspaceStarter { | |||
await this.workspaceDb.trace({ span }).storePrebuiltWorkspace(prebuild) | |||
await this.messageBus.notifyHeadlessUpdate({span}, workspace.ownerId, workspace.id, <HeadlessWorkspaceEvent>{ | |||
type: HeadlessWorkspaceEventType.Aborted, | |||
// TODO: `workspaceID: workspace.id` not needed here? (found in ee/src/prebuilds/prebuild-queue-maintainer.ts and ee/src/bridge.ts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csweichel We've never added a workspaceID
here, but I see it in all other occurrences of notifyHeadlessUpdate
. Is it really okay to omit it here? (Couldn't find who listens to this.)
f8ca4d1
to
1763be7
Compare
/werft run 👍 started the job as gitpod-build-jx-cancel-prebuild.16 |
prebuild.state = 'aborted'; | ||
prebuild.error = 'Cancelled manually'; | ||
await this.workspaceDB.trace({ span }).storePrebuiltWorkspace(prebuild); | ||
await this.messageBus.notifyHeadlessUpdate({ span }, user.id, prebuild.buildWorkspaceId, <HeadlessWorkspaceEvent>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this play together with state transitions driven by the bridge?
the bridge is receiving status updates from ws manager and does update the prebuild workspace "in parallel". there could be races of storePrebuiltWorkspace
calls, potentially leading to override the Aborted
message?
shouldn't the cancellation be a request to the wsman, for which in turn the bridge would to the notifications and updates from this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, thanks! To be honest, I simply copied the prebuild cancellation logic from bridge.ts
:
gitpod/components/ws-manager-bridge/ee/src/bridge.ts
Lines 77 to 96 in 6bc80eb
} else if (!!status.conditions!.failed) { | |
prebuild.state = "aborted"; | |
prebuild.error = status.conditions!.failed; | |
headlessUpdateType = HeadlessWorkspaceEventType.Aborted; | |
} else if (!!status.conditions!.headlessTaskFailed) { | |
prebuild.state = "available"; | |
prebuild.error = status.conditions!.headlessTaskFailed; | |
prebuild.snapshot = status.conditions!.snapshot; | |
headlessUpdateType = HeadlessWorkspaceEventType.FinishedButFailed; | |
} else { | |
prebuild.state = "available"; | |
prebuild.snapshot = status.conditions!.snapshot; | |
headlessUpdateType = HeadlessWorkspaceEventType.FinishedSuccessfully; | |
} | |
await this.workspaceDB.trace({span}).storePrebuiltWorkspace(prebuild); | |
await this.messagebus.notifyHeadlessUpdate({span}, userId, workspaceId, <HeadlessWorkspaceEvent>{ | |
type: headlessUpdateType, | |
workspaceID: workspaceId, | |
}); |
and also from how it was implemented in the (now-deleted) PrebuildQueueMaintainer
:
gitpod/components/server/ee/src/prebuilds/prebuild-queue-maintainer.ts
Lines 144 to 175 in 2aa2db6
protected async discardPrebuild(ctx: TraceContext, prebuild: PrebuiltWorkspace) { | |
const span = TraceContext.startSpan("PrebuildQueueMaintainer.discardPrebuild", {}) | |
try { | |
prebuild.state = 'aborted'; | |
this.workspaceDB.trace({span}).storePrebuiltWorkspace(prebuild); | |
let userId: string; | |
const workspace = await this.workspaceDB.trace({span}).findById(prebuild.buildWorkspaceId); | |
if (workspace) { | |
userId = workspace.ownerId; | |
} else { | |
// this is a bit unorthodox but should still work: the userId is used to build the workspace exchange topic | |
// on which we send the message. We hardly every (if ever at all) listen for user-specific updates, the prebuild | |
// status maintainer certainly doesn't. Thus, even if userId is unknown, most parts of the system should still work. | |
// | |
// Mind, this branch is really the exception. Other parts of the system will have broken before we get here, i.e. | |
// see a prebuild workspace without an actual workspace attached. | |
userId = 'unknown'; | |
} | |
await this.messagebus.notifyHeadlessUpdate({span}, userId, prebuild.buildWorkspaceId, <HeadlessWorkspaceEvent>{ | |
type: HeadlessWorkspaceEventType.Aborted, | |
workspaceID: prebuild.buildWorkspaceId, | |
}); | |
} catch (err) { | |
TraceContext.logError({span}, err); | |
throw err; | |
} finally { | |
span.finish(); | |
} | |
} |
I don't quite understand the full state machine yet, but it sounds like I should look more deeply into it. 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jankeromnes, first of https://github.com/gitpod-io/gitpod/pull/5835/files
then, you see it's state propagation from the ws-manager-bridge
which runs outside of server. which is the whole point: ws-manager running in region A and propagating event from there, but then you want to cancel a prebuild from server in region B. in such cases I'm puzzled about the stream of states a frontend would receive, thus the message for the user would be confusing.
in general, if we want something to happen with the workspace, we're sending requests to the ws-manager of the according region, which is already taken care of by the client connection as far as know. btw, this is what happens for the stop part of this new composite action called cancelPrebuild. I think what needs to be done here is to introduce a CancelPrebuild request in ws-manager, which basically does the stopping but also signals the action to the bridge to do the rest, i.e. update db and push notifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After our sync discussion, it seems we'll want to remove the new prebuildManager.cancelPrebuild
call, and instead make stopWorkspace
also cancel the prebuild (inside the correct ws-manager
, e.g. with a new flag on the StopWorkspaceRequest
if necessary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea: When stopping a headless workspace manually, set conditions.headlessTaskFailed
to something like 'Cancelled manually
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically that would work.
// headless_task_failed indicates that a headless workspace task failed
just the semantics seems to be different. this value is currently derived from Message regarding the last termination of the container
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csweichel, what's the best way to signal a cancelled prebuild?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexTugarev Note: See also this Slack thread (internal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this builds it out, incoming updates from ws-manager-bridge could overwrite the status.
What about making cancelPrebuild
call StopWorkspace
on the prebuild workspace and have the supervisor write a message to the termination log when its stopped while the tasks are still running?
If you wanted to provide feedback that cancelation is happening, you could introduce a state to the prebuild that indicates this. This way you could also add a reconciliation loop in bridge that calls StopWorkspace
if need be.
@@ -1554,6 +1554,28 @@ export class GitpodServerEEImpl extends GitpodServerImpl<GitpodClient, GitpodSer | |||
return prebuild; | |||
} | |||
|
|||
async cancelPrebuild(projectId: string, prebuildId: string): Promise<void> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in theory we could get the projectId
from the workspace of that prebuild.
what's the advantage of adding it to the list of args, besides remove a couple of db queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the advantage of adding it to the list of args, besides remove a couple of db queries?
None. Removing a few unnecessary DB queries sounds great, and this is information we typically already have when making the call, so we might as well not omit it. 🙂 Do you see it otherwise?
Really enjoying to get some control over the running prebuild! That's an awesome feature 🎉 |
/werft run 👍 started the job as gitpod-build-jx-cancel-prebuild.17 |
This comment has been minimized.
This comment has been minimized.
1763be7
to
9c5a4b9
Compare
1d15e1a
to
4661ddf
Compare
Okay, ready for review again, now using the A few screenshots at random: @AlexTugarev @gtsiolis please take a look when you have time. 🙏 |
4661ddf
to
1ccc425
Compare
1ccc425
to
be09e48
Compare
/werft run 👍 started the job as gitpod-build-jx-cancel-prebuild.24 |
/lgtm this works really nicely and improves iterations on the config. 👍🏻 |
LGTM label has been added. Git tree hash: 33a115dabe21107b63953cca6e21d038f5902fee
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AlexTugarev Associated issue: #5612 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description
Implement a server API method to cancel prebuilds, and point all the Prebuilds controls in the UI to it.
Also show cancelled Prebuilds with a new
CANCELED
state (in gray).Related Issue(s)
Fixes #5612
Fixes #6153
How to test
Release Notes
/uncc