Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[server][dashboard] Allow cancelling Prebuilds #5865

Merged
merged 1 commit into from
Oct 18, 2021
Merged

Conversation

jankeromnes
Copy link
Contributor

@jankeromnes jankeromnes commented Sep 24, 2021

Description

Implement a server API method to cancel prebuilds, and point all the Prebuilds controls in the UI to it.

Also show cancelled Prebuilds with a new CANCELED state (in gray).

Related Issue(s)

Fixes #5612
Fixes #6153

How to test

  1. In the Project Configurator, run a Prebuild, and then cancel it. Repeat once.
  2. Trigger a Prebuild, then cancel it from the Prebuilds page.
  3. Same thing, but from the Project a.k.a. Branches page.

Release Notes

Make it possible to cancel pending or running Prebuilds

/uncc

@jankeromnes jankeromnes force-pushed the jx/cancel-prebuild branch 7 times, most recently from 8bc80a1 to b034f03 Compare October 1, 2021 12:32
@roboquat roboquat added size/L and removed size/M labels Oct 1, 2021
@jankeromnes jankeromnes marked this pull request as ready for review October 1, 2021 12:59
@@ -107,7 +110,8 @@ export class PrebuildManager {

span.setTag("starting", true);
await this.workspaceStarter.startWorkspace({ span }, workspace, user, [], {excludeFeatureFlags: ['full_workspace_backup']});
return { wsid: workspace.id, done: false };
const prebuild = (await prebuildPromise)!;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just slightly worried about this code:

  • Logically, findPrebuildByWorkspaceID(workspace.id) should always find a prebuild, given that the workspace was created from a StartPrebuildContext context
  • However, what if there is an edge case, or a new bug in the future 😱

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, it seems to lack a contract here, i.e. it seems easy to break because it relies on assumptions, though it's not changed frequently.

perhaps a dedicated workspaceFactory.createPrebuild(..., context): { ws: Workspace, pb: Prebuild } would document the requirements more maintainable? just thinking out loud...

if (!workspace) {
console.error('Unknown workspace id.', { workspaceId });
throw new Error('Unknown workspace ' + workspaceId);
}
const prebuild = (await prebuildPromise)!;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (see previous comment)

@@ -265,6 +265,7 @@ export class WorkspaceStarter {
await this.workspaceDb.trace({ span }).storePrebuiltWorkspace(prebuild)
await this.messageBus.notifyHeadlessUpdate({span}, workspace.ownerId, workspace.id, <HeadlessWorkspaceEvent>{
type: HeadlessWorkspaceEventType.Aborted,
// TODO: `workspaceID: workspace.id` not needed here? (found in ee/src/prebuilds/prebuild-queue-maintainer.ts and ee/src/bridge.ts)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csweichel We've never added a workspaceID here, but I see it in all other occurrences of notifyHeadlessUpdate. Is it really okay to omit it here? (Couldn't find who listens to this.)

@jankeromnes jankeromnes force-pushed the jx/cancel-prebuild branch 3 times, most recently from f8ca4d1 to 1763be7 Compare October 1, 2021 14:26
@AlexTugarev
Copy link
Member

AlexTugarev commented Oct 4, 2021

/werft run

👍 started the job as gitpod-build-jx-cancel-prebuild.16

prebuild.state = 'aborted';
prebuild.error = 'Cancelled manually';
await this.workspaceDB.trace({ span }).storePrebuiltWorkspace(prebuild);
await this.messageBus.notifyHeadlessUpdate({ span }, user.id, prebuild.buildWorkspaceId, <HeadlessWorkspaceEvent>{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this play together with state transitions driven by the bridge?
the bridge is receiving status updates from ws manager and does update the prebuild workspace "in parallel". there could be races of storePrebuiltWorkspace calls, potentially leading to override the Aborted message?

shouldn't the cancellation be a request to the wsman, for which in turn the bridge would to the notifications and updates from this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, thanks! To be honest, I simply copied the prebuild cancellation logic from bridge.ts:

} else if (!!status.conditions!.failed) {
prebuild.state = "aborted";
prebuild.error = status.conditions!.failed;
headlessUpdateType = HeadlessWorkspaceEventType.Aborted;
} else if (!!status.conditions!.headlessTaskFailed) {
prebuild.state = "available";
prebuild.error = status.conditions!.headlessTaskFailed;
prebuild.snapshot = status.conditions!.snapshot;
headlessUpdateType = HeadlessWorkspaceEventType.FinishedButFailed;
} else {
prebuild.state = "available";
prebuild.snapshot = status.conditions!.snapshot;
headlessUpdateType = HeadlessWorkspaceEventType.FinishedSuccessfully;
}
await this.workspaceDB.trace({span}).storePrebuiltWorkspace(prebuild);
await this.messagebus.notifyHeadlessUpdate({span}, userId, workspaceId, <HeadlessWorkspaceEvent>{
type: headlessUpdateType,
workspaceID: workspaceId,
});

and also from how it was implemented in the (now-deleted) PrebuildQueueMaintainer:

protected async discardPrebuild(ctx: TraceContext, prebuild: PrebuiltWorkspace) {
const span = TraceContext.startSpan("PrebuildQueueMaintainer.discardPrebuild", {})
try {
prebuild.state = 'aborted';
this.workspaceDB.trace({span}).storePrebuiltWorkspace(prebuild);
let userId: string;
const workspace = await this.workspaceDB.trace({span}).findById(prebuild.buildWorkspaceId);
if (workspace) {
userId = workspace.ownerId;
} else {
// this is a bit unorthodox but should still work: the userId is used to build the workspace exchange topic
// on which we send the message. We hardly every (if ever at all) listen for user-specific updates, the prebuild
// status maintainer certainly doesn't. Thus, even if userId is unknown, most parts of the system should still work.
//
// Mind, this branch is really the exception. Other parts of the system will have broken before we get here, i.e.
// see a prebuild workspace without an actual workspace attached.
userId = 'unknown';
}
await this.messagebus.notifyHeadlessUpdate({span}, userId, prebuild.buildWorkspaceId, <HeadlessWorkspaceEvent>{
type: HeadlessWorkspaceEventType.Aborted,
workspaceID: prebuild.buildWorkspaceId,
});
} catch (err) {
TraceContext.logError({span}, err);
throw err;
} finally {
span.finish();
}
}

I don't quite understand the full state machine yet, but it sounds like I should look more deeply into it. 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jankeromnes, first of https://github.com/gitpod-io/gitpod/pull/5835/files

then, you see it's state propagation from the ws-manager-bridge which runs outside of server. which is the whole point: ws-manager running in region A and propagating event from there, but then you want to cancel a prebuild from server in region B. in such cases I'm puzzled about the stream of states a frontend would receive, thus the message for the user would be confusing.

in general, if we want something to happen with the workspace, we're sending requests to the ws-manager of the according region, which is already taken care of by the client connection as far as know. btw, this is what happens for the stop part of this new composite action called cancelPrebuild. I think what needs to be done here is to introduce a CancelPrebuild request in ws-manager, which basically does the stopping but also signals the action to the bridge to do the rest, i.e. update db and push notifications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After our sync discussion, it seems we'll want to remove the new prebuildManager.cancelPrebuild call, and instead make stopWorkspace also cancel the prebuild (inside the correct ws-manager, e.g. with a new flag on the StopWorkspaceRequest if necessary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: When stopping a headless workspace manually, set conditions.headlessTaskFailed to something like 'Cancelled manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically that would work.

// headless_task_failed indicates that a headless workspace task failed

just the semantics seems to be different. this value is currently derived from Message regarding the last termination of the container

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csweichel, what's the best way to signal a cancelled prebuild?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexTugarev Note: See also this Slack thread (internal)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this builds it out, incoming updates from ws-manager-bridge could overwrite the status.
What about making cancelPrebuild call StopWorkspace on the prebuild workspace and have the supervisor write a message to the termination log when its stopped while the tasks are still running?

If you wanted to provide feedback that cancelation is happening, you could introduce a state to the prebuild that indicates this. This way you could also add a reconciliation loop in bridge that calls StopWorkspace if need be.

@@ -1554,6 +1554,28 @@ export class GitpodServerEEImpl extends GitpodServerImpl<GitpodClient, GitpodSer
return prebuild;
}

async cancelPrebuild(projectId: string, prebuildId: string): Promise<void> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory we could get the projectId from the workspace of that prebuild.
what's the advantage of adding it to the list of args, besides remove a couple of db queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the advantage of adding it to the list of args, besides remove a couple of db queries?

None. Removing a few unnecessary DB queries sounds great, and this is information we typically already have when making the call, so we might as well not omit it. 🙂 Do you see it otherwise?

@AlexTugarev
Copy link
Member

AlexTugarev commented Oct 4, 2021

Really enjoying to get some control over the running prebuild! That's an awesome feature 🎉

@gtsiolis
Copy link
Contributor

gtsiolis commented Oct 4, 2021

/werft run

👍 started the job as gitpod-build-jx-cancel-prebuild.17

@jankeromnes

This comment has been minimized.

@roboquat roboquat added the team: webapp Issue belongs to the WebApp team label Oct 15, 2021
@jankeromnes jankeromnes removed the request for review from csweichel October 15, 2021 07:55
@jankeromnes jankeromnes force-pushed the jx/cancel-prebuild branch 3 times, most recently from 1d15e1a to 4661ddf Compare October 15, 2021 13:24
@jankeromnes jankeromnes marked this pull request as ready for review October 15, 2021 13:24
@jankeromnes
Copy link
Contributor Author

jankeromnes commented Oct 15, 2021

Okay, ready for review again, now using the conditions.stoppedByRequest added in #6218 ❤️

A few screenshots at random:

Screenshot 2021-10-15 at 15 37 18 Screenshot 2021-10-15 at 15 37 34 Screenshot 2021-10-15 at 15 38 42
Screenshot 2021-10-15 at 15 38 27 Screenshot 2021-10-15 at 15 49 38
Screenshot 2021-10-15 at 15 42 46 Screenshot 2021-10-15 at 15 53 53

@AlexTugarev @gtsiolis please take a look when you have time. 🙏

@AlexTugarev
Copy link
Member

AlexTugarev commented Oct 18, 2021

/werft run

👍 started the job as gitpod-build-jx-cancel-prebuild.24

@AlexTugarev
Copy link
Member

AlexTugarev commented Oct 18, 2021

/lgtm

this works really nicely and improves iterations on the config. 👍🏻

@roboquat
Copy link
Contributor

LGTM label has been added.

Git tree hash: 33a115dabe21107b63953cca6e21d038f5902fee

@roboquat
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AlexTugarev

Associated issue: #5612

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roboquat roboquat merged commit bad8372 into main Oct 18, 2021
@roboquat roboquat deleted the jx/cancel-prebuild branch October 18, 2021 08:12
@roboquat roboquat added deployed: webapp Meta team change is running in production deployed Change is completely running in production labels Oct 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved deployed: webapp Meta team change is running in production deployed Change is completely running in production release-note size/L team: webapp Issue belongs to the WebApp team
Projects
None yet
5 participants