workspaces which cannot be scheduled to a cluster get stuck in Pending #11397

kylos101 · 2022-07-15T02:36:16Z

Bug description

Workspaces which cannot be scheduled to a cluster get stuck in Pending.

Steps to reproduce

In a preview environment, kubectl cordon the node. Try starting a workspace. It'll stay stuck in Pending for-ev-er!

Workspace affected

n/a

Expected behavior

The workspace should fail and land in a permanent Failed phase.
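
A minimal sketch of that expected behavior, not the actual ws-manager or ws-manager-bridge code. The `Instance` type, the `fail_if_stuck` helper, and the one-hour `PENDING_TIMEOUT` are all hypothetical; the issue does not say how long a workspace may sit in Pending before it should be failed.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: the issue does not specify a concrete timeout.
PENDING_TIMEOUT = timedelta(hours=1)


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified stand-in for a workspace instance record."""
    id: str
    phase: str
    creation_time: datetime


def fail_if_stuck(instance: Instance, now: datetime,
                  timeout: timedelta = PENDING_TIMEOUT) -> Instance:
    """Return a copy in the 'failed' phase if the instance has been
    pending for longer than `timeout`; otherwise return it unchanged."""
    if instance.phase == "pending" and now - instance.creation_time > timeout:
        return replace(instance, phase="failed")
    return instance


now = datetime(2022, 7, 15, 4, 0, tzinfo=timezone.utc)
stuck = Instance("stuck", "pending", now - timedelta(hours=2))
fresh = Instance("fresh", "pending", now - timedelta(minutes=5))
print(fail_if_stuck(stuck, now).phase)  # failed
print(fail_if_stuck(fresh, now).phase)  # pending
```

Only instances in the pending phase are touched here, so running or already stopped instances pass through unchanged.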

Example repository

n/a

Anything else?

This becomes a problem when you're trying to delete a cluster, because ws-manager-bridge reports that there are still active workspaces in the database for the cluster, even though there are no workspace pods in the governed workspace cluster.

This prevented the deletion of us53 (which was empty, and had no workspace clusters) until we ran the following queries:

-- find pending workspaces that aren't really pending
SELECT dbwi.id
FROM gitpod.d_b_workspace dbw
INNER JOIN gitpod.d_b_workspace_instance dbwi
    ON dbw.id = dbwi.workspaceId
WHERE
    dbwi.phasePersisted NOT IN ('stopped', 'failed')
    -- change to match a region you're interested in
    AND dbwi.region = 'us53'
ORDER BY dbwi.creationTime ASC
;

-- force stop the pending workspaces
UPDATE gitpod.d_b_workspace_instance AS wsi
SET
    wsi.stoppedTime = IF(wsi.startedTime = '', wsi.creationTime, wsi.startedTime),
    wsi.stoppingTime = IF(wsi.startedTime = '', wsi.creationTime, wsi.startedTime),
    wsi.phasePersisted = 'stopped',
    wsi.status = '{"phase": "stopped", "conditions": {}}'
WHERE wsi.id IN ('fb52614b-927b-4412-b583-dd3cc7bc3087',
'04eedb5b-1f35-4e4b-a8b3-56ea13c0f2dd',
'9b3d3928-59dd-46f0-9876-cc2bab905a2e',
'261bbde5-100f-4a19-8085-125ef82a0f0a',
'7878cc2a-544e-4f24-82ee-62a24c4f9c34',
'5dda17bf-1fe9-4cc6-b5ae-82aa6603049c',
'ad40bce6-a74c-47a9-b9a4-d9490afbfd41',
'028d826e-49b2-45ce-b829-8b4f693e93d3',
'99b74f2b-0a49-45c6-9c59-c4339a6dcb48',
'3695365a-0b4c-4df2-96f3-700598fe3baf')
;
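
If the ID list comes from the first query, pasting it into the UPDATE by hand is error-prone. The sketch below builds the same statement with placeholders instead. The `build_force_stop_query` helper is hypothetical, and it assumes a MySQL DB-API driver that uses the `%s` placeholder style (for example PyMySQL or mysqlclient); it only builds the statement and does not connect to anything.

```python
from typing import List, Sequence, Tuple


def build_force_stop_query(instance_ids: Sequence[str]) -> Tuple[str, List[str]]:
    """Build the force-stop UPDATE from the issue with one %s placeholder
    per instance ID. Returns (sql, params) for cursor.execute(sql, params)."""
    if not instance_ids:
        # An empty IN () list is a syntax error in MySQL, so refuse early.
        raise ValueError("no instance ids given")
    placeholders = ", ".join(["%s"] * len(instance_ids))
    sql = (
        "UPDATE gitpod.d_b_workspace_instance AS wsi "
        "SET wsi.stoppedTime = IF(wsi.startedTime = '', wsi.creationTime, wsi.startedTime), "
        "wsi.stoppingTime = IF(wsi.startedTime = '', wsi.creationTime, wsi.startedTime), "
        "wsi.phasePersisted = 'stopped', "
        "wsi.status = '{\"phase\": \"stopped\", \"conditions\": {}}' "
        f"WHERE wsi.id IN ({placeholders})"
    )
    return sql, list(instance_ids)


sql, params = build_force_stop_query([
    "fb52614b-927b-4412-b583-dd3cc7bc3087",
    "04eedb5b-1f35-4e4b-a8b3-56ea13c0f2dd",
])
print(sql)
print(params)
```

With a real connection, the pair would be passed to the driver as `cursor.execute(sql, params)`, ideally inside a transaction so the row count can be checked before committing.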

Delete this page once resolved

kylos101 · 2022-07-15T02:37:48Z

Note: in hindsight, this may not be a problem with ws-manager. It might be a problem on the webapp side, but that needs work to confirm.

jenting · 2022-07-29T12:51:04Z

Similar to #11673.

utam0k · 2022-07-31T23:15:32Z

@jenting Has this issue already been solved by #11673?

jenting · 2022-08-01T02:14:48Z

No. #11673 adds a retry within 30 minutes (we wait up to 7 minutes for the workspace pod to be running).

The good news is that the workspace pod is removed after 30 minutes; the bad news is that the dashboard still displays the workspace in a Pending state.

kylos101 · 2022-08-05T17:51:21Z

I believe Pending is a phase that indicates webapp has not selected a cluster (yet) for the workspace to land on. @geropl reassigning for now. wdyt?

kylos101 · 2022-08-23T15:50:52Z

👋 @geropl hey, we had to do some workspace clean-up in the database last week when decommissioning gen61, as we shifted to gen62. May we ask for your help in scheduling this for the next couple of weeks?

geropl · 2022-09-09T13:50:21Z

@kylos101 Done, set to next week.

geropl · 2022-09-26T07:52:38Z

Linked to: #6770

kylos101 added the type: bug Something isn't working label Jul 15, 2022

kylos101 added this to 🌌 Workspace Team Jul 15, 2022

kylos101 moved this to Scheduled in 🌌 Workspace Team Jul 15, 2022

kylos101 added the component: ws-manager label Jul 15, 2022

utam0k self-assigned this Jul 26, 2022

utam0k moved this from Scheduled to In Progress in 🌌 Workspace Team Jul 26, 2022

utam0k removed their assignment Jul 26, 2022

utam0k moved this from In Progress to Scheduled in 🌌 Workspace Team Jul 26, 2022

kylos101 moved this from Scheduled to Backlog in 🌌 Workspace Team Aug 1, 2022

kylos101 added component: ws-manager-bridge and removed component: ws-manager labels Aug 5, 2022

kylos101 removed this from 🌌 Workspace Team Aug 5, 2022

kylos101 added this to 🍎 WebApp Team Aug 5, 2022

kylos101 mentioned this issue Aug 5, 2022

[ws-manager-bridge] the number of workspace instances remaining seems to be wrong #11399

Closed

geropl moved this to Clarification in 🍎 WebApp Team Aug 8, 2022

kylos101 changed the title ~~[ws-manager] workspaces which cannot be scheduled to a cluster get stuck in Pending~~ workspaces which cannot be scheduled to a cluster get stuck in Pending Aug 11, 2022

kylos101 mentioned this issue Aug 11, 2022

[server] Improve maintenance of workspace instance state #12067

Closed

kylos101 removed the status in 🍎 WebApp Team Aug 23, 2022

geropl moved this to Scheduled in 🍎 WebApp Team Aug 23, 2022

geropl removed the status in 🍎 WebApp Team Sep 12, 2022

geropl moved this to Scheduled in 🍎 WebApp Team Sep 12, 2022

geropl assigned laushinka Sep 12, 2022

laushinka mentioned this issue Sep 12, 2022

[bridge] Check workspace instance phase duration #12879

Closed

1 task

laushinka mentioned this issue Sep 12, 2022

[bridge] Stop stuck workspace instances #12894

Closed

1 task

laushinka moved this from Scheduled to In Progress in 🍎 WebApp Team Sep 13, 2022

laushinka mentioned this issue Sep 13, 2022

[bridge] Logs stuck workspace instances to validate fix #12902

Merged

1 task

geropl unassigned laushinka Sep 26, 2022

geropl removed the status in 🍎 WebApp Team Sep 26, 2022

geropl moved this to Scheduled in 🍎 WebApp Team Sep 26, 2022

laushinka self-assigned this Sep 27, 2022

laushinka moved this from Scheduled to In Progress in 🍎 WebApp Team Sep 27, 2022

laushinka mentioned this issue Sep 27, 2022

[bridge] Marks stuck stopping and pending instances as stopped #13350

Merged

3 tasks

roboquat closed this as completed in #13350 Sep 29, 2022

Repository owner moved this from In Progress to In Validation in 🍎 WebApp Team Sep 29, 2022

laushinka moved this from In Validation to Done in 🍎 WebApp Team Oct 25, 2022
