Workspace hangs in "Building Image" #10972

Closed
svenefftinge opened this issue Jun 28, 2022 · 13 comments
Labels
meta: stale This issue/PR is stale and will be closed soon type: bug Something isn't working

Comments

@svenefftinge
Member

svenefftinge commented Jun 28, 2022

Bug description

My workspace hangs in "Building Image". I could see the log output until the exit command initially.
But then nothing happened. On reload I just see the empty log screen:

Screenshot 2022-06-28 at 13 55 35

Steps to reproduce

Can be reproduced in a preview environment:

  • Start an image build, e.g. https://<your-preview-env>.preview.gitpod-dev.com/#imagebuild/github.com/gitpod-io/empty
  • While the image build is in-progress (workspace is in a building state), restart the server pod
  • The workspace should get stuck in a building state

Workspace affected

sveneffting-gitpod1pass-em3ni2zmoi3

Example repository

https://github.com/svenefftinge/gitpod-1password

@svenefftinge svenefftinge added the type: bug Something isn't working label Jun 28, 2022
@stale

stale bot commented Sep 28, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale This issue/PR is stale and will be closed soon label Sep 28, 2022
@LarisaLG

I have the same problem, and the workspace is still stuck building something:
image

@stale stale bot removed the meta: stale This issue/PR is stale and will be closed soon label Oct 15, 2022
@will-beta

also having this issue. expecting updates.

@kylos101
Contributor

@WVerlaek is the behavior in this issue ☝ what you saw prior to implementing #15586? I ask to gauge whether we should schedule this issue.

@WVerlaek
Member

WVerlaek commented Jan 13, 2023

@kylos101 #15586 is different. That one solved image build logs not streaming when the build start is merely delayed (and only on the order of minutes). Here it looks like the build actually gets stuck in the building phase and never exits it 🤔

@kylos101 kylos101 moved this to Breakdown in 🌌 Workspace Team Jan 13, 2023
@kylos101
Contributor

Thanks, @WVerlaek. Adding this to Breakdown so we can refine it, and at least check metrics/logs/traces for when this last occurred and how frequently it happens.

@Furisto
Member

Furisto commented Jan 17, 2023

  • Check the database for image builds that are stuck to see how often this occurs in a week
  • Check traces/logs for imagebuild id
  • Check the repository from where the image build was triggered and try to reproduce the issue
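
As a rough illustration of the first check, the sketch below filters a list of instance records for builds that have sat in the building phase longer than a threshold. The `BuildRecord` shape, the `findStuckBuilds` helper, and the sample rows are all hypothetical; the actual database schema is not shown in this thread, so treat this as the shape of the query rather than something to run against production.

```typescript
// Hypothetical record shape; the real schema is not shown in this thread.
interface BuildRecord {
  instanceId: string;
  phase: "building" | "running" | "stopped";
  phaseSince: Date;
}

// Returns records that have been in the "building" phase for longer than maxAgeMs.
function findStuckBuilds(records: BuildRecord[], now: Date, maxAgeMs: number): BuildRecord[] {
  return records.filter(
    (r) => r.phase === "building" && now.getTime() - r.phaseSince.getTime() > maxAgeMs
  );
}

// Made-up sample rows, for illustration only.
const now = new Date("2023-01-17T12:00:00Z");
const sample: BuildRecord[] = [
  { instanceId: "a", phase: "building", phaseSince: new Date("2023-01-17T08:00:00Z") },
  { instanceId: "b", phase: "building", phaseSince: new Date("2023-01-17T11:55:00Z") },
  { instanceId: "c", phase: "running", phaseSince: new Date("2023-01-16T08:00:00Z") },
];

// With a one-hour threshold, only the four-hour-old build counts as stuck.
const stuck = findStuckBuilds(sample, now, 60 * 60 * 1000);
console.log(stuck.map((r) => r.instanceId));
```

Grouping the result by week would then give the frequency asked for above.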

@Furisto Furisto moved this from Breakdown to Scheduled in 🌌 Workspace Team Jan 17, 2023
@WVerlaek
Member

Accidentally reproduced this issue 2x in a workspace-preview cluster, by restarting the server during some image builds. The image build pods completed, but the workspaces got stuck in a building state.

So a potential theory is that these image builds coincided with a server restart (e.g. a rollout).

@WVerlaek
Member

Was able to reproduce in a preview env, updated the issue with the steps to reproduce.

Not sure which team this issue belongs to. It looks like an issue in server, where it doesn't pick up in-progress image builds after a restart. cc @kylos101

@kylos101 kylos101 moved this from Scheduled to Breakdown in 🌌 Workspace Team Jan 26, 2023
@kylos101
Contributor

Nice find, @WVerlaek ! I've moved back to Breakdown and added a blocked label so that we can socialize with the WebApp team.

@geropl how would you like to handle this issue?

For example, if dashboard is showing a stream of image build logs, and then server restarts or crashes, when it comes back online again, I wonder if we can list and find the in-progress image builds, and potentially re-serve to dashboard? Does dashboard have enough context about the related session to ask server to find related image builds? Or is this problem deeper, because we lack an exclusive DB entry for image build workspace instances?

Would you recommend starting in sugar.ts?

Is this something WebApp team would be interested in taking on and prioritizing given related goals and results?

Code owners seem a wee bit out-of-date for image-builder-api.

@geropl
Member

geropl commented Jan 26, 2023

@kylos101 The fundamental problem is that the application so far does not persist state for image builds. If we restart a server pod, the knowledge about that is gone. The only original way to avoid this was to "retrigger" the image build, which includes checks for "is image build running?" (identified by Dockerfile hash), so we often re-connect to the exact same build, or trigger a fresh, parallel one (should not matter, honestly).
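
To make the "retrigger" idea concrete, here is a minimal TypeScript sketch of a lookup keyed by Dockerfile hash: a repeated request for the same hash re-connects to the running build, and an unknown hash starts a fresh one. `BuildRegistry`, `resolveBuild`, and the hash strings are hypothetical names for illustration and are not taken from the Gitpod code base. The registry stands in for whichever component still knows about running builds after the server pod restarts.

```typescript
// Hypothetical handle for a running image build.
type BuildHandle = { buildId: string; dockerfileHash: string };

// Stand-in for the component that tracks running builds (an assumption, not Gitpod's API).
class BuildRegistry {
  private running = new Map<string, BuildHandle>();
  private nextId = 1;

  // Returns the running build for this hash if there is one, otherwise registers a new build.
  resolveBuild(dockerfileHash: string): { handle: BuildHandle; reused: boolean } {
    const existing = this.running.get(dockerfileHash);
    if (existing !== undefined) {
      return { handle: existing, reused: true };
    }
    const handle: BuildHandle = { buildId: `build-${this.nextId++}`, dockerfileHash };
    this.running.set(dockerfileHash, handle);
    return { handle, reused: false };
  }

  // Forgets a build once it has completed.
  finish(dockerfileHash: string): void {
    this.running.delete(dockerfileHash);
  }
}

const registry = new BuildRegistry();
const first = registry.resolveBuild("hash-aaa");  // starts a new build
const second = registry.resolveBuild("hash-aaa"); // re-connects to the same build
console.log(first.handle.buildId === second.handle.buildId, second.reused);
```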

For some reason I think we're lacking that re-triggering here, on the frontend (1), but the more fundamental problem (2, more mid-term) is to store image build state in the WorkspaceInstance table, and use that to find the ongoing workspace start/related image build, and attach to it again.
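
A minimal sketch of option (2), under stated assumptions: the in-memory `InstanceStore` stands in for a persistent table such as WorkspaceInstance, and `ServerProcess`, `startBuild`, `recover`, and `imageBuildRef` are hypothetical names. It only shows the shape of the idea, that a freshly started server can find in-progress builds from persisted rows and attach to them again; it is not how server is implemented.

```typescript
// Hypothetical persisted row: a workspace instance plus a reference to its image build.
interface InstanceRow {
  instanceId: string;
  phase: "building" | "running";
  imageBuildRef?: string;
}

// In-memory stand-in for a database table that survives server restarts.
class InstanceStore {
  private rows = new Map<string, InstanceRow>();

  save(row: InstanceRow): void {
    this.rows.set(row.instanceId, { ...row });
  }

  inBuildingPhase(): InstanceRow[] {
    return Array.from(this.rows.values()).filter((r) => r.phase === "building");
  }
}

class ServerProcess {
  // Purely in-memory knowledge; lost whenever the process restarts.
  readonly attached = new Set<string>();
  private store: InstanceStore;

  constructor(store: InstanceStore) {
    this.store = store;
  }

  // Persists the build reference before tracking it in memory.
  startBuild(instanceId: string, imageBuildRef: string): void {
    this.store.save({ instanceId, phase: "building", imageBuildRef });
    this.attached.add(imageBuildRef);
  }

  // On startup, re-attach to every build the store still lists as in progress.
  recover(): string[] {
    const refs: string[] = [];
    for (const row of this.store.inBuildingPhase()) {
      if (row.imageBuildRef !== undefined) {
        this.attached.add(row.imageBuildRef);
        refs.push(row.imageBuildRef);
      }
    }
    return refs;
  }
}

const store = new InstanceStore();
new ServerProcess(store).startBuild("ws-1", "build-42");

// Simulated restart: a new process starts with no in-memory state.
const restarted = new ServerProcess(store);
const recovered = restarted.recover();
console.log(recovered);
```

Without the `recover` step, the restarted process never learns about `build-42`, which matches the stuck "building" state described in this issue.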

@geropl geropl moved this to Scheduled in 🍎 WebApp Team Jan 26, 2023
@kylos101
Contributor

kylos101 commented Feb 2, 2023

@geropl given our team's available bandwidth, we're going to remove this from groundwork for now. I see you have this Scheduled, please mention me when you'd like us to consider again? For example, if you'd like to collaborate on any changes to Workspace components, like image-builder-api, etc.

@kylos101 kylos101 removed the status in 🌌 Workspace Team Feb 2, 2023
@kylos101 kylos101 removed the blocked label Feb 2, 2023
@geropl geropl removed the status in 🍎 WebApp Team Feb 6, 2023
@stale

stale bot commented May 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale This issue/PR is stale and will be closed soon label May 9, 2023
@stale stale bot closed this as completed Jun 11, 2023
@github-project-automation github-project-automation bot moved this to Awaiting Deployment in 🌌 Workspace Team Jun 11, 2023
@github-project-automation github-project-automation bot moved this to In Validation in 🍎 WebApp Team Jun 11, 2023
Projects
Status: In Validation
Status: Awaiting Deployment
Development

No branches or pull requests

7 participants