Cannot generate a workspace snapshot #5862

Closed
shaal opened this issue Sep 24, 2021 · 25 comments · Fixed by #6144

@shaal
Contributor

shaal commented Sep 24, 2021

Bug description

After opening DrupalPod in Gitpod, I open the hamburger menu in the top-left corner and choose Gitpod: Share Workspace Snapshot.

I see the "Capturing workspace snapshot" dialog:
image

After a few minutes, it displays an error -
image

Errors in console:
image

(Using Gitpod's VS Code v1.59)

Steps to reproduce

  1. Open DrupalPod in Gitpod:
    https://gitpod.io/#https://github.com/shaal/DrupalPod
  2. Open the hamburger menu in the top-left corner
  3. Choose Gitpod: Share Workspace Snapshot

Expected behavior

I was able to take snapshots of the DrupalPod workspace in the past, but I am no longer able to.

Example repository

https://github.com/shaal/DrupalPod

Anything else?

I am able to create a workspace snapshot on 'simpler' repos, such as https://gitpod.io/#https://github.com/gitpod-io/template-typescript-node

@shaal
Contributor Author

shaal commented Sep 28, 2021

I also tested this by opening https://github.com/gitpod-io/gitpod in Gitpod and choosing Gitpod: Share Workspace Snapshot from the menu. Same error.

@pawlean
Contributor

pawlean commented Sep 28, 2021

I could not replicate this. 🤔 This is what I see:

2021-09-28 at 09 59 00 - Google Chrome

What version of VS Code are you on?

@shaal
Contributor Author

shaal commented Sep 28, 2021

VS Code 1.59
I tried various browsers: Chrome, Edge, Firefox. I still see the same error on both the Gitpod repo and the DrupalPod repo.

@shaal
Contributor Author

shaal commented Sep 28, 2021

I updated the description of this issue to clarify that this bug happens only on some complex repos (DrupalPod, Gitpod, etc.), while on a simpler repo (typescript demo workspace) I am able to create a snapshot.

@akosyakov
Member

I updated the description of this issue to clarify that this bug happens only on some complex repos (DrupalPod, Gitpod, etc.), while on a simpler repo (typescript demo workspace) I am able to create a snapshot.

The GLB drops the web socket connection every 10 minutes. Maybe it was bad luck that the drop happened in between.

@shaal
Contributor Author

shaal commented Sep 28, 2021

@akosyakov I tested this multiple times, one after the other, different repos, etc.
Are you able to take a snapshot of Gitpod or drupalpod repos?

@akosyakov
Member

akosyakov commented Sep 28, 2021

The takeSnapshot server API hangs for a very long time, and eventually GCP drops it by closing the web socket after 10 minutes. I think the meta team should look into why that is. cc @JanKoehnlein @gtsiolis

If it cannot be optimised, then we should reconsider the API and use a notification from the server instead of a long-running RPC call.
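
As a rough illustration of the notification idea above: the request could return immediately, and the result could arrive later as a server event that whichever connection is currently alive receives. This is only a sketch under stated assumptions. `SnapshotNotifier`, `requestSnapshot`, `subscribe` and `complete` are hypothetical names, not Gitpod's actual API.

```typescript
// Hypothetical names throughout; a sketch of "notify instead of long-running RPC".
type SnapshotEvent = { snapshotId: string; ok: boolean };
type Listener = (event: SnapshotEvent) => void;

class SnapshotNotifier {
  private listeners: Listener[] = [];

  // A client subscribes again every time its connection is (re)established.
  // The returned function removes the listener, e.g. when the socket closes.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter(l => l !== listener);
    };
  }

  // Returns at once with an ID; no call stays open while the upload runs.
  requestSnapshot(workspaceId: string): string {
    return "snap-" + workspaceId;
  }

  // Called on the server side when the upload ends, however long it took.
  complete(snapshotId: string, ok: boolean): void {
    this.listeners.forEach(l => l({ snapshotId, ok }));
  }
}

// Usage: the first connection drops before the upload ends, the second one
// still receives the notification.
const notifier = new SnapshotNotifier();
const received: SnapshotEvent[] = [];
const snapshotId = notifier.requestSnapshot("ws1");
const dropConnection = notifier.subscribe(e => { received.push(e); });
dropConnection();
notifier.subscribe(e => { received.push(e); });
notifier.complete(snapshotId, true);
```

The point of the design is that no single connection has to survive for the whole upload, so a 10-minute drop no longer loses the result.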

@akosyakov added the team: webapp label Sep 28, 2021
@gtsiolis
Contributor

I could reproduce the issue. Thanks @shaal for reporting and @akosyakov for the ping. 🏓

@csweichel
Contributor

/schedule

@roboquat
Contributor

@csweichel: Issue scheduled in the meta team (WIP: 0)

In response to this:

/schedule

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@geropl
Member

geropl commented Oct 5, 2021

I'm not sure this will still be an issue after the recent websocket changes (to be deployed). The MessageConnection stays intact even while the underlying transport (websocket) is reconnecting, so we should still receive the response after the reconnect and have the long-running call return. Will try to test and verify.

Update: Nope, still broken 😬

@zuernBernhard

Experiencing the same with a Lando/Drupal project in Gitpod and a larger database. Can we help, for example with debugging output or other information?

@geropl
Member

geropl commented Oct 7, 2021

Thx @zuernBernhard. We can reliably reproduce this and are currently investigating ways to mitigate it.

The trigger is indeed the size of the repo, so a possible mitigation/workaround would be to (temporarily) reduce that.

@geropl
Member

geropl commented Oct 7, 2021

@zuernBernhard Could you share a link to the project you have these problems with?

@shaal
Contributor Author

shaal commented Oct 8, 2021

I created a minimal repo to replicate this bug.
When /workspace storage space is 4GB or larger, the snapshot fails.
When /workspace storage space is 3GB or smaller, the snapshot succeeds.
https://github.com/shaal/gitpod-test-failing-snapshot

@geropl
Member

geropl commented Oct 12, 2021

@akosyakov We have multiple options for solving this, none of which I find super nice because they're all rather involved. Happy to reiterate in a sync, but here's the problem:

We have to somehow make server.takeSnapshot re-entrant, so clients don't have to rely on the websocket connection to stay alive for the entire upload. To do so, we could:

  1. immediately return an ID (snapshotId from DB?)
  2. offer a 2nd API call server.waitForSnapshot(snapshotId) which is re-entrant and blocks until the snapshot is there
  3. (also needed: make wsDaemon.takeSnapshot return early)

This leaves us with the question of how we detect the state of a snapshot in server, and how we update said state:

  • introduce a content-service API to detect the state of a snapshot and poll that (😬)
  • somehow stream snapshot state from ws-manager and feed into the DB in parallel to instance updates
  • make snapshots part of the instance update
  • ...

@csweichel Would be interested in your thoughts as well.
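
The re-entrant shape described in steps 1 and 2 above could look roughly like the sketch below. It is an in-memory stand-in, not the real server code: `SnapshotServer`, `finish` and the ID format are assumptions made for illustration, and the background upload is simulated by calling `finish` by hand.

```typescript
// Hypothetical in-memory model; only takeSnapshot/waitForSnapshot are named in the thread.
type Waiter = (ok: boolean) => void;

class SnapshotServer {
  private nextId = 1;
  private results = new Map<string, boolean>();   // finished snapshots: id -> success
  private waiters = new Map<string, Waiter[]>();  // callers currently blocked, per id

  // Step 1: register the snapshot and return an ID right away.
  takeSnapshot(workspaceId: string): string {
    const id = `${workspaceId}-snap-${this.nextId++}`;
    this.waiters.set(id, []);
    return id;
  }

  // Step 2: re-entrant wait. It can be called again after a dropped
  // connection and resolves immediately if the snapshot already finished.
  waitForSnapshot(id: string): Promise<boolean> {
    const result = this.results.get(id);
    if (result !== undefined) return Promise.resolve(result);
    const queue = this.waiters.get(id);
    if (queue === undefined) return Promise.reject(new Error(`unknown snapshot ${id}`));
    return new Promise<boolean>(resolve => { queue.push(resolve); });
  }

  // Stand-in for "the upload ended": record the result and wake all waiters.
  finish(id: string, ok: boolean): void {
    const queue = this.waiters.get(id) ?? [];
    this.waiters.delete(id);
    this.results.set(id, ok);
    queue.forEach(wake => wake(ok));
  }
}
```

A client would call `takeSnapshot` once, keep the returned ID, and call `waitForSnapshot` with it as often as needed, including after a reconnect.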

@geropl
Member

geropl commented Oct 19, 2021

Just had a discussion with @csweichel. Results:

  • introduce snapshot.state: 'pending' | 'available' | 'error'
  • drive that from
    1. server.waitForSnapshot
    2. a fallback mechanism with timeout (to convert to error eventually)
  • extend content-service to "read" the state of the snapshot
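
The state handling listed above can be pictured as a small transition function. This is a sketch under assumptions: only the three state values come from the discussion, while `createdAtMs`, `PENDING_TIMEOUT_MS`, `ContentProbe` and `nextState` are hypothetical names, and the timeout value is invented because the thread does not give one.

```typescript
// Only the three state literals come from the thread; everything else is assumed.
type SnapshotState = "pending" | "available" | "error";

interface Snapshot {
  id: string;
  state: SnapshotState;
  createdAtMs: number; // assumed field: when the snapshot was requested
}

// Assumed value for the fallback timeout.
const PENDING_TIMEOUT_MS = 30 * 60 * 1000;

// Stand-in for content-service "reading" the snapshot: true once the upload exists.
type ContentProbe = (snapshotId: string) => boolean;

// Computes the state a snapshot should have at time nowMs.
function nextState(s: Snapshot, nowMs: number, existsInStorage: ContentProbe): SnapshotState {
  if (s.state !== "pending") return s.state;                      // terminal states never change
  if (existsInStorage(s.id)) return "available";                  // upload finished
  if (nowMs - s.createdAtMs > PENDING_TIMEOUT_MS) return "error"; // fallback timeout
  return "pending";
}
```

Both drivers from the list could share this function: a waiting call would evaluate it while a client is connected, and a periodic fallback would evaluate it for snapshots nobody is waiting on.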

@geropl
Member

geropl commented Nov 11, 2021

No, roboquat! 🙃

@geropl geropl reopened this Nov 11, 2021
@JanKoehnlein
Contributor

@geropl Which state is this issue in now?

@geropl
Member

geropl commented Nov 11, 2021

In progress. Waiting 1 day before merging the IDE PR, and then waiting for it to be deployed: gitpod-io/openvscode-server#169

lessons for me after this issue: try hard to have 1 issue per PR; everything else is cumbersome.

@geropl
Member

geropl commented Nov 19, 2021

IDE part got merged: gitpod-io/openvscode-server#169 (review)
Available on "insiders" as of tomorrow.

@shaal
Contributor Author

shaal commented Nov 19, 2021

@geropl Thank you! cannot wait to test it out!

@gtsiolis
Contributor

@shaal
Contributor Author

shaal commented Nov 26, 2021

Thank you for fixing this!
I can confirm I am able to make snapshots again.

@geropl
Member

geropl commented Nov 26, 2021

Thx @gtsiolis, missed the deployment.


9 participants