"Cannot restore backup" due to multiple very large core files. #13102

Closed
david-bakin opened this issue Sep 19, 2022 · 1 comment
Labels
meta: stale (This issue/PR is stale and will be closed soon)
type: bug (Something isn't working)

Comments


david-bakin commented Sep 19, 2022

Bug description

A user (not me) reports frustration due to not being able to restore a backup, as per the following message:

[Screenshot: cannot-restore-core-files]

Note that the reported error is "no space left on device", and that it appears to occur while restoring multiple core dump files.

Core dump files can become very large, taking up gigabytes, without the user being aware of it at all. It has happened to me, filling up a physical device (not on Gitpod), and I know what core dumps are. I've seen multiple reports on the Discord forum of people asking "what are these core files?", where the user doesn't even know what core dumps are and it turns out they were dropped by failed start tasks. If they were dropped during a before or command phase, a new one appears on each workspace startup, so there can be a lot of them.

It is very bad behavior to fail to restore a workspace just because it contains multiple useless core dump files. It leads to user frustration and, without Gitpod operator intervention, loss of valid user data (uncommitted/unpushed/unsaved work in the workspace). Even with Gitpod operator intervention it leads to user delays and frustration, and Gitpod $$$$ spent on support.

Something should be done about this! It needs to be considered properly, but as examples:

  1. A periodic task notices a bunch of core dump files in /home/gitpod (or elsewhere) and notifies the user, perhaps as part of a VS Code/JetBrains plugin.
  2. If the workspace backup doesn't restore, the restore script checks for core dump files in the tar archive and, if it finds any, retries the restore using tar's command-line option to exclude files matching a pattern.
  3. They're not backed up at all: the workspace backup uses tar's command-line option to exclude files matching a pattern.
  4. They're just nuked on workspace backup if they exceed some threshold, perhaps oldest first, until below the threshold.
  5. They're just nuked on workspace backup, unconditionally.
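As a rough illustration of options 2 and 3, here is a minimal sketch using Python's standard `tarfile` module in place of the tar command line. How Gitpod actually creates and restores backups is not described in this issue, so the function names, the flat archive layout, and the use of the `core\.[0-9]+` pattern as the only test for a core dump are all assumptions made for the example.

```python
import re
import tarfile
from pathlib import Path, PurePosixPath

# Assumption: a core dump is any regular file whose name matches core.<digits>.
CORE_RE = re.compile(r"core\.[0-9]+")


def is_core_dump(member_name: str) -> bool:
    """True if the last path component looks like core.<pid>."""
    return CORE_RE.fullmatch(PurePosixPath(member_name).name) is not None


def backup_without_cores(src_dir, archive_path):
    """Option 3 (hypothetical): archive src_dir but leave core dumps out."""
    def skip_cores(info):
        # Returning None from a tarfile filter excludes that member.
        if info.isfile() and is_core_dump(info.name):
            return None
        return info

    with tarfile.open(archive_path, "w") as tar:
        for child in sorted(Path(src_dir).iterdir()):
            tar.add(child, arcname=child.name, filter=skip_cores)


def restore_without_cores(archive_path, dest_dir):
    """Option 2 (hypothetical): restore everything except core dumps.

    Returns the names of the members that were skipped, so the caller
    can tell the user what was left out.
    """
    skipped = []
    with tarfile.open(archive_path, "r") as tar:
        for member in tar:
            if member.isfile() and is_core_dump(member.name):
                skipped.append(member.name)
                continue
            tar.extract(member, dest_dir)
    return skipped
```

A restore script could call `restore_without_cores` only as a retry after a normal restore fails with "no space left on device", and report the returned list to the user.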

Extra credit:

The same approach could cover other things, such as *.log files and other files known to contain metrics/telemetry: don't back them up (or don't restore them) if they exceed some size limit.
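One way the size-limit idea might look, again sketched with Python's `tarfile` filters. The helper name, the glob patterns, and the limit are hypothetical and chosen only for illustration.

```python
import fnmatch
import tarfile
from pathlib import PurePosixPath


def make_size_limit_filter(patterns, max_bytes):
    """Build a tarfile filter that drops matching files larger than max_bytes.

    patterns is a list of glob patterns (for example ["*.log"]) matched
    against the file name only, not the full path.
    """
    def _filter(info):
        base = PurePosixPath(info.name).name
        matches = any(fnmatch.fnmatchcase(base, p) for p in patterns)
        if info.isfile() and matches and info.size > max_bytes:
            return None  # exclude this member from the archive
        return info

    return _filter
```

It would be passed to a backup as `tar.add(path, filter=make_size_limit_filter(["*.log"], 50 * 1024 * 1024))`. Small log files and large files of other types are still archived.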

Steps to reproduce

Run a whole bunch of programs that crash (or the same one repeatedly). Check for the existence of core dumps. When you have ~30 GB of them or whatever, stop your workspace and try to restart it.

Workspace affected

No response

Expected behavior

The workspace should restore (without the offending large coredump files).

Example repository

No response

Anything else?

Core dump files are almost always useless, especially when the user doesn't know what they are and doesn't expect them. They're only useful in certain debugging/troubleshooting scenarios. It could, for example, be documented behavior that all core files matching the typical pattern core\.[0-9]+ are nuked. Then, if a user is working with a core file and wants it persisted, they can rename it to something else.
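To make the pattern-based cleanup concrete, here is a minimal sketch that combines it with the "oldest first, until below threshold" idea from option 4. The function name and the byte-based threshold are assumptions for the example, and nothing here reflects how Gitpod's backup step actually works.

```python
import os
import re
from pathlib import Path

# The pattern suggested above: core.<digits>, matched against the file name.
CORE_RE = re.compile(r"core\.[0-9]+")


def prune_core_dumps(root, max_total_bytes):
    """Delete core dumps under root, oldest first, until the total size of
    the remaining ones is at most max_total_bytes.

    Returns the list of deleted paths. A file renamed to anything that does
    not match core.<digits> is left alone.
    """
    cores = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if CORE_RE.fullmatch(name) and not path.is_symlink():
                st = path.stat()
                cores.append((st.st_mtime, st.st_size, path))

    total = sum(size for _mtime, size, _path in cores)
    deleted = []
    # Sort by modification time so the oldest dumps go first.
    for _mtime, size, path in sorted(cores, key=lambda c: c[0]):
        if total <= max_total_bytes:
            break
        path.unlink()
        total -= size
        deleted.append(path)
    return deleted
```

Calling it with `max_total_bytes=0` gives the unconditional behavior of option 5.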

See also #12453 and #12814.

david-bakin added the "type: bug" label Sep 19, 2022
david-bakin changed the title Sep 19, 2022:
  from: "Cannot restore backup" where root cause is too large archive caused by multiple core files.
  to: "Cannot restore backup" due to multiple very large core files.

stale bot commented Dec 20, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "meta: stale" label Dec 20, 2022
stale bot closed this as completed Jan 3, 2023