Restore Backup failings #6227

Closed
jimi3 opened this issue Nov 23, 2020 · 17 comments
Labels

  • C: core
  • hardware support
  • P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.)
  • R: cannot reproduce (Resolution: Attempts to replicate the problem have not been reliably successful enough to proceed.)

Comments

@jimi3

jimi3 commented Nov 23, 2020

Qubes OS version
R4.0.3 & R4.0.4rc1

Affected component(s) or functionality
Restore Backup

Brief summary

failed to decrypt /var/tmp/restorej3aq3bth/vm37/root.img.001.enc:b'scrypt:Passphrase is incorrect\n
Partially restored files left in /var/tmp/restore_*, investigate them and/or clean them up

  1. Fresh install of R4.0.4rc1; ran updates before restoring the backup, which is when I saw this error.
  2. Wiped Qubes and installed R4.0.3, but got the same error.
  3. First restored everything except dom0 and the default Templates and AppVMs; I do need an AppVM to CIFS-mount the NAS drive that holds the backup.
  4. vm37 was a debian-10-printer-dvm Template.
  5. All VMs are listed as restored in Qubes Manager.
  6. It turns out that some restored AppVMs were blank, with none of my previous data restored; others seem to have everything.
  7. Some VMs are not even able to start up: "failed to start:qrexec-daemon startup failed: Connection to the VM failed"
  8. Deleted one of these broken restored AppVMs and restored just that one again, this time successfully and with its data present.
  9. Restoring just vm37 does not succeed, though; this one didn't make it into the backup at all.

How Reproducible
not sure

Expected behavior
Restore completes with the user-generated AppVM content when the passphrase is correct.

Actual behavior
Only part of the user-generated AppVM content is restored, and the user is wrongly told that the passphrase is incorrect.

Additional context
Verify backup function?

Solutions you've tried
Fresh new installs of R4.0.3 & R4.0.4rc1, kernel-latest

@jimi3 jimi3 added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug labels Nov 23, 2020
@andrewdavidwong andrewdavidwong added C: core needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Nov 23, 2020
@andrewdavidwong andrewdavidwong added this to the Release 4.0 updates milestone Nov 23, 2020
@andrewdavidwong
Member

You omitted the steps to reproduce, which makes certain details unclear. Please provide the exact steps you performed.


> vm37 was a debian-10-printer-dvm Template

I wonder if it might be due to the printer device assigned to this VM (similar to #6083, perhaps). Were the backup and restore done on the exact same hardware?


> Verify backup function?

Yes, there is one. In the restore GUI, it's called "Verify backup integrity, do not restore the data".
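
For command-line use in dom0, the same check is available as `qvm-backup-restore --verify-only`. The following is a minimal Python sketch that only assembles that command and does not run it; the qube name and backup path in the usage part are hypothetical placeholders, not values from this issue.

```python
import shlex
from typing import List, Optional


def build_verify_command(backup_path: str, dest_vm: Optional[str] = None) -> List[str]:
    """Assemble a qvm-backup-restore call that verifies without restoring."""
    cmd = ["qvm-backup-restore", "--verify-only"]
    if dest_vm is not None:
        # -d names the qube that holds the backup file; omit it when the
        # backup file already sits in dom0.
        cmd += ["-d", dest_vm]
    cmd.append(backup_path)
    return cmd


if __name__ == "__main__":
    # Hypothetical qube name and path, for illustration only.
    argv = build_verify_command("/mnt/nas/qubes-backup-example", dest_vm="nas-mount-example")
    print(" ".join(shlex.quote(part) for part in argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` in dom0; the tool then asks for the backup passphrase as usual.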

@jimi3
Author

jimi3 commented Nov 23, 2020

> You omitted the steps to reproduce, which makes certain details unclear. Please provide the exact steps you performed.

Not sure what to say here. It was a full Qubes OS backup that led to this problem. The backup was taken two weeks ago with Qubes OS R4.0.3, kernel-latest.
I would need to create a new full OS backup and try to restore it on a fresh install to maybe reproduce this problem.
Please advise if I misunderstood you.

> I wonder if it might be due to the printer device assigned to this VM (similar to #6083, perhaps). Were the backup and restore done on the exact same hardware?

Same PC; the only hardware change is from a 500GB SSD to a 500GB NVMe drive.

> Yes, there is one. In the restore GUI, it's called "Verify backup integrity, do not restore the data".

It would be great if this was a default option to use right after writing the backup. Somehow I was under the impression that a verify was being done with each backup.

@andrewdavidwong
Member

> It would be great if this was a default option to use right after writing the backup. Somehow i was under the impression a verify was being done with each backup.

There's an open issue for this: #1454

@jimi3
Author

jimi3 commented Nov 24, 2020

Used a previous backup and was able to restore the debian-10-printer-dvm template; it worked like a charm.
Made a new encrypted backup to my local NAS, which I just finished verifying successfully. I was not able to reproduce this error.

> There's an open issue for this: #1454

Thank you.

@SvenSemmler

SvenSemmler commented May 21, 2021

[I edited this post and removed several others to condense the information and spare you my ramblings.]

I see the same issue, "b'scrypt:Passphrase is incorrect", when migrating to another machine.

  • multiple tries, with multiple backups, multiple qubes, multiple media
  • the error occurs randomly at different chunks when doing the same restore (same machine/backup file)
  • in the meantime I have moved all my linux based qubes to the target, by running bash scripts that create all the templates and then re-creating all app qubes by hand and moving the user data manually
  • Critical for me: there is one qube (Win 10 HVM corporate install) that I cannot recreate without spending weeks with tickets, explanations and general misery.

So these are all my efforts at moving this HVM from one machine to the other:

  • it's not Nirokey fails to mount on Debian AppVMs #6038 ... removed all PCI devices and rebooted the respective qube multiple times before backing up
  • also tried to move backup file into dom0 first (using the qvm-run --pass-io "cat ..." method) -- makes no difference
  • --verify on the source machine passes (root: 1061 chunks, 1 chunk private and 1 for firewall)
  • actual restore on source machine successful (it's not the backup file, nor the media ... the problem exists on the target machine)
  • --verify on the target machine fails: once at chunk 361 with a slightly different message: "b'scrypt: Input is not a valid scrypt-encrypted block" and once at chunk 133 with the usual "b'scrypt:Passphrase is incorrect"
  • I am using the target machine as main PC at this time and have not noticed any other issues that would indicate hardware issues.
  • both source and target machine dom0 are fully up-to-date

@brendanhoar

If your systems are lvm based (the default) this might be a workable alternative:

https://github.com/tasket/wyng-backup

B

@andrewdavidwong andrewdavidwong added P: critical Priority: critical. Between "major" and "blocker" in severity. and removed P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels May 21, 2021
@SvenSemmler

SvenSemmler commented May 22, 2021

source: ThinkPad P51 (UEFI) running R4.0.4
target: ThinkPad T430 (Heads) running R4.0.4

In summary:

  1. created backup of HVM onto external SSD (using sys-usb)
  2. copied backup from external SSD into dom0 of the source machine
  3. successfully verified on source machine using backup in dom0
  4. removed original HVM from source machine (made a clone first)
  5. successfully restored on the source machine using backup in dom0
  6. copied backup from the SSD into dom0 of the target machine
  7. verify using backup in dom0 on target machine fails at chunk 361 with "b'scrypt: Input is not a valid scrypt-encrypted block"
  8. second verify (same as step 7) fails at chunk 133 with "b'scrypt:Passphrase is incorrect"

Conclusion:

  • backup file itself is good (md5sum on source and target machine match)
  • software on both machines up-to-date
  • failure happens only on target machine
  • failure appears random (different chunk and sometimes different message)

The backup consists of 1061 chunks for root, 1 for private and 1 for firewall (seen when restoring with --verbose on source machine).

@tasket

tasket commented May 22, 2021

@SvenSemmler You could try running a quick md5sum on both copies of the backup file (on source and destination machines) to see if the sum matches. If it doesn't, then the archive was damaged in the copy process.

@SvenSemmler

Hi @tasket thank you for the hint -- the md5 sums match on source and target machine.

@DemiMarie

> Hi @tasket thank you for the hint -- the md5 sums match on source and target machine.

For future reference: sha256sum is a better choice here, as MD5 is vulnerable to cheap chosen-prefix collision attacks.
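
Where `sha256sum` is not at hand, the same digest can be computed with Python's standard `hashlib`. This is a minimal sketch; the function name and the 1 MiB chunk size are arbitrary choices, and the file is read in chunks so that a large backup archive does not have to fit in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running it on both copies of the backup file and comparing the two hex strings gives the same kind of check as comparing `sha256sum` output.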

@marmarek
Member

@SvenSemmler you may be hitting #4791 - running out of space in /var/tmp in dom0 during restore process. It is fixed in R4.1, but not in R4.0. Fully backporting the fix is tricky (but not impossible), but if you care about restoring from a backup archive that is transferred into dom0 already, you can try manually applying this change to /usr/lib/python3.5/site-packages/qubesadmin/backup/restore.py in dom0.
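
To rule out the free-space explanation before a restore, a rough check in dom0 could look like the sketch below. Comparing free space in `/var/tmp` against the size of the archive is only an assumed heuristic, since how much scratch space the restore really needs depends on the restore code; the function name and the `safety_factor` parameter are made up for illustration.

```python
import os
import shutil


def has_room_for_restore(backup_path: str, scratch_dir: str = "/var/tmp",
                         safety_factor: float = 1.0) -> bool:
    """Crude check: is there at least archive-size free space in scratch_dir?"""
    # Assumption: the archive size is used as a stand-in for the scratch
    # space the restore needs; the real requirement may differ.
    needed = os.path.getsize(backup_path) * safety_factor
    free = shutil.disk_usage(scratch_dir).free
    return free >= needed
```

A `False` result does not prove that #4791 is the cause, but it is a cheap hint that it is worth looking at.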

@SvenSemmler

@marmarek thank you. I saw the remains in /var/tmp/restore* and cleaned them out before each new try. Also, the target machine has a 2TB SSD, and currently less than 20% of it is used. If you think it makes sense anyway, I can apply the change and retry (for debugging purposes).

I was able to move the qube using the dd method @tasket pointed out, but I am happy to keep debugging this. My next step is to back up some qubes on the target machine and do a verify, to see whether the issue has something to do with the original backup originating on another machine.

If there are any other experiments or investigations that could help track this down, please let me know.

@SvenSemmler

SvenSemmler commented May 24, 2021

TLDR: I no longer think this is a bug in Qubes OS, but rather a hardware issue specific to my new computer. I will continue to debug and try to find the issue, but I will do so in the forum and not here. I am OK with this bug being closed (although I am not the original reporter).

Backup/Verify of all qubes on the old machine: no issue
Backup/Verify of all qubes on the new machine: fails again with various errors. Also, when moving the Windows qube with gzip/dd as @tasket showed me, the machine froze twice before it finally worked the third time when uncompressing the data and writing it. Also, just watching some Netflix yesterday evening had the browser tab crash 3 times ... never seen that before with identical qube / memory / software versions on the old machine.

Conclusion: hardware issue.

I plan to exchange the CPU within a week anyway. Also suspect it could be RAM. SSD seems less likely but is a distant third possible root cause. I will ask in the forum for ways to debug/check on this. Thank you for your attention and help.
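
One cheap way to probe for this kind of fault without involving the backup tooling is to repeat a deterministic, CPU- and memory-heavy computation and check that the result never changes. The sketch below uses `hashlib.scrypt` from the Python standard library because the failures above surfaced in scrypt decryption; the function name, the cost parameters, and the round count are arbitrary assumptions, and it is not a replacement for a proper memory or CPU stress test.

```python
import hashlib
import os


def scrypt_is_consistent(rounds: int = 20) -> bool:
    """Derive the same scrypt key repeatedly; any mismatch suggests faulty hardware."""
    password = os.urandom(32)
    salt = os.urandom(16)
    # n=2**12, r=8, p=1 uses roughly 4 MiB per derivation (arbitrary choice).
    reference = hashlib.scrypt(password, salt=salt, n=2**12, r=8, p=1, dklen=64)
    for _ in range(rounds):
        result = hashlib.scrypt(password, salt=salt, n=2**12, r=8, p=1, dklen=64)
        if result != reference:
            return False
    return True
```

A `True` result proves little, since an intermittent fault may simply not trigger during a short run, but a single `False` on identical input is strong evidence against the machine rather than the software.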

@andrewdavidwong andrewdavidwong added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. and removed P: critical Priority: critical. Between "major" and "blocker" in severity. labels May 24, 2021
@andrewdavidwong
Member

Given that @jimi3 said he was unable to reproduce this (#6227 (comment)) and @SvenSemmler has concluded that he is not actually experiencing the bug reported here, I'm closing this for now as "cannot reproduce." If you believe this is a mistake, or if anyone can reproduce the issue, please leave a comment, and we'll be happy to reopen this. Thank you.

@andrewdavidwong andrewdavidwong added R: cannot reproduce Resolution: Attempts to replicate the problem have not been reliably successful enough to proceed. and removed needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels May 24, 2021
@SvenSemmler

SvenSemmler commented Jun 2, 2021

@andrewdavidwong in case you want to update labels .... final update from my side to benefit future readers of this issue: the "b'scrypt:Passphrase is incorrect" was caused by the CPU in the target machine. I do not know whether it was a hardware issue or a software issue with this particular CPU (i7-3520M).

After exchanging it (with an i7-3740QM), I repeated all experiments outlined above with multiple media and computers. The issue is gone.

@andrewdavidwong
Member

I wonder if this was actually a duplicate of #4493.
