Issues with backup preparation times #78

Open
HeatherComputer opened this issue Jul 1, 2024 · 3 comments

@HeatherComputer
Owner

Okay, so here's the deal:

  • To work out which files to back up in an incremental / differential partial backup, we need to hash each file.

  • This is... slow, because it effectively means reading the entire world from disk as if we were making a full backup, before we even start making one. We then have to re-read the files we want to back up from disk when the backup actually starts.

  • On larger worlds, this can lead to a long time sat on the Backup Starting message before any progress updates are sent, which can make a backup look stalled even when it isn't.

  • It also extends the backup time pretty significantly, of course, because you're reading the entire world just to figure out what to back up.
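
For illustration, the hashing step described above could look roughly like the following. This is only a sketch and not the mod's actual code: `hash_file`, `changed_files` and the manifest format (relative path mapped to hex digest) are hypothetical names, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(world_dir: Path, previous: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Hash every file under world_dir and list the ones that changed.

    `previous` is a hypothetical manifest saved by the last backup,
    mapping relative path -> hex digest. Every file is read in full,
    which is exactly the cost this issue is about.
    """
    current: dict[str, str] = {}
    for path in sorted(world_dir.rglob("*")):
        if path.is_file():
            current[path.relative_to(world_dir).as_posix()] = hash_file(path)
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    return changed, current
```

Note that nothing can be reported as "changed" until the loop has read the whole world once, and the changed files are then read a second time when they are actually written to the backup.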

Problem is... how can one solve this?

  • We kinda have to use hashes, because Minecraft does not properly update file modification dates, so we cannot use dates to tell if a file has changed. See Differential backups not working #33 for more info.
  • We could remove the apparent stall by backing up each file as soon as we know whether it needs backing up. However, this introduces another issue: the "smart chain reset" feature wouldn't work here, because we'd have finished a backup before we'd know if the chain should be reset or not.
    • This wouldn't actually speed up backups at all, but it'd remove the apparent stall.
  • We could, in theory, only hash parts of each file. However, this risks skipping over the only part of a file that has changed, and thus not backing up a file that we should back up.
    • This wouldn't outright remove the apparent stall. However, it would significantly speed this stage up, and thus speed up backups as a whole.
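
To make the partial-hash trade-off concrete, here is a minimal sketch, assuming we sample the file size plus the first and last `sample_size` bytes. The function name and the sampling scheme are illustrative assumptions, not something the mod does today.

```python
import hashlib
import os
from pathlib import Path


def sampled_hash(path: Path, sample_size: int = 64 * 1024) -> str:
    """Hash only the file size plus the first and last sample_size bytes.

    Faster than a full hash on large files, but blind to any change that
    keeps the size the same and falls outside the two sampled windows.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    digest.update(str(size).encode("ascii"))
    with path.open("rb") as handle:
        digest.update(handle.read(sample_size))  # head window
        if size > sample_size:
            # Tail window; never re-read bytes already covered by the head.
            handle.seek(max(sample_size, size - sample_size))
            digest.update(handle.read(sample_size))
    return digest.hexdigest()
```

With a 300-byte file and `sample_size=100`, flipping byte 150 leaves the sampled hash unchanged, which is the "skipped the only part that changed" failure described above.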
@MuteTiefling

Hide the stall.

Keep track of how long hashing took the last few times, take an average, and use that to display progress while hashing. 😎
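
A rough sketch of that idea, assuming the durations of previous hashing passes are kept in memory (persisting them between server restarts is left out). `HashTimeEstimator` and its methods are hypothetical names.

```python
from collections import deque
from typing import Optional


class HashTimeEstimator:
    """Estimate hashing progress from the average of recent hashing durations."""

    def __init__(self, window: int = 5) -> None:
        # Only the last `window` durations are kept; older ones fall off.
        self._durations: deque = deque(maxlen=window)

    def record(self, seconds: float) -> None:
        """Store how long a completed hashing pass took."""
        self._durations.append(seconds)

    def estimated_progress(self, elapsed: float) -> Optional[float]:
        """Return a fraction in [0, 0.99], or None if there is no history yet."""
        if not self._durations:
            return None
        average = sum(self._durations) / len(self._durations)
        if average <= 0:
            return 0.99
        # Cap below 1.0 so a slower-than-usual pass never claims to be done.
        return min(elapsed / average, 0.99)
```

This is an estimate only: if the world has grown a lot since the last few backups, the bar will sit at 99% for a while.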

@Kanzaji

Kanzaji commented Jul 1, 2024

> We kinda have to use hashes, because minecraft does not properly update file modification dates.

How about we combine the two methods? If the file modification date didn't change, check the file hash; if it did change, the file was definitely modified, so we can skip calculating the hash and just back it up 😄
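
Something like this, as a sketch only: `needs_backup` is a hypothetical helper, and it assumes the previous backup stored each file's modification time (in nanoseconds) alongside its hash.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def needs_backup(path: Path, previous_mtime_ns: int, previous_hash: str) -> bool:
    """Decide whether a file should go into the next partial backup.

    A changed modification time is trusted as proof of change, so hashing
    is skipped. An unchanged modification time is NOT trusted (the thread
    says Minecraft does not update it reliably), so we fall back to the hash.
    """
    if path.stat().st_mtime_ns != previous_mtime_ns:
        return True
    return hash_file(path) != previous_hash
```

This only saves work for files whose date did change; everything else is still hashed in full. A file that was rewritten with identical content would also be backed up unnecessarily.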

@HoldYourWaffle
Contributor

> We kinda have to use hashes, because minecraft does not properly update file modification dates.

For MCA files, which I'd assume are the bulk of the processing load, we could probably use the chunk update timestamps in the header.
If that for some reason doesn't work, we might be able to use the LastUpdate values of individual chunks. NBT can only be parsed linearly, but as soon as one LastUpdate has changed we can skip the rest of the file, and thanks to the MCA header each chunk could be parsed in parallel.
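
A minimal sketch of the header approach, assuming the documented region layout: an 8 KiB header whose second 4 KiB sector holds 1024 big-endian 4-byte timestamps, one per chunk. The function names are hypothetical, and whether Minecraft updates these timestamps reliably enough for backups is an open assumption that would need testing.

```python
import struct
from pathlib import Path

SECTOR_BYTES = 4096        # the region header is two 4 KiB sectors
CHUNKS_PER_REGION = 1024   # 32 x 32 chunks per region file


def read_chunk_timestamps(region_path: Path) -> tuple:
    """Read the 1024 per-chunk timestamps from an .mca header.

    Only 4 KiB is read per region file, regardless of the file's size.
    Raises ValueError for files too short to hold a full header (for
    example empty region files); a caller could fall back to hashing.
    """
    with region_path.open("rb") as handle:
        handle.seek(SECTOR_BYTES)  # skip the chunk location table
        raw = handle.read(SECTOR_BYTES)
    if len(raw) < SECTOR_BYTES:
        raise ValueError(f"{region_path} has no complete region header")
    return struct.unpack(f">{CHUNKS_PER_REGION}I", raw)


def region_changed(region_path: Path, previous: tuple) -> bool:
    """Compare the current timestamp table with the one stored at the last backup."""
    return read_chunk_timestamps(region_path) != previous
```

The timestamp for chunk (x, z) sits at index `(x & 31) + (z & 31) * 32` in the returned tuple, should per-chunk information ever be needed.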

However, I'd strongly recommend adding a config option to disable 'smartness' like this, just in case a server admin is aware of anything that might screw with these indicators...

> Keep track of the time to hash the last few times take an average, and use that to display a progress while hashing. 😎

As far as I can tell we wouldn't even need to estimate, since we know (1) how many files need to be hashed, (2) how big they are and (3) how many bytes of the current file have been processed so far.
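
As a sketch of what I mean (hypothetical names, with `report` standing in for whatever sends progress updates to players or the console):

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable


def hash_with_progress(
    paths: Iterable[Path],
    report: Callable[[int, int], None],
    chunk_size: int = 1 << 20,
) -> dict:
    """Hash each file and call report(bytes_done, bytes_total) after every chunk."""
    paths = list(paths)
    # (1) and (2): file count and sizes are known up front from metadata alone.
    total = sum(path.stat().st_size for path in paths)
    done = 0
    hashes = {}
    for path in paths:
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            while chunk := handle.read(chunk_size):
                digest.update(chunk)
                # (3): bytes of the current file processed so far.
                done += len(chunk)
                report(done, total)
        hashes[path] = digest.hexdigest()
    return hashes
```

Getting the sizes only needs a stat per file, so the total is available almost immediately and the Backup Starting phase can show a real percentage.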
