
Significantly speed up creating backups with isal via zlib-fast #4843

Merged · 3 commits · Jan 29, 2024

Conversation

@bdraco (Member) commented Jan 27, 2024

Proposed change

isal, via python-isal, is a drop-in replacement for zlib, with the caveat that the compression level mappings are different. zlib-fast is a tiny compat layer that converts the standard zlib compression levels to isal compression levels, allowing a drop-in replacement.

I considered a few other approaches here, such as using pigz or switching to another format, but this seems like the simplest and most compatible option, resulting in only a tiny code change here.

Wheels are built for all supported platforms: https://wheels.home-assistant.io/musllinux/

https://github.com/bdraco/zlib-fast/releases/tag/v0.2.0
https://github.com/pycompression/python-isal
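The compat-layer idea can be sketched as follows. This is a minimal sketch, not the actual zlib-fast implementation: the level table here is illustrative (isal only supports levels 0-3, while zlib uses 0-9), and it falls back to the stdlib zlib when python-isal is not installed.

```python
# Sketch of a zlib-to-isal level shim (illustrative mapping, not the
# real zlib-fast table). isal supports compression levels 0-3 while
# zlib uses 0-9, so a shim must translate before delegating.
try:
    from isal import isal_zlib as _zlib  # fast path when python-isal is present
    _HAVE_ISAL = True
except ImportError:
    import zlib as _zlib  # fallback so the sketch runs anywhere
    _HAVE_ISAL = False

# Illustrative map: zlib levels 0-9 collapsed onto isal levels 0-3.
_LEVEL_MAP = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}


def compress(data: bytes, level: int = 6) -> bytes:
    """Compress with zlib-style levels, delegating to isal if available."""
    mapped = _LEVEL_MAP[level] if _HAVE_ISAL else level
    return _zlib.compress(data, mapped)


def decompress(data: bytes) -> bytes:
    """Decompress; the wire format is plain zlib either way."""
    return _zlib.decompress(data)
```

Because the output is standard zlib-format data in both branches, callers need no changes; only the level semantics are translated.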

Differences

  • Compression for backups is ~5x faster than the baseline. My backup time went from 2m2s to 24s with a ~1400MiB backup.
  • Restoring backups is ~2.5x faster (but not as noticeable, since restores were already ~5.5x faster than compression).
  • YMMV on the compression ratio a bit depending on the content. On one system it was 2% better, and on another it was 7% worse. securetar uses compression level 6 by default, so it seems like the goal there was to balance speed and compression ratio. We could turn up the compression level later, but I left it at the default for now as it seemed most consistent with the securetar choice.
  • CPU usage never hits 100% anymore on either of my primary production systems, as I/O is the limitation now; it hovers around 80% instead.
  • Numbers will be far less impressive on 32-bit platforms and on I/O-constrained systems (reading the data off the SD card can be the bottleneck instead of the compression overhead for these).

powturbo/TurboBench#43

Hopefully this means we won't see any more database backup failures where core gives up the lock on the SQLite database because it can't hold any more events in memory when the backup takes too long. I was especially keen on fixing this because, when that happens, the user's only hint that their backup is corrupt may be in the core log, so they find out the hard way and open an issue after it's already too late. https://github.com/home-assistant/core/blob/a793a5445f4a9f33f2e1c334c0d569ec772335fe/homeassistant/components/recorder/core.py#L1015

We can likely supplement the logger warning with a repair issue after this change, since most remaining cases should be the result of a hardware or overload problem with the system rather than the compression taking too long. Currently, issues like home-assistant/core#105987 end up going stale since there was no viable solution before. I had been holding off on adding the repair issue because telling users to wait until we could make backups faster didn't seem like it would go over well, and the result was the same, with the issue eventually going stale since there was no solution.

Testing:

  • x86_64 ~1400MB 2m2s -> 24s
  • aarch64 validation of backup and restore only (test data was small)
  • aarch64 (second machine, still a bit I/O constrained) - ~1100MB 9m29s -> 5m58s
  • armv7l (32-bit, I/O constrained due to the SD card and not expected to change much; I wanted to make sure it worked on 32-bit, so I tested it anyway) ~500MB 5m16s -> 4m42s

A future improvement for I/O-constrained systems could be to use https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.addfile to add each gzipped tar file to the final archive, rewind the stream, rewrite the size over it, and jump back to the end, so we wouldn't have to write all the tgz files to disk; they could be streamed into the final result instead. This would cut the writes in half (and increase storage lifetime) and would likely make more difference for these systems than this change.
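A rough sketch of that streaming idea. This is a hypothetical helper, not the Supervisor's code: it writes a placeholder tar header, streams the member data, then seeks back and rewrites the header with the real size (TarInfo.tobuf recomputes the header checksum), exactly the rewind-and-patch approach described above.

```python
import tarfile


def add_stream(tar: tarfile.TarFile, name: str, chunks) -> None:
    """Append a member whose size is unknown up front (sketch).

    Write a placeholder header, stream the data, then seek back and
    patch the real size into the header before jumping to the end.
    Requires a seekable archive opened in "w" mode.
    """
    fobj = tar.fileobj
    header_off = fobj.tell()
    info = tarfile.TarInfo(name)
    info.size = 0  # placeholder; patched below
    fobj.write(info.tobuf(tar.format, tar.encoding, tar.errors))
    size = 0
    for chunk in chunks:
        fobj.write(chunk)
        size += len(chunk)
    if size % 512:  # tar members are padded to 512-byte blocks
        fobj.write(b"\0" * (512 - size % 512))
    end_off = fobj.tell()
    fobj.seek(header_off)  # rewind and rewrite the size over the header
    info.size = size
    fobj.write(info.tobuf(tar.format, tar.encoding, tar.errors))
    fobj.seek(end_off)  # jump back to the end
    tar.offset = end_off  # keep tarfile's bookkeeping consistent for close()
```

The trade-off is that the destination must be seekable (a file, not a pipe), which is fine for writing the outer backup archive to disk.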

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints of add-on configuration are added/changed:

@bdraco added the performance label Jan 27, 2024
@bdraco (Member, Author) commented Jan 27, 2024

I tested this locally and verified it was loaded:

28e75fc68efe:/usr/src/supervisor# ulimit -n 1000
28e75fc68efe:/usr/src/supervisor# lsof -p 78
python3  78 root  mem       REG              259,8            3677071 /usr/local/lib/python3.11/site-packages/isal/igzip_lib.cpython-311-x86_64-linux-musl.so (path dev=0,51)
python3  78 root  mem       REG              259,8            3677074 /usr/local/lib/python3.11/site-packages/isal/isal_zlib.cpython-311-x86_64-linux-musl.so (path dev=0,51)
python3  78 root  mem       REG              259,8            3677070 /usr/local/lib/python3.11/site-packages/isal/_isal.cpython-311-x86_64-linux-musl.so (path dev=0,51)

@bdraco (Member, Author) commented Jan 27, 2024

On my local machine it's much faster, but I tried it in production and there is no change, so I've got to be doing something wrong/different.

@bdraco (Member, Author) commented Jan 27, 2024

I see the issue: my test uses zlib, and I need to map the gzip path to IGzipFile as well.
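The gzip-path fix amounts to something like the following. This is a sketch, assuming python-isal's isal.igzip module (whose IGzipFile is documented as a drop-in for gzip.GzipFile); it degrades gracefully to the stdlib gzip when isal is absent.

```python
# Sketch: route gzip.GzipFile users through isal as well, so code that
# opens backups via the gzip module gets the fast path too.
import gzip

try:
    # python-isal ships isal.igzip; IGzipFile is a drop-in replacement
    # for gzip.GzipFile backed by ISA-L.
    from isal import igzip
    gzip.GzipFile = igzip.IGzipFile
except ImportError:
    # No isal available: keep the (slower) stdlib implementation.
    pass
```

Either way the on-disk format stays standard gzip, so archives remain readable by unpatched systems.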

@bdraco bdraco closed this Jan 28, 2024
@bdraco bdraco reopened this Jan 28, 2024
@bdraco bdraco closed this Jan 28, 2024
@bdraco bdraco reopened this Jan 28, 2024
@bdraco bdraco closed this Jan 28, 2024
@bdraco bdraco reopened this Jan 28, 2024
@bdraco bdraco marked this pull request as ready for review January 28, 2024 00:52
@codyc1515 left a comment

Does this pose any issue with restores? Otherwise, LGTM.

@bdraco (Member, Author) commented Jan 28, 2024

Does this pose any issue with restores? Otherwise, LGTM.

Restores work the same; decompression is ~2.5x faster as well.

@codyc1515 commented

2024.2 is the month of the back-up! Looking forward to seeing this, alongside the other changes we have made to reliability.

@agners (Member) commented Jan 29, 2024

YMMV on the compression ratio a bit depending on the content. On one system it was 2% better, and on another it was 7% worse. securetar uses compression level 6 by default, so it seems like the goal there was to balance speed and compression ratio.

I changed that somewhat recently with pvizeli/securetar#22, which already helped drop backup time by almost 2x. I remember considering an even lower default setting, since the compression levels in gz do not influence the compression ratio drastically (at least not as drastically as in other algorithms, according to a somewhat older benchmark). However, I then thought let's move the needle slowly so as not to run into surprises. I've not heard any negative feedback, so from my point of view we can continue to change the standard.

That said, I was actually considering starting to look into another compression algorithm, e.g. zstd. It would mean introducing a backup format version, but that shouldn't be that big of a deal.

One thing I was wondering: is this library making use of newer instruction set extensions, and is the code written such that it will still work on old x86-64 systems without those extensions?

@bdraco (Member, Author) commented Jan 29, 2024

One thing I was wondering: is this library making use of newer instruction set extensions, and is the code written such that it will still work on old x86-64 systems without those extensions?

I've tested the library and it works on older x86_64 systems as well as on 32-bit ARM. It's just not as fast there because it uses fallbacks (but it is still faster than the baseline).

@bdraco (Member, Author) commented Jan 29, 2024

I didn't test it on every old x86_64 chip (only the Synology box with the old chip we had problems with for other libs). Given its Intel origin (https://github.com/intel/isa-l), I expect they have tested it a bit more on older hardware, as they have more access to it.

@codyc1515 commented

That said, I was actually consider starting to look into another compression algorithm, e.g. zstd. It would mean to introduce a backup format version, but that shouldn't be that big of a deal.

Messaging was added to the Supervisor recently stating that backups are not backwards compatible (instead of just silently failing), so this should not be a blocking issue.

@agners (Member) commented Jan 29, 2024

I didn't test it on every old x86_64 chip (only the Synology box with the old chip we had problems with for other libs). Given its Intel origin (https://github.com/intel/isa-l), I expect they have tested it a bit more on older hardware, as they have more access to it.

I mean, if they use hand-written assembly, they probably thought about it and implemented fallbacks.

Sometimes it is also the compiler's optimization passes that select instructions from an extension, but that then depends on flags at build time.

Since you already tested on older systems, I am not too concerned, indeed. Just something to keep an eye on.

I do have a Fujitsu machine with an AMD Geode here, which also has a quite limited set of instructions available. I'll do a backup once this is on dev.

@bdraco (Member, Author) commented Jan 29, 2024

I do have a Fujitsu machine with an AMD Geode here, which also has a quite limited set of instructions available. I'll do a backup once this is on dev.

Nice. Thankfully this is trivial to revert if there is an issue. I'm going to merge it now and update everything I have once dev is built.

@bdraco bdraco merged commit d1851fa into main Jan 29, 2024
48 of 65 checks passed
@bdraco bdraco deleted the zlib_fast branch January 29, 2024 20:25
@bdraco (Member, Author) commented Jan 29, 2024

Bumped every one of my production and dev systems. Backups completed successfully on all of them.

@agners (Member) commented Jan 29, 2024

Just tested on my Fujitsu AMD GX-222GC based system, works like a charm! 👍

@github-actions github-actions bot locked and limited conversation to collaborators Jan 31, 2024
@bdraco (Member, Author) commented Feb 11, 2024

A future improvement for I/O-constrained systems could be to use docs.python.org/3/library/tarfile.html#tarfile.TarFile.addfile to add each gzipped tar file to the final archive, rewind the stream, rewrite the size over it, and jump back to the end, so we wouldn't have to write all the tgz files to disk; they could be streamed into the final result instead. This would cut the writes in half (and increase storage lifetime) and would likely make more difference for these systems than this change.

I implemented that in #4884

Labels: cla-signed, performance
3 participants