
Compression #14

Open
Michael-F-Bryan opened this issue Jun 7, 2017 · 6 comments

@Michael-F-Bryan
Owner

Michael-F-Bryan commented Jun 7, 2017

To avoid bloating binaries too much, let's introduce a feature flag which will allow data to be compressed when it is embedded in the binary, then lazily decompressed when it is accessed.

I imagine this feature flag would alter the include_dir::File type from this...

pub struct File<'a> {
    path: &'a str,
    contents: &'a [u8],
    #[cfg(feature = "metadata")]
    metadata: Option<crate::Metadata>,
}

... to something like this:

pub struct File<'a> {
    path: &'a str,
    contents: FileContents<'a>,
    #[cfg(feature = "metadata")]
    metadata: Option<crate::Metadata>,
}

use std::cell::OnceCell;

impl<'a> File<'a> {
    fn contents(&self) -> &[u8] { self.contents.get() }
}

struct FileContents<'a> {
    compressed: &'a [u8],
    uncompressed: OnceCell<Vec<u8>>,
}

impl<'a> FileContents<'a> {
    fn get(&self) -> &[u8] {
        self.uncompressed.get_or_init(|| decompress(self.compressed))
    }
}

fn decompress(compressed: &[u8]) -> Vec<u8> { todo!() }

Some things that need to be considered are:

  • Which compression algorithm do we use?
  • Compression support needs to be added to both the macro and the main crate
  • Decompression should be done lazily without the user knowing (i.e. use &self and interior mutability)
@anton-dutov

Ideally, we'd also need to provide several types of compression to choose from.

@joshtriplett

In isolation, I'd say zstd would probably serve almost every purpose you'd need: you can spend a lot of time compressing very well, or a tiny amount of time to get a decent amount of compression.

The one reason to support multiple compression algorithms: if a program already needs a specific algorithm for some other purpose, it'd be nice to use the same one and avoid having two decompression libraries present.

@Michael-F-Bryan
Owner Author

The one reason to support multiple compression algorithms: if a program already needs a specific algorithm for some other purpose, it'd be nice to use the same one and avoid having two decompression libraries present.

One of my goals for the project is to not pull in unnecessary dependencies, so this lines up well.

However, one thing I'd like to ensure is that the include_dir crate is portable and will Just Work out of the box.

Without having read the zstd-sys build script too closely and just speaking in general terms, most crates that bind to native libraries will work fine for a Windows/Linux/macOS host, but then be impossible to cross-compile. This is especially the case when the target is something like iOS, Android, or WebAssembly.

We've wasted more engineering hours than I'd care to admit at work just because of C build systems 😞

@NHodgesVFX

NHodgesVFX commented Apr 30, 2022

https://crates.io/crates/snap might also be a good choice. One potential problem area is the license.

@LordRatte

I used the lz4_compression crate. It seems lightweight and it's written in pure Rust (which I presume means it will run on multiple targets).

Even if this isn't the library you want to go with @Michael-F-Bryan , I thought I'd get the ball rolling.

@zombiepigdragon
Contributor

My few thoughts on this:

  • The feature shouldn't replace the default types, but should instead create a separate set of types (that way, if a dependency opts into compression, it would still be possible for other crates to avoid it when runtime speed is a higher priority).
  • Being able to use multiple algorithms would be beneficial, so the feature flags could be (e.g.) compression-zstd, compression-lz4.
  • To enable this, I suggest creating a Compression trait and making most of this crate generic over it, then providing a None implementation (available without any feature flags) that does no preprocessing. On the macro side, it is probably best to have a separate macro for each algorithm (e.g. include_dir_zstd!, include_dir_lz4!, with the existing include_dir! returning a Dir<compress::None>), since compression has to happen at build time and the main crate's trait won't be available for use there.

In pseudo-Rust, this becomes

pub struct Dir<'a, C: Compression> { /* ... */ }
pub struct File<'a, C: Compression> { /* ... */ }
pub enum DirEntry<'a, C: Compression> { /* ... */ }

pub trait Compression {
    // using a `Cow<[u8]>` benefits the performance of `compress::None`
    fn decompress(data: &[u8]) -> Cow<[u8]>;
}

pub mod compress {
    pub enum None {}
    impl Compression for None { /* ... */ }

    #[cfg(feature = "compress-foo")]
    pub enum Foo {}
    #[cfg(feature = "compress-foo")]
    impl Compression for Foo { /* ... */ }
}

pub include_dir!;
#[cfg(feature = "compress-foo")]
pub include_dir_foo!;

It may be possible to do Dir<'a, C: Compression = compress::None> to simplify the basic case.

An unrelated open question is whether to compress the whole directory as one unit (presumably better compression for directories with many small files) or each file individually (presumably better seek performance), or even to leave the choice to Compression implementations (compress-zip?).
