
[Enhancement] Achieve better audio compression using variable block size #639

Open
T-3B opened this issue Jul 24, 2023 · 5 comments

T-3B commented Jul 24, 2023

I would like to know if variable block size encoding in flac is planned or not.
If not, I'm writing this issue to demonstrate its benefits (the main one being better compression).

Introduction

Currently, flac is the encoder providing the smallest files when used with the right parameters.
I tested many FLAC encoders: FLACOut.exe, CUETools.Flake.exe, justinruggles' Flake, CUETools.FLACCL.cmd.exe, and xiph's flac (ask for more details about the args tested if needed).
The files used for the test: different genres, 44.1 and 96 kHz, 16 and 24-bit, stereo.
Only the first 3 encoders allow variable block size encoding.

Even comparing flac (which currently only supports constant blocksize encoding) with the encoders that do support variable block size encoding, flac still managed to beat all the others (in terms of size).
Command used: flac in.wav -smepb 4096 -P 0 -r 15 -l 32 --lax -A subdivide_tukey\(6\)

The great advantage flac has compared to the others (if I'm not misunderstanding) is its good apodization functions like subdivide_tukey(). Increasing the number given to this function always results in a smaller output file (but takes much more time to encode).

Why allow variable block size encoding?

The answer is simple: better compression (smaller output file).
Intelligent selection of different block sizes within a single FLAC stream can reduce file size (at the cost of longer encoding times).

What to expect?

It is a bit difficult to answer that one.
I ran quite a few benchmarks, ending up testing different constant block sizes with flac on 40 short files (16-bit PCM, 44.1 kHz, 5 to 30 seconds each). I don't remember where I got them; ask if you want them.
We are using the flac command from above, with only the blocksize changed (a sketch of such a sweep is given at the end of this section).
These are the results (y-axis = file size divided by the file size at blocksize 4096; x-axis = blocksize; x-step = 32):
[graph: tukey6_all — relative file size vs. constant blocksize, all 40 samples, subdivide_tukey(6)]

For easier reading (samples split across multiple graphs), see tukey6.tar.gz.
We can see that in the best cases we can get down to 99.5% of the file size obtained with a constant blocksize of 4096.

This is quite small (0.5% better), but it shows that there is room for improvement.
Also, we are encoding these FLACs with nearly the best compression options (max LPC order, and so on).
We can therefore expect a more significant compression gain for faster presets/encodes (for example a simple -8ep).

So I would say FLAC files with intelligent variable blocksize encoding could be compressed 1% more (compared to a constant blocksize of 4096).
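
For what it's worth, a sweep like the one above can be reproduced with a small script along these lines. This is only a sketch: it assumes flac is on PATH, a hypothetical in.wav in the current directory, and reuses the flags from the command above.

```python
#!/usr/bin/env python3
# Sketch: encode one file at a range of constant blocksizes and report each
# output size relative to the -b 4096 baseline (the y-axis of the graphs above).
import os
import subprocess

INPUT = "in.wav"                        # hypothetical sample file
BLOCKSIZES = range(1920, 4608 + 1, 32)

def encode(blocksize: int) -> int:
    """Encode INPUT at the given constant blocksize and return the size in bytes."""
    out = f"out_{blocksize}.flac"
    subprocess.run(
        ["flac", "-f", "-s", "-m", "-e", "-p", "-P", "0", "-r", "15", "-l", "32",
         "--lax", "-A", "subdivide_tukey(6)", "-b", str(blocksize), "-o", out, INPUT],
        check=True,
    )
    return os.path.getsize(out)

baseline = encode(4096)
for bs in BLOCKSIZES:
    print(f"{bs}\t{encode(bs) / baseline:.5f}")
```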

How could it be implemented?

I have already given this some thought, at least regarding useful/meaningful presets.
What follows is an extrapolation of the data I collected with different constant blocksizes. Applying it to variable block size encoding may be a stretch, but imho it is still relevant.

We could take two different approaches:

  • brute force: try every blocksize between a minimum and a maximum, with a given increment step. This could come in multiple presets. As we can see in the graph above, brute-forcing from blocksize 1920 to 4608 with a step of 128 would already do a great job (22 blocksizes to try) and be quite fast. We could then brute-force with a smaller step (down to 32; anything smaller is useless), and/or widen the minimum and maximum blocksizes. A rough sketch of this idea follows the list.
    We can either fully encode all candidate blocks or estimate the final size (for example, comparing the size of a frame at blocksize 4096 with double the size of a frame at blocksize 2048).
  • search: this is of course harder to implement, because the graph shows many local minima. But this method would be faster, since it would look for the blocksize giving the smallest file by computing only a few points of the graph above.
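
To make the brute-force idea concrete, here is a rough sketch (not how libFLAC works internally): a greedy chooser that, at each position in the stream, tries every candidate blocksize and keeps the one with the lowest bits-per-sample cost. The frame_cost() helper is a crude, hypothetical stand-in that encodes the slice as a standalone FLAC file via the flac command line, so per-file metadata overhead makes the estimate approximate; a real implementation would estimate frame sizes inside the encoder.

```python
#!/usr/bin/env python3
# Illustrative greedy brute-force blocksize chooser. Purely a sketch.
import os
import subprocess
import tempfile
import wave

CANDIDATES = range(1920, 4608 + 1, 128)   # the 22 blocksizes suggested above

def frame_cost(pcm: bytes, params, blocksize: int) -> float:
    """Rough bits per sample for one frame's worth of raw PCM (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(suffix=".flac", delete=False) as tmp:
        out = tmp.name
    subprocess.run(
        ["flac", "-f", "-s", "-P", "0", "-b", str(blocksize),
         "--force-raw-format", "--endian=little", "--sign=signed",
         f"--channels={params.nchannels}", f"--bps={8 * params.sampwidth}",
         f"--sample-rate={params.framerate}", "-o", out, "-"],
        input=pcm, check=True,
    )
    size = os.path.getsize(out)
    os.remove(out)
    return 8 * size / blocksize           # includes per-file metadata, so only approximate

def choose_blocksizes(path: str) -> list[int]:
    """Greedily walk through a WAV file, picking the cheapest candidate at each step."""
    chosen = []
    with wave.open(path, "rb") as w:
        params = w.getparams()
        total, pos = w.getnframes(), 0
        while pos < total:
            costs = {}
            for bs in CANDIDATES:
                if pos + bs > total:       # leave the short tail out of the search
                    continue
                w.setpos(pos)
                costs[bs] = frame_cost(w.readframes(bs), params, bs)
            if not costs:                   # tail shorter than the smallest candidate
                chosen.append(total - pos)
                break
            best = min(costs, key=costs.get)
            chosen.append(best)
            pos += best
    return chosen

print(choose_blocksizes("in.wav"))          # hypothetical input file
```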

Conclusion

I tried to modify the flac code, but unfortunately it is too hard for me. I have never worked on a big project like this one, nor modified other people's code; I'm woefully inexperienced in this kind of project.
Furthermore, flac currently only supports constant block size encoding, and changing it to support variable block size encoding would require many modifications to the code.

I'm not an expert (far from it)! I greatly appreciate flac and its contributors for their good work, and please correct any mistake I may have made.
Thank you for your attention to this request.

ktmf01 (Collaborator) commented Jul 24, 2023

Somebody already created a somewhat brute-force variable blocksize encoder by modifying libFLAC. I tested this and these were my results.

As you can see, gains are not tremendous; we're talking about 0.3% here. I cannot read your graph: for some reason Firefox, GIMP and Inkscape just render a big black rectangle.

Anyway, I've already tried a university library a few times, gone through Google Scholar and the like, and tried a few algorithms, but I haven't found one that can pick reasonable blocksizes reliably. I will not implement brute force or search; it needs to be somewhat intelligent.

Also, there is a problem of compatibility. Quite a lot of (hardware) decoders are unable to decode variable blocksize files. See here. To name a big one, Android devices refuse playback. That means implementing this in FLAC will cause somewhat of a rift. I've made the decoder testbench in the hope that developers properly implement playback of variable blocksize files, but it doesn't seem to have reached the Android people yet (or they don't care).

So, summing up: there is a potential 0.5% gain possible (I think I'm being optimistic here), at the cost of (1) tremendously slow encoding (as long as no suitable algorithm is found), (2) adding a lot of complexity, and (3) reduced compatibility with hardware devices.

Sure, I want to implement this at some point. Mostly to have libFLAC cover as much of the specification as is possible. But I need a good algorithm and a lot of time. So this won't be included anytime soon, maybe never.

T-3B (Author) commented Jul 24, 2023

Thank you for your answer!

I wasn't aware of the SVG rendering as a black box (the preview on GitHub shows up just fine). Sorry for the inconvenience.
You can find here the PNG version of flac in.wav -smepb <blocksize> -P 0 -r 15 -l 32 --lax -A subdivide_tukey\(6\):
[graph: tukey6_all (PNG) — relative file size vs. constant blocksize, subdivide_tukey(6)]
Now same command, with a bit faster encoding (using -A subdivide_tukey\(1\)):
[graph: tukey1_all (PNG) — relative file size vs. constant blocksize, subdivide_tukey(1)]

I also provide all the SVGs in an archive, so they should render correctly:
tukey1_AND_6_SVG.tar.gz. In this archive you will find the SVGs with all 40 samples (and also split across multiple graphs, 10 samples per graph), as well as the sum of the file sizes (acting as a mean).

The purpose of benchmarking a faster preset (using subdivide_tukey\(1\)) is to show that more compression gain is possible just by choosing the right constant blocksize (here we can reach 1% better compression in the best cases, whereas it was 0.5% with subdivide_tukey\(6\)).

Thank you for sharing flaccid, a FLAC encoder I had never seen nor heard of. I'll read about it and test it ASAP.

I rather expected that hardware decoders would not support variable blocksizes (even if some of them might), but Android?
Again, I'm going to test it, but chances are I will end up using a software decoder anyway, because of the apps I use for listening (so they should support it, if I guess right).

I may have misunderstood something about how data is encoded in FLAC files: you're saying there is a 0.5% gain possible while being optimistic.
Does using a variable blocksize add overhead to the frames (or subframes) of a FLAC file?
Because from my tests, just choosing a better constant blocksize can already give 0.5% better compression (with quite "maxed" compression settings), so by deduction I would say we can hope for an average gain of at least 0.5%.
I'm surely missing something, but I don't know what.

Thanks for your time.

H2Swine (Contributor) commented Jul 25, 2023

@T-3B: Just to note, CUETools.Flake disabled its variable blocksize mode last year, after it was discovered that it encodes incorrectly: gchudov/cuetools.net#220 . I don't know whether that bug is present in Justin Ruggles' original Flake.

If there were some smart and quick way to estimate the optimal block size based on signal characteristics - maybe based on an initial -b4096 run - then that could also improve a fixed-blocksize strategy: just make a second run and pick the smaller file (see the sketch below).
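
Something along those lines could be as simple as the sketch below; estimate_blocksize() is purely hypothetical, standing in for whatever signal-based heuristic (or data from the first run) would be used.

```python
# Sketch of the two-run idea: encode once at the default -b 4096, once at an
# estimated blocksize, and keep whichever file is smaller.
import os
import subprocess

def estimate_blocksize(path: str) -> int:
    """Hypothetical heuristic deriving a promising blocksize from the signal."""
    return 2304  # placeholder value

def encode(path: str, blocksize: int, out: str) -> int:
    subprocess.run(["flac", "-f", "-s", "-8", "-b", str(blocksize), "-o", out, path],
                   check=True)
    return os.path.getsize(out)

size_a = encode("in.wav", 4096, "run_4096.flac")
size_b = encode("in.wav", estimate_blocksize("in.wav"), "run_est.flac")
print("keep", "run_4096.flac" if size_a <= size_b else "run_est.flac")
```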

In both cases (variable or optimized-fixed) I suspect the impact could be bigger on material with higher sampling rates, simply because, over the years, the reference implementation has been well tuned for 44.1/48 kHz (firing a shotgun blindly has a smaller chance of improving things there, and a better chance at high resolution). For one thing, the default block size of 4096 corresponds to half the duration once the sample rate is doubled ... Also, if one wants to stay within the streamable subset, the -b limit for higher sample rates is raised from 4608 to 16384, so there are more opportunities to test.

But if one wants to stay within the subset, one should take into account that there are signals where the -r8 limit actually binds. Not so many with CDDA in my experience (although particular examples are not hard to find), but what if it so happens that the "optimal partition size" is 32 on a particular high-resolution signal? -8r7 is subset, but -b16384 -r9 is not, and to stay in the subset one will have to choose between doubling the partition size and halving the frame size.
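
A quick back-of-the-envelope on that trade-off (a sketch, assuming the streamable subset caps the Rice partition order at 8):

```python
# Partition size = blocksize / 2^(partition order). Assuming the subset caps
# the partition order at 8, a 32-sample partition at -b16384 is out of reach;
# it is only reachable by shrinking the frame.
for blocksize, order in ((16384, 9), (16384, 8), (8192, 8)):
    print(f"-b{blocksize} -r{order}: partition = {blocksize >> order} samples, "
          f"{'subset' if order <= 8 else 'not subset'}")
# -b16384 -r9: partition = 32 samples, not subset
# -b16384 -r8: partition = 64 samples, subset
# -b8192  -r8: partition = 32 samples, subset
```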

ktmf01 (Collaborator) commented Jul 26, 2023

I may have misunderstood something about how data is encoded in FLAC files: you're saying there is a 0.5% gain possible while being optimistic. Does using a variable blocksize add overhead to the frames (or subframes) of a FLAC file? Because from my tests, just choosing a better constant blocksize can already give 0.5% better compression (with quite "maxed" compression settings), so by deduction I would say we can hope for an average gain of at least 0.5%. I'm surely missing something, but I don't know what.

For one thing, when I say 0.5% I actually mean 0.5 percentage point, that is my bad. So you're comparing the gain to the compressed size, while I'm comparing it to the uncompressed size. Since these files compress to roughly half their original size, 0.5 percentage point comes out to about 1% here.

Also, my numbers are meant to be an average over a large number of tracks, while it seems to me the 0.5% you mention is a peak value for certain tracks; some tracks do not improve at all.

chocolate42 commented

Does using a variable blocksize add overhead to the frames (or subframes) of a FLAC file?

Only a minor overhead of a few bytes per frame, because fixed-blocksize frames are indexed by frame number whereas variable-blocksize frames are indexed by sample number. Single-digit kilobytes per track.
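
To put a rough number on that, a little arithmetic (a sketch; the track length and blocksize are assumptions, and the coded lengths follow the UTF-8-style frame/sample-number coding in the FLAC format spec):

```python
# Rough estimate of the extra frame-header bytes a variable-blocksize stream
# pays because every frame stores its starting sample number instead of a
# frame number (frame numbers fit in up to 31 bits, sample numbers in up to 36).
def coded_len(n: int) -> int:
    for bits, nbytes in ((7, 1), (11, 2), (16, 3), (21, 4), (26, 5), (31, 6), (36, 7)):
        if n < (1 << bits):
            return nbytes
    raise ValueError("number too large for a frame header")

SAMPLE_RATE, SECONDS, BLOCKSIZE = 44100, 300, 4096    # a 5-minute CDDA-style track
starts = range(0, SAMPLE_RATE * SECONDS, BLOCKSIZE)   # first sample of each frame

fixed = sum(coded_len(i) for i in range(len(starts)))     # frame numbers
variable = sum(coded_len(s) for s in starts)              # sample numbers
print(f"{len(starts)} frames, about {(variable - fixed) / 1024:.1f} KiB extra")
# -> 3230 frames, about 7.9 KiB extra
```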

IMO flaccid is best used when aiming for maximal compression; none of my experiments with fast algorithms to compete with fixed blocksizes have been successful, though they are very simple algorithms that to an extent rely on weak brute force, so that's no surprise. I'd argue that peakset is good enough as a strong brute-force technique: it doesn't cover every possible blocksize, but what it does cover it does optimally, as it's not greedy; it weaves the best chain of frames through a large window using a set of blocksizes (roughly the idea sketched below).
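
A sketch of that "best chain" idea as I understand it (not flaccid's actual code): dynamic programming over a window of samples, finding the cheapest sequence of frames drawn from a fixed set of blocksizes. The cost(start, bs) callback is a hypothetical stand-in for the encoded size of a frame covering [start, start + bs).

```python
BLOCKSIZES = (1152, 2304, 4608)    # example set sharing a common base size

def best_chain(window: int, cost) -> tuple[float, list[int]]:
    """Cheapest sequence of blocksizes exactly covering `window` samples."""
    INF = float("inf")
    best = [(INF, None)] * (window + 1)   # best[pos] = (total cost, backpointer)
    best[0] = (0.0, None)
    for pos in range(window):
        if best[pos][0] == INF:
            continue
        for bs in BLOCKSIZES:
            end = pos + bs
            if end <= window and best[pos][0] + cost(pos, bs) < best[end][0]:
                best[end] = (best[pos][0] + cost(pos, bs), (pos, bs))
    chain, pos = [], window               # walk the backpointers to recover the chain
    while best[pos][1] is not None:
        pos, bs = best[pos][1]
        chain.append(bs)
    return best[window][0], chain[::-1]

# Toy cost model: pretend mid-sized frames are slightly cheaper per sample.
total, chain = best_chain(9216, lambda start, bs: bs * (1.00 if bs == 2304 else 1.02))
print(total, chain)    # -> 9216.0 [2304, 2304, 2304, 2304]
```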

There are many rarely useful blocksizes which all but the most expensive algorithms should probably avoid entirely, simply because they are that much less likely to be beneficial. Very small or very large blocksizes rarely help, and a blocksize that isn't divisible by a large enough power of 2 (preferably 64, but arguably minimally 8 or 16) limits the maximum partition order the residual can be stored with, which disadvantages it by default. Sane peakset settings naturally avoid these, so it's not as detrimental as it appears for peakset to consider only a handful of blocksizes out of the potential ~65536.
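
To illustrate that last point, a small helper based on my reading of the format (the residual can be split into 2^order partitions only if the blocksize is divisible by 2^order, the order field caps out at 15, and each partition must still hold more samples than the predictor order):

```python
def max_partition_order(blocksize: int, predictor_order: int = 0) -> int:
    """Largest Rice partition order usable for a frame, per the constraints above."""
    order = 0
    while (order < 15
           and blocksize % (1 << (order + 1)) == 0
           and (blocksize >> (order + 1)) > predictor_order):
        order += 1
    return order

for bs in (4096, 4608, 4100, 4097):
    print(bs, "->", max_partition_order(bs))
# 4096 -> 12, 4608 -> 9, 4100 -> 2, 4097 -> 0
```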

IMO the ideal non-brute-force variable blocksize algorithm would probably vary more than just the blocksize; there's probably a correlation between the blocksize (or rather the time span of a frame) and the optimal choice of, for example, apodization settings. Not something I can see someone coming up with short of some ML black-box magic.
