Fix CompressionLevel.Optimal for Brotli #72266

Merged: 1 commit, Jul 15, 2022

stephentoub
Member

@stephentoub stephentoub commented Jul 15, 2022

The intent of CompressionLevel.Optimal is to be a balanced tradeoff between compression ratio and speed:

        /// <summary>
        /// The compression operation should balance compression speed and output size.
        /// </summary>
        Optimal = 0,

but whereas DeflateStream, GZipStream, and ZLibStream all treat Optimal as such (using zlib's default setting for such a balanced tradeoff), BrotliStream treats Optimal the same as maximum compression, which is very slow. Especially now that maximum compression is expressible as CompressionLevel.SmallestSize, it's even more valuable for Optimal to represent that balanced tradeoff. Based on a variety of sources around the net and some local testing, I've changed the Optimal value from 11 to 4; I've also changed the default used when no level is set to be Optimal (e.g. new BrotliStream(..., CompressionMode.Compress)). This is also more important now that we've fixed the argument validation bug that allowed arbitrary numerical values to be passed through unvalidated (DeflateStream, GZipStream, and ZLibStream all properly validated already).

Fixes #46595
Fixes #64185
Fixes #72220
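
As an illustration of the constructor forms discussed above, here is a minimal sketch (not code from this PR). The payload and the helper names `CompressWith` and `Decompress` are made up for the example; only the `BrotliStream` constructors and the `CompressionLevel` / `CompressionMode` values come from System.IO.Compression. Whether the no-level form matches Optimal depends on running a runtime that includes this change.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical sample payload; any byte[] works here.
byte[] payload = Encoding.UTF8.GetBytes(
    string.Concat(Enumerable.Repeat("The quick brown fox jumps over the lazy dog. ", 500)));

// Explicit level: after this PR, Optimal is meant to be the balanced setting.
byte[] optimal = CompressWith(payload, s => new BrotliStream(s, CompressionLevel.Optimal, leaveOpen: true));

// No level given: per the PR description, this should now behave like Optimal.
byte[] defaulted = CompressWith(payload, s => new BrotliStream(s, CompressionMode.Compress, leaveOpen: true));

// Maximum compression has to be requested explicitly.
byte[] smallest = CompressWith(payload, s => new BrotliStream(s, CompressionLevel.SmallestSize, leaveOpen: true));

Console.WriteLine($"input:         {payload.Length} bytes");
Console.WriteLine($"Optimal:       {optimal.Length} bytes");
Console.WriteLine($"default level: {defaulted.Length} bytes");
Console.WriteLine($"SmallestSize:  {smallest.Length} bytes");
Console.WriteLine($"round-trip ok: {Decompress(optimal).SequenceEqual(payload)}");

// Hypothetical helper: compresses 'data' with whatever BrotliStream the factory creates.
static byte[] CompressWith(byte[] data, Func<Stream, BrotliStream> create)
{
    using var output = new MemoryStream();
    using (BrotliStream brotli = create(output))
    {
        brotli.Write(data, 0, data.Length);
    } // Disposing the BrotliStream flushes the final block into 'output'.
    return output.ToArray();
}

// Hypothetical helper: decompresses a complete Brotli payload.
static byte[] Decompress(byte[] compressed)
{
    using var source = new MemoryStream(compressed);
    using var brotli = new BrotliStream(source, CompressionMode.Decompress);
    using var result = new MemoryStream();
    brotli.CopyTo(result);
    return result.ToArray();
}
```

Callers that relied on the old behavior to get maximum compression would need to switch to CompressionLevel.SmallestSize.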

For reference, here are local measurements of time and size to compress each of our test files in runtime-assets using this PR. Prior to this PR, the Optimal values matched the SmallestSize values. This is based on writing in 1K chunks.

File Level Time (s) Size (bytes)
UncompressedTestFiles\alice29.txt NoCompression 2.6293 91874
UncompressedTestFiles\alice29.txt Fastest 2.3388 87269
UncompressedTestFiles\alice29.txt Optimal 3.8406 54467
UncompressedTestFiles\alice29.txt SmallestSize 199.6285 46006
UncompressedTestFiles\asyoulik.txt NoCompression 2.2171 80887
UncompressedTestFiles\asyoulik.txt Fastest 1.9447 76373
UncompressedTestFiles\asyoulik.txt Optimal 3.2476 49477
UncompressedTestFiles\asyoulik.txt SmallestSize 159.8074 42712
UncompressedTestFiles\cp.html NoCompression 0.4428 14479
UncompressedTestFiles\cp.html Fastest 0.4035 13662
UncompressedTestFiles\cp.html Optimal 0.7788 8103
UncompressedTestFiles\cp.html SmallestSize 27.4777 6894
UncompressedTestFiles\fields.c NoCompression 0.221 5837
UncompressedTestFiles\fields.c Fastest 0.1894 5381
UncompressedTestFiles\fields.c Optimal 0.5744 3278
UncompressedTestFiles\fields.c SmallestSize 13.1068 2717
UncompressedTestFiles\grammar.lsp NoCompression 0.0761 1850
UncompressedTestFiles\grammar.lsp Fastest 0.0695 1685
UncompressedTestFiles\grammar.lsp Optimal 0.3753 1258
UncompressedTestFiles\grammar.lsp SmallestSize 5.3079 1124
UncompressedTestFiles\kennedy.xls NoCompression 15.0624 301536
UncompressedTestFiles\kennedy.xls Fastest 11.645 240138
UncompressedTestFiles\kennedy.xls Optimal 14.4044 113871
UncompressedTestFiles\kennedy.xls SmallestSize 2060.26 61498
UncompressedTestFiles\lcet10.txt NoCompression 7.4125 256153
UncompressedTestFiles\lcet10.txt Fastest 6.6139 242587
UncompressedTestFiles\lcet10.txt Optimal 9.3937 139527
UncompressedTestFiles\lcet10.txt SmallestSize 641.716 112264
UncompressedTestFiles\plrabn12.txt NoCompression 8.1206 303822
UncompressedTestFiles\plrabn12.txt Fastest 9.1307 288300
UncompressedTestFiles\plrabn12.txt Optimal 11.9471 191916
UncompressedTestFiles\plrabn12.txt SmallestSize 718.69 162585
UncompressedTestFiles\ptt5 NoCompression 7.3581 103527
UncompressedTestFiles\ptt5 Fastest 6.0247 86790
UncompressedTestFiles\ptt5 Optimal 5.5391 53295
UncompressedTestFiles\ptt5 SmallestSize 888.7847 40939
UncompressedTestFiles\sum NoCompression 0.7669 22063
UncompressedTestFiles\sum Fastest 0.6608 20398
UncompressedTestFiles\sum Optimal 1.222 12547
UncompressedTestFiles\sum SmallestSize 46.2843 10144
UncompressedTestFiles\TestDocument.doc NoCompression 0.7481 20406
UncompressedTestFiles\TestDocument.doc Fastest 0.6388 19006
UncompressedTestFiles\TestDocument.doc Optimal 1.1753 6301
UncompressedTestFiles\TestDocument.doc SmallestSize 27.2491 5651
UncompressedTestFiles\TestDocument.docx NoCompression 0.357 15453
UncompressedTestFiles\TestDocument.docx Fastest 0.3097 15121
UncompressedTestFiles\TestDocument.docx Optimal 0.9365 12600
UncompressedTestFiles\TestDocument.docx SmallestSize 30.7547 12176
UncompressedTestFiles\TestDocument.pdf NoCompression 2.3971 120365
UncompressedTestFiles\TestDocument.pdf Fastest 2.073 119868
UncompressedTestFiles\TestDocument.pdf Optimal 2.2002 115862
UncompressedTestFiles\TestDocument.pdf SmallestSize 463.0941 114601
UncompressedTestFiles\TestDocument.txt NoCompression 0.3534 12592
UncompressedTestFiles\TestDocument.txt Fastest 0.3298 11885
UncompressedTestFiles\TestDocument.txt Optimal 0.4199 609
UncompressedTestFiles\TestDocument.txt SmallestSize 3.4359 458
UncompressedTestFiles\xargs.1 NoCompression 0.0925 2611
UncompressedTestFiles\xargs.1 Fastest 0.0777 2429
UncompressedTestFiles\xargs.1 Optimal 0.4027 1760
UncompressedTestFiles\xargs.1 SmallestSize 6.5394 1464
UncompressedTestFiles\GoogleTestData\10x10y NoCompression 0.0308 24
UncompressedTestFiles\GoogleTestData\10x10y Fastest 0.01 22
UncompressedTestFiles\GoogleTestData\10x10y Optimal 0.0441 12
UncompressedTestFiles\GoogleTestData\10x10y SmallestSize 0.5027 12
UncompressedTestFiles\GoogleTestData\64x NoCompression 0.013 66
UncompressedTestFiles\GoogleTestData\64x Fastest 0.0092 19
UncompressedTestFiles\GoogleTestData\64x Optimal 0.0332 10
UncompressedTestFiles\GoogleTestData\64x SmallestSize 0.5038 11
UncompressedTestFiles\GoogleTestData\backward65536 NoCompression 0.5267 3056
UncompressedTestFiles\GoogleTestData\backward65536 Fastest 0.3252 1248
UncompressedTestFiles\GoogleTestData\backward65536 Optimal 0.521 19
UncompressedTestFiles\GoogleTestData\backward65536 SmallestSize 9.8462 20
UncompressedTestFiles\GoogleTestData\compressed_file NoCompression 0.93 50244
UncompressedTestFiles\GoogleTestData\compressed_file Fastest 0.8197 50244
UncompressedTestFiles\GoogleTestData\compressed_file Optimal 0.5494 50100
UncompressedTestFiles\GoogleTestData\compressed_file SmallestSize 13.0755 50100
UncompressedTestFiles\GoogleTestData\compressed_repeated NoCompression 2.3233 104448
UncompressedTestFiles\GoogleTestData\compressed_repeated Fastest 1.8912 103315
UncompressedTestFiles\GoogleTestData\compressed_repeated Optimal 1.2824 50445
UncompressedTestFiles\GoogleTestData\compressed_repeated SmallestSize 142.3112 50156
UncompressedTestFiles\GoogleTestData\empty NoCompression 0.0047 1
UncompressedTestFiles\GoogleTestData\empty Fastest 0.0046 1
UncompressedTestFiles\GoogleTestData\empty Optimal 0.0119 1
UncompressedTestFiles\GoogleTestData\empty SmallestSize 0.2414 1
UncompressedTestFiles\GoogleTestData\mapsdatazrh NoCompression 6.3617 248411
UncompressedTestFiles\GoogleTestData\mapsdatazrh Fastest 5.5249 237599
UncompressedTestFiles\GoogleTestData\mapsdatazrh Optimal 5.3991 172914
UncompressedTestFiles\GoogleTestData\mapsdatazrh SmallestSize 579.1282 159339
UncompressedTestFiles\GoogleTestData\monkey NoCompression 0.0186 535
UncompressedTestFiles\GoogleTestData\monkey Fastest 0.018 464
UncompressedTestFiles\GoogleTestData\monkey Optimal 0.2497 447
UncompressedTestFiles\GoogleTestData\monkey SmallestSize 1.5155 405
UncompressedTestFiles\GoogleTestData\plrabn12.txt NoCompression 7.7685 303822
UncompressedTestFiles\GoogleTestData\plrabn12.txt Fastest 6.9209 288300
UncompressedTestFiles\GoogleTestData\plrabn12.txt Optimal 11.8668 191916
UncompressedTestFiles\GoogleTestData\plrabn12.txt SmallestSize 663.7283 162585
UncompressedTestFiles\GoogleTestData\quickfox NoCompression 0.0145 47
UncompressedTestFiles\GoogleTestData\quickfox Fastest 0.0114 47
UncompressedTestFiles\GoogleTestData\quickfox Optimal 0.117 47
UncompressedTestFiles\GoogleTestData\quickfox SmallestSize 0.6283 47
UncompressedTestFiles\GoogleTestData\quickfox_repeated NoCompression 1.4822 15256
UncompressedTestFiles\GoogleTestData\quickfox_repeated Fastest 1.0156 10737
UncompressedTestFiles\GoogleTestData\quickfox_repeated Optimal 1.0473 52
UncompressedTestFiles\GoogleTestData\quickfox_repeated SmallestSize 10.472 57
UncompressedTestFiles\GoogleTestData\random_org_10k.bin NoCompression 0.2042 10031
UncompressedTestFiles\GoogleTestData\random_org_10k.bin Fastest 0.1747 10031
UncompressedTestFiles\GoogleTestData\random_org_10k.bin Optimal 0.3997 10004
UncompressedTestFiles\GoogleTestData\random_org_10k.bin SmallestSize 16.7305 10004
UncompressedTestFiles\GoogleTestData\ukkonooa NoCompression 0.0256 123
UncompressedTestFiles\GoogleTestData\ukkonooa Fastest 0.0124 84
UncompressedTestFiles\GoogleTestData\ukkonooa Optimal 0.1321 62
UncompressedTestFiles\GoogleTestData\ukkonooa SmallestSize 0.7021 81
UncompressedTestFiles\GoogleTestData\x NoCompression 0.0125 5
UncompressedTestFiles\GoogleTestData\x Fastest 0.0077 5
UncompressedTestFiles\GoogleTestData\x Optimal 0.0209 5
UncompressedTestFiles\GoogleTestData\x SmallestSize 0.1997 5
UncompressedTestFiles\GoogleTestData\xyzzy NoCompression 0.012 9
UncompressedTestFiles\GoogleTestData\xyzzy Fastest 0.0127 9
UncompressedTestFiles\GoogleTestData\xyzzy Optimal 0.0341 9
UncompressedTestFiles\GoogleTestData\xyzzy SmallestSize 0.5024 9
UncompressedTestFiles\GoogleTestData\zeros NoCompression 3.2957 11957
UncompressedTestFiles\GoogleTestData\zeros Fastest 1.2054 4897
UncompressedTestFiles\GoogleTestData\zeros Optimal 1.5983 13
UncompressedTestFiles\GoogleTestData\zeros SmallestSize 26.1265 14
UncompressedTestFiles\WebFiles\angular.js NoCompression 21.7842 658201
UncompressedTestFiles\WebFiles\angular.js Fastest 19.3305 616118
UncompressedTestFiles\WebFiles\angular.js Optimal 20.7139 303734
UncompressedTestFiles\WebFiles\angular.js SmallestSize 1809.499 238542
UncompressedTestFiles\WebFiles\angular.min.js NoCompression 2.9548 101106
UncompressedTestFiles\WebFiles\angular.min.js Fastest 2.8023 94502
UncompressedTestFiles\WebFiles\angular.min.js Optimal 3.6814 59407
UncompressedTestFiles\WebFiles\angular.min.js SmallestSize 231.552 51183
UncompressedTestFiles\WebFiles\broker-config.js NoCompression 0.2727 7693
UncompressedTestFiles\WebFiles\broker-config.js Fastest 0.2429 7180
UncompressedTestFiles\WebFiles\broker-config.js Optimal 0.5422 4019
UncompressedTestFiles\WebFiles\broker-config.js SmallestSize 15.5805 3425
UncompressedTestFiles\WebFiles\config.js NoCompression 0.0485 1188
UncompressedTestFiles\WebFiles\config.js Fastest 0.043 1111
UncompressedTestFiles\WebFiles\config.js Optimal 0.3007 835
UncompressedTestFiles\WebFiles\config.js SmallestSize 3.4928 714
UncompressedTestFiles\WebFiles\jquery-3.2.1.js NoCompression 4.7525 154833
UncompressedTestFiles\WebFiles\jquery-3.2.1.js Fastest 4.2201 144873
UncompressedTestFiles\WebFiles\jquery-3.2.1.js Optimal 5.4571 80767
UncompressedTestFiles\WebFiles\jquery-3.2.1.js SmallestSize 366.0033 65996
UncompressedTestFiles\WebFiles\jquery-3.2.1.min.js NoCompression 1.5167 53824
UncompressedTestFiles\WebFiles\jquery-3.2.1.min.js Fastest 1.3658 50386
UncompressedTestFiles\WebFiles\jquery-3.2.1.min.js Optimal 2.2928 31257
UncompressedTestFiles\WebFiles\jquery-3.2.1.min.js SmallestSize 105.1005 27233
UncompressedTestFiles\WebFiles\meBoot.min.js NoCompression 0.3768 12442
UncompressedTestFiles\WebFiles\meBoot.min.js Fastest 0.3303 11647
UncompressedTestFiles\WebFiles\meBoot.min.js Optimal 0.7382 7583
UncompressedTestFiles\WebFiles\meBoot.min.js SmallestSize 24.3661 6692
UncompressedTestFiles\WebFiles\mwf-west-european-default.min.css NoCompression 8.6086 199372
UncompressedTestFiles\WebFiles\mwf-west-european-default.min.css Fastest 7.2527 181110
UncompressedTestFiles\WebFiles\mwf-west-european-default.min.css Optimal 6.3795 67306
UncompressedTestFiles\WebFiles\mwf-west-european-default.min.css SmallestSize 724.5991 50550
UncompressedTestFiles\WebFiles\MWFMDL2.woff NoCompression 0.2132 10958
UncompressedTestFiles\WebFiles\MWFMDL2.woff Fastest 0.1978 10932
UncompressedTestFiles\WebFiles\MWFMDL2.woff Optimal 0.4329 10850
UncompressedTestFiles\WebFiles\MWFMDL2.woff SmallestSize 22.3428 10764
UncompressedTestFiles\WebFiles\style.css NoCompression 0.0831 2266
UncompressedTestFiles\WebFiles\style.css Fastest 0.0738 2080
UncompressedTestFiles\WebFiles\style.css Optimal 0.3723 1173
UncompressedTestFiles\WebFiles\style.css SmallestSize 5.5696 1019
UncompressedTestFiles\WebFiles\uhf-west-european-default.min.css NoCompression 1.8925 41797
UncompressedTestFiles\WebFiles\uhf-west-european-default.min.css Fastest 1.566 37828
UncompressedTestFiles\WebFiles\uhf-west-european-default.min.css Optimal 1.635 15212
UncompressedTestFiles\WebFiles\uhf-west-european-default.min.css SmallestSize 142.875 12063
UncompressedTestFiles\WebFiles\www.reddit.com6.23.2017.har NoCompression 70.7763 2444920
UncompressedTestFiles\WebFiles\www.reddit.com6.23.2017.har Fastest 61.9193 2284328
UncompressedTestFiles\WebFiles\www.reddit.com6.23.2017.har Optimal 61.9911 1102579
UncompressedTestFiles\WebFiles\www.reddit.com6.23.2017.har SmallestSize 5652.197 924534
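
Measurements of this kind could be collected with a loop along the following lines. This is a sketch under assumptions, not the harness used for the table above: the input buffer is synthetic, the helper `Measure` is a made-up name, and like the original numbers it runs only one iteration per level, so small differences are noise.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

// Hypothetical input; replace with the bytes of a real test file,
// e.g. File.ReadAllBytes(path).
byte[] fileBytes = new byte[64 * 1024];
new Random(42).NextBytes(fileBytes.AsSpan(0, 8 * 1024)); // 8 KB of noise, the rest zeros

CompressionLevel[] levels =
{
    CompressionLevel.NoCompression,
    CompressionLevel.Fastest,
    CompressionLevel.Optimal,
    CompressionLevel.SmallestSize,
};

foreach (CompressionLevel level in levels)
{
    var (elapsed, size) = Measure(fileBytes, level);
    Console.WriteLine($"{level,-14} {elapsed.TotalMilliseconds,10:F3} ms {size,10} bytes");
}

// Hypothetical helper: compresses 'data' by writing it in fixed-size chunks
// (1 KB by default, as in the PR description) and reports time and output size.
static (TimeSpan Elapsed, long Size) Measure(byte[] data, CompressionLevel level, int chunkSize = 1024)
{
    using var output = new MemoryStream();
    var stopwatch = Stopwatch.StartNew();
    using (var brotli = new BrotliStream(output, level, leaveOpen: true))
    {
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            brotli.Write(data, offset, count);
        }
    } // Disposing flushes the final block, so it is included in the timing.
    stopwatch.Stop();
    return (stopwatch.Elapsed, output.Length);
}
```

For numbers worth comparing closely, a benchmarking tool with warm-up and multiple iterations would be a better fit than a single Stopwatch run.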

The intent of Optimal is to be a balanced trade-off between compression ratio and speed, but whereas DeflateStream, GZipStream, and ZLibStream all treat Optimal as such (using zlib's default setting for such a balanced tradeoff), Brotli treats Optimal the same as maximum compression, which is very slow.  Especially now that maximum compression is expressible as CompressionLevel.SmallestSize, it's even more valuable for Optimal to represent that balanced tradeoff.  Based on a variety of sources around the net and some local testing, I've changed the Optimal value from 11 to 4.  This is also more important now that we've fixed the argument validation bug that allowed arbitrary numerical values to be passed through unvalidated (DeflateStream, GZipStream, and ZLibStream all properly validate).
@ghost

ghost commented Jul 15, 2022

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Author: stephentoub
Assignees: -
Labels:

area-System.IO.Compression, tenet-performance

Milestone: 7.0.0

Member

@jeffhandley jeffhandley left a comment

This looks great. Thanks for sharing the detailed test results.

@stephentoub stephentoub merged commit f281393 into dotnet:main Jul 15, 2022
@stephentoub stephentoub deleted the brotlioptimal branch July 15, 2022 19:14
@TonyValenti

TonyValenti commented Jul 15, 2022

@stephentoub I'm not sure if you're able to do this, but I'd suggest renaming "Optimal" to "Balanced". I think that is more self-describing and avoids confusion with "SmallestSize".

@stephentoub
Member Author

I'm not sure if you're able to do this

We're not.

@danmoseley
Member

Curious that fastest compression is often faster than no compression. Maybe less time writing the file?

@stephentoub
Member Author

Curious that fastest compression is often faster than no compression. Maybe less time writing the file?

Don't put a lot of stock in the values that are so close. This is only one iteration per level. The important thing is the big gaps from SmallestSize.

@Genbox

Genbox commented Jul 18, 2022

The run-time for SmallestSize is absolutely ridiculous for Brotli and should not be mapped to an option a lot of users will choose.

As someone who has spent a fair amount of time with compression algorithms from an academic perspective, it is not unusual to see algorithm creators support a wide range of settings (memory usage, window size, chain length, etc.) that goes beyond what is reasonable in 99% of use cases.

To make it easier for users, these settings are (usually) mapped onto a scale from 0 to 10, with 11 as a nod to "Turn to 11". A lot of implementers of the algorithms do not seem to understand that this top setting should never be used outside theoretical/synthetic use cases.

I reported this back in 2018; thankfully, this merge fixes the default from 11 to 4, which is a lot saner. However, the issue of users being unable to set the quality setting themselves remains.

@danmoseley
Member

There are scenarios, of course, where you are willing to burn almost any CPU to minimize the size. For example, I am guessing that the Visual Studio setup team dials the LZMA setting up to the max, because the savings will be multiplied by millions of downloads. On a web product I did this for PNGs, for example. That doesn't directly address your suggestion, but if "11" saves bytes, using it is not unreasonable in some cases.

@Genbox

Genbox commented Jul 18, 2022

You are correct, but how far are you willing to take it?

You can completely bypass the quality parameter and tweak the underlying settings to get an even smaller result. The underlying algorithms are often derivatives of the shortest common supersequence problem, which is an O(N^2) class of algorithms (often solved in O(N^k), though).

That means you should see exponential growth in run time as you increase the quality factor.

I'm not sure how Stephen got those large numbers (seconds) for the test corpus, but it follows similar patterns I've seen before. I've bolded the numbers.

UncompressedTestFiles\alice29.txt:
NoCompression | 2.6293 | 91874
Fastest | 2.3388 | 87269
Optimal | 3.8406 | 54467
SmallestSize | 199.6285 | 46006

It shows that quality = 11 is completely out of bounds on the exponential growth factor.

Edit: I'm unsure how the numbers are derived, so I'm going to test myself in a few minutes.

@Genbox

Genbox commented Jul 18, 2022

I did a quick test with a file of 10 MB.

level | file | size | comp. size | ratio | time (seconds)
1 | large-file.json | 26141343 | 3744113 | 0.1432 | 0.0583943
2 | large-file.json | 26141343 | 3373190 | 0.129 | 0.1118668
3 | large-file.json | 26141343 | 3224375 | 0.1233 | 0.1220769
4 | large-file.json | 26141343 | 2733002 | 0.1045 | 0.2240102
5 | large-file.json | 26141343 | 2516949 | 0.0963 | 0.3680430
6 | large-file.json | 26141343 | 2460750 | 0.0941 | 0.4247720
7 | large-file.json | 26141343 | 2433219 | 0.0931 | 0.6596870
8 | large-file.json | 26141343 | 2419904 | 0.0926 | 0.7471796
9 | large-file.json | 26141343 | 2376898 | 0.0909 | 0.9791963
10 | large-file.json | 26141343 | 2129655 | 0.0815 | 10.4230968
11 | large-file.json | 26141343 | 2075080 | 0.0794 | 27.8797591

Or as a graph (X = levels, Y = time in seconds):

[image not preserved]

Edit: This is with a window size of 22, which is the default.

Edit2: This is with .NET 7 preview 6 using BrotliEncoder directly
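
A sweep over quality levels like the one in the table above could be sketched as follows. This is an assumption-laden illustration, not the code behind those numbers: the input is a small synthetic JSON-like buffer built by a made-up `BuildSample` helper rather than large-file.json, and timings come from a single Stopwatch run per level. `BrotliEncoder.TryCompress` and `BrotliEncoder.GetMaxCompressedLength` are the static APIs in System.IO.Compression.

```csharp
using System;
using System.Diagnostics;
using System.IO.Compression;
using System.Text;

// Hypothetical stand-in for the JSON file used in the table.
byte[] data = BuildSample();

// Window size 22, as stated in the comment above.
const int Window = 22;

// One buffer large enough for the worst case, reused for every level.
byte[] buffer = new byte[BrotliEncoder.GetMaxCompressedLength(data.Length)];

Console.WriteLine("level | size | comp. size | ratio | time (seconds)");
for (int quality = 1; quality <= 11; quality++)
{
    var stopwatch = Stopwatch.StartNew();
    bool ok = BrotliEncoder.TryCompress(data, buffer, out int written, quality, Window);
    stopwatch.Stop();

    if (!ok)
    {
        throw new InvalidOperationException($"Compression failed at quality {quality}.");
    }

    double ratio = (double)written / data.Length;
    Console.WriteLine($"{quality} | {data.Length} | {written} | {ratio:F4} | {stopwatch.Elapsed.TotalSeconds:F7}");
}

// Hypothetical helper: builds a repetitive JSON-like payload of a few hundred KB.
static byte[] BuildSample()
{
    var sb = new StringBuilder();
    sb.Append('[');
    for (int i = 0; i < 20000; i++)
    {
        if (i > 0) sb.Append(',');
        sb.Append("{\"id\":").Append(i).Append(",\"name\":\"item").Append(i % 97).Append("\"}");
    }
    sb.Append(']');
    return Encoding.UTF8.GetBytes(sb.ToString());
}
```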

@Genbox

Genbox commented Jul 18, 2022

Note that I'm not necessarily saying Quality = 11 should be completely inaccessible. It is definitely "SmallestSize", but at a cost so extreme that 99% of users cannot use it.

MS decided to abstract away the granularity of compression settings to Optimal, SmallestSize etc., which is nice for newbies, but setting SmallestSize to the equivalent of "Extreme compression settings. Use with caution." will probably cause a lot of headaches.

For comparison, LZMA2 on level 9 (the highest - called Ultra in 7-Zip Manager) compressed the 10MB file in 0.49s on my computer.

@stephentoub
Member Author

stephentoub commented Jul 18, 2022

  • BrotliEncoder exposes the underlying options, so they're all accessible if someone needs them.
  • Standardize pattern for exposing advanced configuration for compression streams #42820 tracks adding additional constructors to BrotliStream, DeflateStream, etc. for configuring these knobs as a passthrough rather than using CompressionLevel.
  • CompressionLevel.SmallestSize is defined to be the level that provides the smallest size according to the compression algorithm. Brotli explicitly defines that as 11 (and FWIW it also defines 11 as the default if you don't specify a level).
  • The most common use case is to just specify CompressionMode.Compress, in which case the 11 for SmallestSize isn't relevant.
  • Level 11 in your example is ~2.7x slower than level 10. I do not see that difference as so utterly egregious that we should lie about it being the "smallest size".

@EgorBo
Member

EgorBo commented Aug 2, 2022

Perf Improvements dotnet/perf-autofiling-issues#6965 🙂

@kunalspathak
Member

kunalspathak commented Aug 4, 2022

Seems to be a regression on windows/arm64: #73391 and dotnet/perf-autofiling-issues#6971

@kunalspathak
Member

kunalspathak commented Aug 4, 2022

Windows/arm64 improvements: dotnet/perf-autofiling-issues#6950, dotnet/perf-autofiling-issues#6980

@stephentoub
Member Author

Seems to be regression on windows/arm64: #73391

The test is faulty:
#73391 (comment)

@Genbox

Genbox commented Aug 5, 2022

@EgorBo
Perf Improvements dotnet/perf-autofiling-issues#6965 🙂

It is a change in algorithmic defaults: the CompressionLevel enum is now bound to a different quality level. The benchmark should use quality settings directly so that a change in defaults doesn't look like a perf improvement.

eerhardt added a commit to eerhardt/sdk that referenced this pull request Sep 2, 2022
Reacting to dotnet/runtime#72266, which changed CompressionLevel.Optimal to no longer mean "smallest size", but instead a balance between compression speed and output size. In Blazor WASM publishing, we really want smallest size - it is preferred to spend more time during publish in order for less bytes to be downloaded and cached in the browser.

The fix is to change the default compression level to SmallestSize in the brotli tool used by WASM publish.
eerhardt added a commit to dotnet/sdk that referenced this pull request Sep 6, 2022
Reacting to dotnet/runtime#72266, which changed CompressionLevel.Optimal to no longer mean "smallest size", but instead a balance between compression speed and output size. In Blazor WASM publishing, we really want smallest size - it is preferred to spend more time during publish in order for less bytes to be downloaded and cached in the browser.

The fix is to change the default compression level to SmallestSize in the brotli tool used by WASM publish.
@eerhardt
Member

eerhardt commented Sep 7, 2022

@stephentoub - do you think this change warrants a breaking change doc being written?

cc @ericstj @preethikurup

@stephentoub
Member Author

stephentoub commented Sep 7, 2022

I don't personally think so; the impact is some code will get a lot faster and the resulting compressed data may be a bit larger. If we updated to a new version of Brotli and it itself incurred a change that resulted in a bit larger output, we wouldn't call that a breaking change.

@ericstj
Member

ericstj commented Sep 7, 2022

We always have folks tell us we broke them when we make a change like this. It's better to let them know it was intentional. It's also a good place to tell them what steps they can take to get back to the old behavior - if that's what they want.

@stephentoub
Member Author

stephentoub commented Sep 7, 2022

Then we should also have one for updating zlib, unless you're sure that all the changes will result in exactly the same output bytes for any input.

@ericstj
Member

ericstj commented Sep 7, 2022

I think such a notice would be helpful and would help folks understand what our guarantees are. In those cases the changes should be more subtle and we can imagine that folks wouldn't want / need to be able to restore the old behavior - but I am sure they would appreciate the notice that a change is expected.

In this case we can imagine that someone might want to restore the old behavior (as was the case in the SDK scenario) so it's more important to have the docs in place to help them do that.

@danmoseley
Member

Then we should also have one for updating zlib, unless you're sure that all the changes will result in exactly the same output bytes for any input.

I don't follow. Due to this change, when upgrading an app, an important codepath may well get 30-40x slower (viz dotnet/sdk#27659). For most customers it won't be immediately obvious what happened or how to fix it. A simple breaking change note about it costs us little but would set people in the right direction. A zlib update seems rather different, as far as I know we expect it to just work, so I don't see the connection.

@stephentoub
Member Author

stephentoub commented Sep 7, 2022

Due to this change, when upgrading an app, an important codepath may well get 30-40x slower (viz dotnet/sdk#27659).

Huh? This PR (Fix CompressionLevel.Optimal for Brotli) made that task in SDK 30x-40x faster (209ms instead of 7600ms), not slower. It did so by not employing as much compression, such that the resulting binary size was then 3.28 MB instead of 2.64 MB. The PR you link to that incurs a 40x slowdown is explicitly specifying CompressionLevel.SmallestSize in order to revert to the behavior prior to this change, getting back that 0.6MB in size in exchange for roughly 7s of additional processing time.
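The tradeoff described above can be checked with a few lines of arithmetic. This is only a sketch using the four figures quoted in the comment (209ms, 7600ms, 3.28 MB, 2.64 MB); the variable names are illustrative.

```python
# Figures quoted in the comment above for the SDK compression task.
old_ms, new_ms = 7600, 209    # time before / after this PR
old_mb, new_mb = 2.64, 3.28   # output size before / after this PR

speedup = old_ms / new_ms                         # how many times faster
extra_mb = new_mb - old_mb                        # absolute size increase
extra_pct = (new_mb / old_mb - 1) * 100           # relative size increase
extra_seconds = (old_ms - new_ms) / 1000          # time cost of reverting

print(f"speedup: {speedup:.1f}x")
print(f"size increase: {extra_mb:.2f} MB ({extra_pct:.1f}%)")
print(f"time cost of reverting: {extra_seconds:.1f} s")
```

The speedup lands inside the quoted 30x-40x range, and the size increase works out to about 0.64 MB, or roughly a quarter larger than before.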

@stephentoub
Member Author

stephentoub commented Sep 7, 2022

A zlib update seems rather different, as far as I know we expect it to just work, so I don't see the connection.

The connection is that there are five years' worth of improvements in that zlib upgrade. I have no idea if any of those changes might impact exactly what (valid) bytes are output as part of compression. Maybe the compressed output between zlib 1.2.11 and 1.2.12 is always 100% identical, or maybe there are cases where they both produce valid compressed outputs that aren't byte-for-byte identical. If this Brotli PR is a breaking change, it'd be because we consider byte-for-byte changes from version to version a breaking change worthy of documenting. This PR "just works", too.

If the goal of adding a breaking change notice is just that it's a place where we can call attention to interesting differences in versions, then we need to come up with a different term other than "breaking". As we all know, literally every bug fix we make, every performance improvement we make, can break someone somewhere somehow, so we have a bar for what constitutes a break, and from my perspective, this PR doesn't rise to the level of that bar. I have zero concerns with creating a notice somewhere about "if you see your BrotliStream working faster but creating larger files and that's an issue for you, here's how to get back to the old slowness and smallness", I just don't believe it's a "break" by our definition (unless we consider byte-for-byte compressed output compatibility a thing).

@danmoseley
Member

Huh? This PR (Fix CompressionLevel.Optimal for Brotli) made that task in SDK 30x-40x faster

Ha, yes, brainstorm. My point was -- it was a change that made the output significantly larger, and a fair number of users may need to react because after the upgrade that no longer meets their requirements. I think that's materially more than an output that isn't "byte for byte identical". The breaking change note would be simple. We did this, the result you will see is this, if that's not what you want make this explicit change.

@danmoseley
Member

If the goal of adding a breaking change notice is just that it's a place where we can call attention to interesting differences in versions

It's not. The goal is to call out changes that we believe have a good chance of breaking some existing code (meaning, it goes out of spec) usually where there is a clear recommendation we can make to adapt to the change. This meets those criteria and a zlib update does not.

@stephentoub
Member Author

The breaking change note would be simple. We did this, the result you will see is this, if that's not what you want make this explicit change.

My concern isn't around the complexity of the note. It's that documenting this kind of change as a "break" dilutes the meaning of "breaking change" and the value of a breaking change list. Plain and simple.

@danmoseley
Member

Gotcha. Maybe we can make that clear by putting it at the end separately, with a note that it's not a breaking change, but may be a visible change that upgraders may want to accommodate.

@danmoseley
Member

@PriyaPurkayastha do you have written criteria for the breaking change list?

@PriyaPurkayastha

No, we don't have specific criteria that define what should get documented as a breaking change.
On reading through this issue:

  1. I understand that this change speeds up performance (which is non-breaking)
  2. I also see that there might be customers who prefer a smaller compressed size over the increased speed, similar to our requirements for Blazor WASM per the linked PR (Fix br compression size regression in Blazor WASM sdk#27659). An increase in the size of compressed data is a change that might not be acceptable based on customer requirements and hence can be considered breaking.

To the best of my knowledge, we have the following modes for documentation:

  • Known issues
  • What's new
  • Breaking changes

Customers tend to look at Known Issues and Breaking Changes when they observe something that is different from what they are used to seeing when they migrate to a new .NET version, especially if they are looking for details that will help them revert to previous behavior. Another factor to consider is how easily customers can diagnose a change in their output and trace it back to this specific change.

The reason we don't have strict criteria about what is considered breaking is that it is not possible to come up with an exhaustive list. I generally rely upon teams to determine, based on code inspection, subject matter expertise, and compat lab indicators, whether there is any change that customers might not be able or willing to consume, and to document those changes as breaking changes. An important point to remember is that we are not discussing what is allowed vs. not allowed; this is purely about what would be helpful to surface to customers via documentation.

@WangleLine

That's really neat!

@masonwheeler
Contributor

This PR is marked as "merged," but it's not in .NET 7 RC1. Brotli compression under level "Optimal" is excessively slow, and decompiling C:\Program Files\dotnet\shared\Microsoft.NETCore.App\7.0.0-preview.7.22375.6\System.IO.Compression.Brotli.dll in DotPeek shows that BrotliUtils.Quality_Default is still set to 11 rather than 4. Any idea why the RC got built off of outdated code?

@stephentoub
Member Author

stephentoub commented Sep 19, 2022

This PR is marked as "merged," but it's not in .NET 7 RC1.

It is in .NET 7 RC1.

decompiling C:\Program Files\dotnet\shared\Microsoft.NETCore.App\7.0.0-preview.7.22375.6\System.IO.Compression.Brotli.dll

That's not RC1. It's "7.0.0-preview.7.22375.6" (a "preview 7" build). RC1 is 7.0.0-rc.1.22426.10.

@masonwheeler
Contributor

All right, maybe I'm missing something. On the post announcing RC1, it said to use the latest Visual Studio Preview build if you want to use RC1 with Visual Studio. I'm on that preview. This is the .NET 7 build I have. What gives?

@stephentoub
Member Author

stephentoub commented Sep 19, 2022

What gives?

You also need to install RC1.
https://dotnet.microsoft.com/en-us/download/dotnet/7.0
The blog post is saying that the latest VS is recommended for working with that.

@masonwheeler
Contributor

Well I installed that, and now Visual Studio is broken. I try to launch it and open my project, and it hangs indefinitely on "Loading project 22 of 36."

@danmoseley
Member

Hmm, does that occur consistently? And if you uninstall RC1, does it go away?

@masonwheeler
Contributor

Yes, it occurs consistently. I was able to work around it by removing a specific project out of my SLN, but there's no clear reason I can see as to why that project should have given it any trouble in the first place.

Now, everything continues to be broken, for a different reason. How is it even possible that this is the state of an RC? This is what I'd expect out of an Alpha release.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 20, 2022