-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New implementation of methods readBitsInt{Be,Le}()
(fixes known bugs) + bits int fuzzer
#949
Comments
If you going to fix bugs in runtimes, could you also look at kaitai-io/kaitai_struct_javascript_runtime#26 and probably include your tests to proposed testsuite. |
As I promised:
Implementation changes in all runtime libraries are also complete, I will publish them later this week. |
See kaitai-io/kaitai_struct#949 Changing the parameter type is potentially a breaking change, because calls like ReadBitsIntBe(uint8(15)) will no longer work. But this will not be a problem for KSC-generated code, it could only break manually-written (or manually-patched) code. `ReadBitsInt(n uint8)` is intentionally left as it was, since it exists only to maintain backward compatibility and will be removed at some point.
Note: BitsSignedB{32,64}Le were removed in the previous commit 43d8b74 According to my tests, new formats BitsSignedShiftB{32,64}Le have better detection capabilities than their predecessors. Old tests only detected kaitai-io/kaitai_struct_javascript_runtime@1f8dd95a, but did not catch the bug when I intentionally change the appropriate unsigned right shift `>>>` to signed `>>` in the new `readBitsIntLe` impl introduced in kaitai-io/kaitai_struct#949. Tests added in this commit reveal both of these errors.
I also wanted to mention that this new implementation allows to further optimize the speed of the This can be made more efficient by replacing the @@ -11,9 +11,21 @@ KaitaiStream.prototype.readBitsIntLe = function(n) {
// 8 bits => 1 byte
// 9 bits => 2 bytes
var bytesNeeded = ((bitsNeeded - 1) >> 3) + 1; // `ceil(bitsNeeded / 8)` (NB: `x >> 3` is `floor(x / 8)`)
- var buf = this.readBytes(bytesNeeded);
- for (var i = 0; i < bytesNeeded; i++) {
- res |= buf[i] << (i * 8);
+ var shift = 0;
+ switch (bytesNeeded) {
+ case 4:
+ res = this.readU4le();
+ break;
+ case 3:
+ res = this.readU1();
+ shift = 8;
+ // fallthrough
+ case 2:
+ res |= this.readU2le() << shift;
+ break;
+ case 1:
+ res = this.readU1();
+ break;
}
// NB: in JavaScript, bit shift operators always shift by modulo 32 of the right-hand operand (see I made a simple JS benchmark: https://gist.github.com/generalmimon/866bd57f98c227bcbba1e9834e225072 $ node bench.js
Running "readBitsIntLe suite" suite...
Progress: 100%
KaitaiStream#readBitsIntLeV2:
9 105 608 ops/s, ±0.38% | fastest
KaitaiStream#readBitsIntLe:
4 524 954 ops/s, ±0.32% | slowest, 50.31% slower
Finished 2 cases!
Fastest: KaitaiStream#readBitsIntLeV2
Slowest: KaitaiStream#readBitsIntLe So this would clearly improve the performance significantly. The approach can be easily extended to 64-bit integers - this is how it would look in Java for big endian byte order (that is actually easier than for little endian, because the switch (bytesNeeded) {
case 8:
res = this.readU8be();
break;
case 7:
res = this.readU1() << 48;
// fallthrough
case 6:
res |= this.readU2be() << 32 | this.readU4be();
break;
case 5:
res = this.readU1() << 32;
// fallthrough
case 4:
res |= this.readU4be();
break;
case 3:
res = this.readU1() << 16;
// fallthrough
case 2:
res |= this.readU2be();
break;
case 1:
res = this.readU1();
break;
} But this change would require another great piece of work to adopt this everywhere, and the result would be a longer code that is harder to maintain (but at least we can use my tool https://github.com/generalmimon/ks-bits-fuzzer to make sure everything still works correctly). So I'm not going to do that for now. This makes sense mainly in target languages with a native way to read an Anyway, this issue is now finished. Further improvements can be discussed in future issues if needed. |
Current algorithm used for bit-sized integer reading gives incorrect results in all languages that have fixed bit precision - see kaitai-io/kaitai_struct_javascript_runtime#20. The problem occurs notably when an integer with the highest possible number of bits (32 bits in JavaScript, 64 bits in C++/STL, C#, Go, Java, Nim, PHP) is read from an unaligned bit position - then always some bits get lost in the process.
There is also another issue in the JS runtime (kaitai-io/kaitai_struct_javascript_runtime#22), which is a result of me "improving" the algorithm without taking into account the specifics of JS bitwise operators.
I've discovered other bugs in the past - for example our implementation of
readBitsIntLe()
in JavaScript and Java was using a "sign-propagating" right shift (>>
) instead of the correct unsigned right shift (>>>
), which showed that>>
is a awkward operator that summons an entire train of1
-bits starting on the most significant bit, if the MSB (sign bit) happened to be set to1
. I mean - this may be great for division of negative integers (if you wondered whatMath.floor(-13 / 4)
equals to, you can do-13 >> 2 = -4 = Math.floor(-3.25)
, nice), but I doubt it's useful for anything else.Actually the same problem was in PHP (kaitai-io/kaitai_struct_php_runtime@c0cc3e2), but PHP even doesn't have any unsigned right shift built-in, so I had to supply a tiny function to emulate it. Another funny thing in PHP (solved in the same commit) is this:
(which led to a loss of precision).
The point I'm making here is that writing reliable
readBitsInt{Be,Le}()
functions is hard. All bugs so far were revealed at random, so it's still quite likely that there are some inputs which wouldn't work but just have not been tried or reported yet. Therefore I also wrote a simple "fuzzer" for bit integer layouts and inputs - but instead of generating some random data, it evaluates all possible bit layouts and combination of "fillings" per configuration. The fuzzer is written in Python, but is supposed to work for all our target languages - the Python code just generates.ksy
and.kst
files and then the existing infrastructure of https://github.com/kaitai-io/kaitai_struct_tests is used to generate unit tests, run them and aggregate results. I will publish the code soon.I've already rewritten the functions in JavaScript to fix all known bugs and I'm quite happy with them, so now I'll gradually update all runtime libraries to use this new implementation. I think it's a notable change, so I'm creating this issue.
I also like the idea of the fuzzer and I'm wondering if we could use it also for other areas of Kaitai Struct - it virtually eliminates the possibility of any hidden bug in the tested function. Therefore, I hope that this is the last implementation of
readBitsInt{Be,Le}()
we will ever need - I'm going to test all target languages with the fuzzer so that all language-specific properties of the used arithmetic and bitwise operators that could cause bugs do not go unnoticed.The text was updated successfully, but these errors were encountered: