New implementation of methods `readBitsInt{Be,Le}()` (fixes known bugs) + bits int fuzzer #949

generalmimon · 2022-02-08T12:30:37Z

Current algorithm used for bit-sized integer reading gives incorrect results in all languages that have fixed bit precision - see kaitai-io/kaitai_struct_javascript_runtime#20. The problem occurs notably when an integer with the highest possible number of bits (32 bits in JavaScript, 64 bits in C++/STL, C#, Go, Java, Nim, PHP) is read from an unaligned bit position - then always some bits get lost in the process.

There is also another issue in the JS runtime (kaitai-io/kaitai_struct_javascript_runtime#22), which is a result of me "improving" the algorithm without taking into account the specifics of JS bitwise operators.

I've discovered other bugs in the past - for example our implementation of readBitsIntLe() in JavaScript and Java was using a "sign-propagating" right shift (>>) instead of the correct unsigned right shift (>>>), which showed that >> is a awkward operator that summons an entire train of 1-bits starting on the most significant bit, if the MSB (sign bit) happened to be set to 1. I mean - this may be great for division of negative integers (if you wondered what Math.floor(-13 / 4) equals to, you can do -13 >> 2 = -4 = Math.floor(-3.25), nice), but I doubt it's useful for anything else.

Actually the same problem was in PHP (kaitai-io/kaitai_struct_php_runtime@c0cc3e2), but PHP even doesn't have any unsigned right shift built-in, so I had to supply a tiny function to emulate it. Another funny thing in PHP (solved in the same commit) is this:

$ php -r "var_dump(1 << 63);"
int(-9223372036854775808)
$ php -r "var_dump((1 << 63) - 1);"
float(-9.2233720368548E+18)

(which led to a loss of precision).

The point I'm making here is that writing reliable readBitsInt{Be,Le}() functions is hard. All bugs so far were revealed at random, so it's still quite likely that there are some inputs which wouldn't work but just have not been tried or reported yet. Therefore I also wrote a simple "fuzzer" for bit integer layouts and inputs - but instead of generating some random data, it evaluates all possible bit layouts and combination of "fillings" per configuration. The fuzzer is written in Python, but is supposed to work for all our target languages - the Python code just generates .ksy and .kst files and then the existing infrastructure of https://github.com/kaitai-io/kaitai_struct_tests is used to generate unit tests, run them and aggregate results. I will publish the code soon.

I've already rewritten the functions in JavaScript to fix all known bugs and I'm quite happy with them, so now I'll gradually update all runtime libraries to use this new implementation. I think it's a notable change, so I'm creating this issue.

I also like the idea of the fuzzer and I'm wondering if we could use it also for other areas of Kaitai Struct - it virtually eliminates the possibility of any hidden bug in the tested function. Therefore, I hope that this is the last implementation of readBitsInt{Be,Le}() we will ever need - I'm going to test all target languages with the fuzzer so that all language-specific properties of the used arithmetic and bitwise operators that could cause bugs do not go unnoticed.

The text was updated successfully, but these errors were encountered:

Mingun · 2022-02-08T13:57:22Z

If you going to fix bugs in runtimes, could you also look at kaitai-io/kaitai_struct_javascript_runtime#26 and probably include your tests to proposed testsuite.

generalmimon · 2022-02-28T21:52:05Z

As I promised:

Therefore I also wrote a simple "fuzzer" for bit integer layouts and inputs - but instead of generating some random data, it evaluates all possible bit layouts and combination of "fillings" per configuration.

I will publish the code soon.

The fuzzer: https://github.com/generalmimon/ks-bits-fuzzer
Related changes in the kaitai_struct_tests repo: https://github.com/generalmimon/kaitai_struct_tests/tree/ks-bits-fuzzer

Implementation changes in all runtime libraries are also complete, I will publish them later this week.

See kaitai-io/kaitai_struct#949

See kaitai-io/kaitai_struct#949 Changing the parameter type is potentially a breaking change, because calls like ReadBitsIntBe(uint8(15)) will no longer work. But this will not be a problem for KSC-generated code, it could only break manually-written (or manually-patched) code. `ReadBitsInt(n uint8)` is intentionally left as it was, since it exists only to maintain backward compatibility and will be removed at some point.

See kaitai-io/kaitai_struct#949

See kaitai-io/kaitai_struct#949 Fix #20, fix #22

See kaitai-io/kaitai_struct#949

Note: BitsSignedB{32,64}Le were removed in the previous commit 43d8b74 According to my tests, new formats BitsSignedShiftB{32,64}Le have better detection capabilities than their predecessors. Old tests only detected kaitai-io/kaitai_struct_javascript_runtime@1f8dd95a, but did not catch the bug when I intentionally change the appropriate unsigned right shift `>>>` to signed `>>` in the new `readBitsIntLe` impl introduced in kaitai-io/kaitai_struct#949. Tests added in this commit reveal both of these errors.

generalmimon · 2022-04-14T18:00:17Z

I also wanted to mention that this new implementation allows to further optimize the speed of the readBitsInt{Be,Le} methods (if there's any interest in this). Probably the part where most execution time is spent is reading the bytes_needed-byte integer, which is currently done by calling read_bytes(bytes_needed) and then using a for loop with << and | operations for each byte.

This can be made more efficient by replacing the for loop with methods reading primitive byte integers, such as readU8le() for bytesNeeded == 8. In JavaScript, it can look like this:

@@ -11,9 +11,21 @@ KaitaiStream.prototype.readBitsIntLe = function(n) {
     // 8 bits => 1 byte
     // 9 bits => 2 bytes
     var bytesNeeded = ((bitsNeeded - 1) >> 3) + 1; // `ceil(bitsNeeded / 8)` (NB: `x >> 3` is `floor(x / 8)`)
-    var buf = this.readBytes(bytesNeeded);
-    for (var i = 0; i < bytesNeeded; i++) {
-      res |= buf[i] << (i * 8);
+    var shift = 0;
+    switch (bytesNeeded) {
+      case 4:
+        res = this.readU4le();
+        break;
+      case 3:
+        res = this.readU1();
+        shift = 8;
+        // fallthrough
+      case 2:
+        res |= this.readU2le() << shift;
+        break;
+      case 1:
+        res = this.readU1();
+        break;
     }
     // NB: in JavaScript, bit shift operators always shift by modulo 32 of the right-hand operand (see

I made a simple JS benchmark: https://gist.github.com/generalmimon/866bd57f98c227bcbba1e9834e225072

$ node bench.js
Running "readBitsIntLe suite" suite...
Progress: 100%

  KaitaiStream#readBitsIntLeV2:
    9 105 608 ops/s, ±0.38%   | fastest

  KaitaiStream#readBitsIntLe:
    4 524 954 ops/s, ±0.32%   | slowest, 50.31% slower

Finished 2 cases!
  Fastest: KaitaiStream#readBitsIntLeV2
  Slowest: KaitaiStream#readBitsIntLe

So this would clearly improve the performance significantly.

The approach can be easily extended to 64-bit integers - this is how it would look in Java for big endian byte order (that is actually easier than for little endian, because the shift variable is not needed):

    switch (bytesNeeded) {
      case 8:
        res = this.readU8be();
        break;
      case 7:
        res = this.readU1() << 48;
        // fallthrough
      case 6:
        res |= this.readU2be() << 32 | this.readU4be();
        break;
      case 5:
        res = this.readU1() << 32;
        // fallthrough
      case 4:
        res |= this.readU4be();
        break;
      case 3:
        res = this.readU1() << 16;
        // fallthrough
      case 2:
        res |= this.readU2be();
        break;
      case 1:
        res = this.readU1();
        break;
    }

But this change would require another great piece of work to adopt this everywhere, and the result would be a longer code that is harder to maintain (but at least we can use my tool https://github.com/generalmimon/ks-bits-fuzzer to make sure everything still works correctly). So I'm not going to do that for now.

This makes sense mainly in target languages with a native way to read an n-byte integer for arbitrary n (n can only be at most 8 in 64-bit integer languages, though), such as int.from_bytes() in Python 3. Native methods are usually faster than a custom for loop.

Anyway, this issue is now finished. Further improvements can be discussed in future issues if needed.

generalmimon added the enhancement label Feb 8, 2022

generalmimon added the related project label Feb 28, 2022

generalmimon added a commit to kaitai-io/kaitai_struct_cpp_stl_runtime that referenced this issue Mar 13, 2022

Rewrite read_bits_int_{be,le}()

5b5dcc8

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_csharp_runtime that referenced this issue Mar 13, 2022

Rewrite ReadBitsInt{Be,Le}()

729d2e5

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_java_runtime that referenced this issue Mar 13, 2022

Rewrite readBitsInt{Be,Le}()

e7067d7

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_javascript_runtime that referenced this issue Mar 13, 2022

Rewrite readBitsInt{Be,Le}() - fix known issues, speed up a bit

78ef0bb

See kaitai-io/kaitai_struct#949 Fix #20, fix #22

generalmimon added a commit to kaitai-io/kaitai_struct_lua_runtime that referenced this issue Mar 13, 2022

Rewrite read_bits_int_{be,le}()

b283f2a

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_nim_runtime that referenced this issue Mar 13, 2022

Rewrite readBitsInt{Be,Le}()

3072f3f

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_perl_runtime that referenced this issue Mar 13, 2022

Rewrite read_bits_int_{be,le}()

1866c61

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_php_runtime that referenced this issue Mar 13, 2022

Rewrite readBitsInt{Be,Le}()

78735ff

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_python_runtime that referenced this issue Mar 13, 2022

Rewrite read_bits_int_{be,le}()

427ea19

See kaitai-io/kaitai_struct#949

generalmimon added a commit to kaitai-io/kaitai_struct_ruby_runtime that referenced this issue Mar 13, 2022

Rewrite read_bits_int_{be,le}()

12d9b05

See kaitai-io/kaitai_struct#949

generalmimon added a commit that referenced this issue Mar 13, 2022

Pull KSC, KST and runtime changes related to #949

699b5a0

generalmimon added this to the v0.10 milestone Mar 13, 2022

generalmimon closed this as completed Apr 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New implementation of methods `readBitsInt{Be,Le}()` (fixes known bugs) + bits int fuzzer #949

New implementation of methods `readBitsInt{Be,Le}()` (fixes known bugs) + bits int fuzzer #949

generalmimon commented Feb 8, 2022

Mingun commented Feb 8, 2022

generalmimon commented Feb 28, 2022

generalmimon commented Apr 14, 2022 •

edited

Loading

New implementation of methods readBitsInt{Be,Le}() (fixes known bugs) + bits int fuzzer #949

New implementation of methods readBitsInt{Be,Le}() (fixes known bugs) + bits int fuzzer #949

Comments

generalmimon commented Feb 8, 2022

Mingun commented Feb 8, 2022

generalmimon commented Feb 28, 2022

generalmimon commented Apr 14, 2022 • edited Loading

New implementation of methods `readBitsInt{Be,Le}()` (fixes known bugs) + bits int fuzzer #949

New implementation of methods `readBitsInt{Be,Le}()` (fixes known bugs) + bits int fuzzer #949

generalmimon commented Apr 14, 2022 •

edited

Loading