Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New implementation of methods readBitsInt{Be,Le}() (fixes known bugs) + bits int fuzzer #949

Closed
generalmimon opened this issue Feb 8, 2022 · 3 comments

Comments

@generalmimon
Copy link
Member

Current algorithm used for bit-sized integer reading gives incorrect results in all languages that have fixed bit precision - see kaitai-io/kaitai_struct_javascript_runtime#20. The problem occurs notably when an integer with the highest possible number of bits (32 bits in JavaScript, 64 bits in C++/STL, C#, Go, Java, Nim, PHP) is read from an unaligned bit position - then always some bits get lost in the process.

There is also another issue in the JS runtime (kaitai-io/kaitai_struct_javascript_runtime#22), which is a result of me "improving" the algorithm without taking into account the specifics of JS bitwise operators.

I've discovered other bugs in the past - for example our implementation of readBitsIntLe() in JavaScript and Java was using a "sign-propagating" right shift (>>) instead of the correct unsigned right shift (>>>), which showed that >> is a awkward operator that summons an entire train of 1-bits starting on the most significant bit, if the MSB (sign bit) happened to be set to 1. I mean - this may be great for division of negative integers (if you wondered what Math.floor(-13 / 4) equals to, you can do -13 >> 2 = -4 = Math.floor(-3.25), nice), but I doubt it's useful for anything else.

Actually the same problem was in PHP (kaitai-io/kaitai_struct_php_runtime@c0cc3e2), but PHP even doesn't have any unsigned right shift built-in, so I had to supply a tiny function to emulate it. Another funny thing in PHP (solved in the same commit) is this:

$ php -r "var_dump(1 << 63);"
int(-9223372036854775808)
$ php -r "var_dump((1 << 63) - 1);"
float(-9.2233720368548E+18)

(which led to a loss of precision).


The point I'm making here is that writing reliable readBitsInt{Be,Le}() functions is hard. All bugs so far were revealed at random, so it's still quite likely that there are some inputs which wouldn't work but just have not been tried or reported yet. Therefore I also wrote a simple "fuzzer" for bit integer layouts and inputs - but instead of generating some random data, it evaluates all possible bit layouts and combination of "fillings" per configuration. The fuzzer is written in Python, but is supposed to work for all our target languages - the Python code just generates .ksy and .kst files and then the existing infrastructure of https://github.com/kaitai-io/kaitai_struct_tests is used to generate unit tests, run them and aggregate results. I will publish the code soon.

I've already rewritten the functions in JavaScript to fix all known bugs and I'm quite happy with them, so now I'll gradually update all runtime libraries to use this new implementation. I think it's a notable change, so I'm creating this issue.

I also like the idea of the fuzzer and I'm wondering if we could use it also for other areas of Kaitai Struct - it virtually eliminates the possibility of any hidden bug in the tested function. Therefore, I hope that this is the last implementation of readBitsInt{Be,Le}() we will ever need - I'm going to test all target languages with the fuzzer so that all language-specific properties of the used arithmetic and bitwise operators that could cause bugs do not go unnoticed.

@Mingun
Copy link

Mingun commented Feb 8, 2022

If you going to fix bugs in runtimes, could you also look at kaitai-io/kaitai_struct_javascript_runtime#26 and probably include your tests to proposed testsuite.

@generalmimon
Copy link
Member Author

As I promised:

Therefore I also wrote a simple "fuzzer" for bit integer layouts and inputs - but instead of generating some random data, it evaluates all possible bit layouts and combination of "fillings" per configuration.

I will publish the code soon.

Implementation changes in all runtime libraries are also complete, I will publish them later this week.

generalmimon added a commit to kaitai-io/kaitai_struct_cpp_stl_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_csharp_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_go_runtime that referenced this issue Mar 13, 2022
See kaitai-io/kaitai_struct#949

Changing the parameter type is potentially a breaking change, because
calls like ReadBitsIntBe(uint8(15)) will no longer work. But this
will not be a problem for KSC-generated code, it could only break
manually-written (or manually-patched) code.

`ReadBitsInt(n uint8)` is intentionally left as it was, since it exists
only to maintain backward compatibility and will be removed at some point.
generalmimon added a commit to kaitai-io/kaitai_struct_java_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_javascript_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_lua_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_nim_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_perl_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_php_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_python_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_ruby_runtime that referenced this issue Mar 13, 2022
generalmimon added a commit to kaitai-io/kaitai_struct_tests that referenced this issue Mar 13, 2022
Note: BitsSignedB{32,64}Le were removed in the previous commit 43d8b74

According to my tests, new formats BitsSignedShiftB{32,64}Le have better
detection capabilities than their predecessors. Old tests only detected
kaitai-io/kaitai_struct_javascript_runtime@1f8dd95a,
but did not catch the bug when I intentionally change the appropriate
unsigned right shift `>>>` to signed `>>` in the new `readBitsIntLe` impl
introduced in kaitai-io/kaitai_struct#949. Tests
added in this commit reveal both of these errors.
@generalmimon generalmimon added this to the v0.10 milestone Mar 13, 2022
@generalmimon
Copy link
Member Author

generalmimon commented Apr 14, 2022

I also wanted to mention that this new implementation allows to further optimize the speed of the readBitsInt{Be,Le} methods (if there's any interest in this). Probably the part where most execution time is spent is reading the bytes_needed-byte integer, which is currently done by calling read_bytes(bytes_needed) and then using a for loop with << and | operations for each byte.

This can be made more efficient by replacing the for loop with methods reading primitive byte integers, such as readU8le() for bytesNeeded == 8. In JavaScript, it can look like this:

@@ -11,9 +11,21 @@ KaitaiStream.prototype.readBitsIntLe = function(n) {
     // 8 bits => 1 byte
     // 9 bits => 2 bytes
     var bytesNeeded = ((bitsNeeded - 1) >> 3) + 1; // `ceil(bitsNeeded / 8)` (NB: `x >> 3` is `floor(x / 8)`)
-    var buf = this.readBytes(bytesNeeded);
-    for (var i = 0; i < bytesNeeded; i++) {
-      res |= buf[i] << (i * 8);
+    var shift = 0;
+    switch (bytesNeeded) {
+      case 4:
+        res = this.readU4le();
+        break;
+      case 3:
+        res = this.readU1();
+        shift = 8;
+        // fallthrough
+      case 2:
+        res |= this.readU2le() << shift;
+        break;
+      case 1:
+        res = this.readU1();
+        break;
     }
     // NB: in JavaScript, bit shift operators always shift by modulo 32 of the right-hand operand (see

I made a simple JS benchmark: https://gist.github.com/generalmimon/866bd57f98c227bcbba1e9834e225072

$ node bench.js
Running "readBitsIntLe suite" suite...
Progress: 100%

  KaitaiStream#readBitsIntLeV2:
    9 105 608 ops/s, ±0.38%   | fastest

  KaitaiStream#readBitsIntLe:
    4 524 954 ops/s, ±0.32%   | slowest, 50.31% slower

Finished 2 cases!
  Fastest: KaitaiStream#readBitsIntLeV2
  Slowest: KaitaiStream#readBitsIntLe

So this would clearly improve the performance significantly.

The approach can be easily extended to 64-bit integers - this is how it would look in Java for big endian byte order (that is actually easier than for little endian, because the shift variable is not needed):

    switch (bytesNeeded) {
      case 8:
        res = this.readU8be();
        break;
      case 7:
        res = this.readU1() << 48;
        // fallthrough
      case 6:
        res |= this.readU2be() << 32 | this.readU4be();
        break;
      case 5:
        res = this.readU1() << 32;
        // fallthrough
      case 4:
        res |= this.readU4be();
        break;
      case 3:
        res = this.readU1() << 16;
        // fallthrough
      case 2:
        res |= this.readU2be();
        break;
      case 1:
        res = this.readU1();
        break;
    }

But this change would require another great piece of work to adopt this everywhere, and the result would be a longer code that is harder to maintain (but at least we can use my tool https://github.com/generalmimon/ks-bits-fuzzer to make sure everything still works correctly). So I'm not going to do that for now.

This makes sense mainly in target languages with a native way to read an n-byte integer for arbitrary n (n can only be at most 8 in 64-bit integer languages, though), such as int.from_bytes() in Python 3. Native methods are usually faster than a custom for loop.


Anyway, this issue is now finished. Further improvements can be discussed in future issues if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants