ARROW-16117: [JS] Improve decode UTF8 performance
(See comment below for benchmarks.) While profiling in-browser decoding of the TPC-H Customer and Part datasets, which contain many UTF-8 strings, it turned out that much of the time was being spent in `getVariableWidthBytes` rather than in `TextDecoder` itself; ideally all of the time should be spent in `TextDecoder`. On Chrome, `getVariableWidthBytes` accounted for up to ~15% of the end-to-end decoding latency, and on Safari it was closer to ~40% (Safari's `TextDecoder` is much faster than Chrome's, so this function took up relatively more of the time).

This is likely because the code in this PR is more amenable to V8/JSC's JIT: `x` and `y` are now guaranteed to be SMIs ("small integers") instead of Objects, allowing the JIT to emit efficient machine instructions that deal only in 32-bit integers. Once V8 discovers that `x` and `y` can potentially be null (upon iterating past the bounds), it "poisons" that codepath forever, since it then has to handle the null case. See this V8 post for a more in-depth explanation (in particular the examples under "Performance tips"): https://v8.dev/blog/elements-kinds

Doing the bounds check explicitly instead of implicitly essentially eliminates this function from the profile. Empirically, on my machine decoding TPC-H Part dropped from 1.9s to 1.7s on Chrome, and Customer dropped from 1.4s to 1.2s.

Closes apache#12793 from hzuo/hz/dp

Authored-by: hzuo <[email protected]>
Signed-off-by: Dominik Moritz <[email protected]>
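The explicit-versus-implicit bounds check described above can be sketched as follows. This is an illustrative TypeScript example, not the literal Arrow patch: the function names `getBytesImplicit`/`getBytesExplicit` are hypothetical stand-ins for the before/after shapes of `getVariableWidthBytes`.

```typescript
// Before (sketch): an out-of-bounds typed-array read yields `undefined`,
// so the JIT must model x and y as (number | undefined) and the codepath
// gets "poisoned" once that happens.
function getBytesImplicit(values: Uint8Array, valueOffsets: Int32Array, index: number): Uint8Array | null {
    const x = valueOffsets[index] as number | undefined;
    const y = valueOffsets[index + 1] as number | undefined;
    return x !== undefined && y !== undefined ? values.subarray(x, y) : null;
}

// After (sketch): an explicit bounds check guarantees x and y are always
// SMIs, so the JIT can emit code that deals only in 32-bit integers.
function getBytesExplicit(values: Uint8Array, valueOffsets: Int32Array, index: number): Uint8Array | null {
    if (index + 1 >= valueOffsets.length) {
        return null;
    }
    return values.subarray(valueOffsets[index], valueOffsets[index + 1]);
}

// Both variants return the same slices; only the second keeps the hot
// path free of undefined checks.
const values = new TextEncoder().encode("abcdef");
const valueOffsets = new Int32Array([0, 2, 6]); // two strings: "ab", "cdef"
console.log(new TextDecoder().decode(getBytesExplicit(values, valueOffsets, 0)!)); // "ab"
console.log(getBytesExplicit(values, valueOffsets, 2)); // null (past the end)
```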