Skip to content

Commit

Permalink
ARROW-16117: [JS] Improve decode UTF8 performance
Browse files Browse the repository at this point in the history
(See comment below for benchmarks)

While profiling the performance of decoding TPC-H Customer and Part in-browser, datasets where there are a lot of UTF8s, it turned out that much of the time was being spent in `getVariableWidthBytes` rather than in `TextDecoder` itself. Ideally all the time should be spent in `TextDecoder`.

On Chrome `getVariableWidthBytes` took up to ~15% of the e2e decoding latency, and on Safari it was close to ~40% (Safari's TextDecoder is much faster than Chrome's, so this took up relatively more time).

This is likely because the code in this PR is more amenable to V8/JSC's JIT, since `x` and `y` now are guaranteed to be SMIs ("small integers") instead of Object, allowing the JIT to emit efficient machine instructions that only deal in 32-bit integers. Once V8 discovers that a `x` and `y` can potentially be null (upon iterating past the bounds), it "poisons" the codepath forever, since it has to deal with the null case.

See this V8 post for a more in-depth explanation (in particular see the examples underneath "Performance tips"):
https://v8.dev/blog/elements-kinds

Doing the bounds check explicitly instead of implicitly basically eliminates this function from showing up in the profiling. Empirically, on my machine decoding TPC-H Part dropped from 1.9s to 1.7s on Chrome, and Customer dropped from 1.4s to 1.2s.

Closes apache#12793 from hzuo/hz/dp

Authored-by: hzuo <[email protected]>
Signed-off-by: Dominik Moritz <[email protected]>
  • Loading branch information
hzuo authored and domoritz committed Apr 6, 2022
1 parent 88eea9c commit adfa605
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions js/src/visitor/get.ts
Original file line number Diff line number Diff line change
Expand Up @@ -109,8 +109,12 @@ function wrapGet<T extends DataType>(fn: (data: Data<T>, _1: any) => any) {
const getNull = <T extends Null>(_data: Data<T>, _index: number): T['TValue'] => null;
/** @ignore */
const getVariableWidthBytes = (values: Uint8Array, valueOffsets: Int32Array, index: number) => {
const { [index]: x, [index + 1]: y } = valueOffsets;
return x != null && y != null ? values.subarray(x, y) : null as any;
if (index + 1 >= valueOffsets.length) {
return null as any;
}
const x = valueOffsets[index];
const y = valueOffsets[index + 1];
return values.subarray(x, y);
};

/** @ignore */
Expand Down

0 comments on commit adfa605

Please sign in to comment.