Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use clz32 for counting trailing zeroes. #9340

Merged
merged 1 commit into from
Nov 2, 2023

Conversation

jussi-kalliokoski
Copy link
Contributor

This changes the implementation of BitSet to use Math.clz32 to count trailing zeroes instead of the current WebAssembly-based approach. This makes the implementation IMO slightly more readable to those of us without a brain to parse hex-encoded WebAssembly. 😄

Additionally this approach should yield slightly better performance for the iteration of the BitSets. At least that's what running the following micro-benchmark on my machine indicates:

Benchmark Source Code
const wasmBuf = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x06, 0x01, 0x60, 0x01,
  0x7f, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07, 0x0d, 0x01, 0x09, 0x74, 0x72,
  0x61, 0x69, 0x6c, 0x69, 0x6e, 0x67, 0x30, 0x00, 0x00, 0x0a, 0x07, 0x01, 0x05,
  0x00, 0x20, 0x00, 0x68, 0x0b, 0x00, 0x0f, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x02,
  0x08, 0x01, 0x00, 0x01, 0x00, 0x03, 0x6e, 0x75, 0x6d,
]);

const {trailing0} = new WebAssembly.Instance(new WebAssembly.Module(wasmBuf)).exports;

function ctz32(n) {
  if (n === 0) {
    return 32;
  }
  n = n & -n;
  return 31 - Math.clz32(n);
}

const MAX_U32 = 0xffffffff;

function expectEqual(a, b) {
  if (a !== b) {
    throw new Error(`${a} !== ${b}`);
  }
}

function bench(name, f, N = 1e6) {
  console.time(name);
  f(N);
  console.timeEnd(name);
}

for (let i = 0; i < 1e6; i++) {
  expectEqual(trailing0(i), ctz32(i));
  expectEqual(trailing0(MAX_U32 - i), ctz32(MAX_U32 - i));
}

const { platform, arch, release, cpus } = require("os");
console.log("node", process.version, platform(), arch(), release());
console.log(cpus()[0].model);

bench("trailing0", function (N) {
  for (let i = 0; i < N; i++) {
    trailing0(i);
    trailing0(MAX_U32 - i);
  }
});

bench("ctz32", function (N) {
  for (let i = 0; i < N; i++) {
    ctz32(i);
    ctz32(MAX_U32 - i);
  }
});

Giving the output:

node v16.16.0 darwin x64 23.0.0
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
trailing0: 18.292ms
ctz32: 3.429ms

@devongovett
Copy link
Member

ooh nice, I'll run a full end-to-end benchmark but looks promising. I had considered this but I guess I assumed that the wasm instruction would be faster since ideally it would be a single CPU instruction. Maybe there's some overhead though if the engine doesn't inline it.

@jussi-kalliokoski
Copy link
Contributor Author

Yeah, based on the numbers I’ve been seeing I think the current implementations never end up inlining JS->Wasm calls and instead even calls to simple functions end up allocating an execution context, so for single word computations it rarely is worth it, especially in this case as the JS implementation should end up being JIT compiled in the asmjs-like path on V8 (only arithmetic and no compute + assign operators like &=, at least that was the heuristic a few years ago IIRC). An ephemeral micro-optimization for sure, but one that IMO doesn’t compromise on readability meaningfully and gets the job done in current runtimes.

This is my first PR to Parcel btw (about time..), what’s up with the integration tests? Is it expected that they’re failing?

@mischnic
Copy link
Member

mischnic commented Oct 25, 2023

That was fixed recently, so all tests should pass after rebasing That is unfortunately our usual flaky-ness, so don't worry about it

@jussi-kalliokoski
Copy link
Contributor Author

Not sure how to interpret the benchmark results as I'm not familiar with how much variance is expected 🤔

Another question: what's the desired process from here on out, i.e. should I keep the branch up-to-date until reviewed or wait for review and rebase once the changes are up-to-par? :)

@mischnic
Copy link
Member

Don't worry about the CI benchmark action, the actual numbers aren't really reliable.

No need to keep this branch updated/rebased, this PR's commit history will get squashed on merge anyway, so we'll handle that.

Copy link
Member

@devongovett devongovett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on the React Spectrum website. I didn't see a meaningful overall build time change on that project, but this is definitely easier to read. I think it just isn't the main bottleneck anymore. Could have a bigger effect on an even larger project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants