perf: optimize lis_algorithm #1438

Merged
merged 2 commits into infernojs:master from fkleuver:master on Jan 27, 2019

Conversation

fkleuver (Contributor) commented Jan 25, 2019

Let me start off by saying thank you for providing such an impressive library to the open source community and raising the bar on performance. Inferno is a great source of inspiration for many libraries including Aurelia.

We hope you don't mind that we borrowed your excellent lis_algorithm and its usage for our repeater's keyed mode (which we made sure to acknowledge).

I personally had a lot of fun figuring out how it all worked and tried to make it even faster, even though it is already very fast and only a minor portion of the total CPU usage. I'd like to contribute back the end result of that exercise :)

Objective

This PR implements some perf tweaks for the lis_algorithm in patching.ts. There are three aspects to this:

  • Sharing of a single result array across executions
  • Usage of TypedArray (this matters less in Chrome than in other browsers, but it still saves V8 the overhead of doing the optimization in the first place)
  • A minor tweak in averaging the low + high index (a quick equivalence check is sketched just below this list)
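
As a quick, standalone check of that last tweak (not part of the PR itself): the midpoint of the binary search changes from a truncating division to a shift, and the two are equivalent for the non-negative index ranges involved here.

for (let u = 0; u < 1000; ++u) {
  for (let v = u; v < 1000; ++v) {
    const byDivision = ((u + v) / 2) | 0; // original: truncating float division
    const byShift = (u + v) >> 1;         // new: single shift
    if (byDivision !== byShift) console.error(`mismatch at u=${u}, v=${v}`);
  }
}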

Here is a simple benchmark: http://jsben.ch/puCMc (you can modify the cnt (iterations) and len (array size) variables at the bottom of the setup block to tweak the benchmark)

Unfortunately I was unable to fully follow the instructions for contributing. For example, I couldn't find a dev branch and the e2e tests failed to start on my machine.

Expand for a code snippet to verify there are no regressions (it can be pasted directly into the browser console)
function generateList(length) {
  const used = new Set([0]);
  const result = Array(length);
  let i = 0;
  let cur = 0;
  while (i < length) {
      while (used.has(cur)) {
        cur = (Math.random() * length + 10) | 0;
      }
      used.add(cur);
      result[i] = cur;
      ++i;
  }
  return result;
}

function lis_algorithm_original(arr) {
  const p = arr.slice();
  const result = [0];
  let i;
  let j;
  let u;
  let v;
  let c;
  const len = arr.length;

  for (i = 0; i < len; i++) {
    const arrI = arr[i];

    if (arrI !== 0) {
      j = result[result.length - 1];
      if (arr[j] < arrI) {
        p[i] = j;
        result.push(i);
        continue;
      }

      u = 0;
      v = result.length - 1;

      while (u < v) {
        c = ((u + v) / 2) | 0;
        if (arr[result[c]] < arrI) {
          u = c + 1;
        } else {
          v = c;
        }
      }

      if (arrI < arr[result[u]]) {
        if (u > 0) {
          p[i] = result[u - 1];
        }
        result[u] = i;
      }
    }
  }

  u = result.length;
  v = result[u - 1];

  while (u-- > 0) {
    result[u] = v;
    v = p[v];
  }

  return result;
}


var $result;
var $p;
var $maxLen = 0;

function lis_algorithm_new(arr) {
  let arrI = 0;
  let i = 0;
  let j = 0;
  let k = 0;
  let u = 0;
  let v = 0;
  let c = 0;
  const len = arr.length;
  const TArr = len < 0xFF ? Uint8Array : len < 0xFFFF ? Uint16Array : Uint32Array;
  
  if (len > $maxLen) {
    $maxLen = len;
    $result = new TArr(len);
    $p = new TArr(len);
  }


  for (; i < len; ++i) {
    arrI = arr[i];

    if (arrI !== 0) {
      j = $result[k];
      if (arr[j] < arrI) {
        $p[i] = j;
        $result[++k] = i;
        continue;
      }

      u = 0;
      v = k;

      while (u < v) {
        c = (u + v) >> 1;
        if (arr[$result[c]] < arrI) {
          u = c + 1;
        } else {
          v = c;
        }
      }

      if (arrI < arr[$result[u]]) {
        if (u > 0) {
          $p[i] = $result[u - 1];
        }
        $result[u] = i;
      }
    }
  }

  u = i = k + 1;
  const seq = new TArr(u);
  v = $result[u - 1];

  while (u-- > 0) {
    seq[u] = v;
    v = $p[v];
  }
  while (i-- > 0) $result[i] = 0;
  return seq;
}

for (const c of [1, 2, 3, 5, 10, 25, 500, 1000, 500, 50, 15, 5, 3, 2, 1, 2, 3, 5, 10, 25, 500, 1000, 500, 50, 15, 5, 3, 2, 1]) {
  const input = generateList(c);
  const result1 = lis_algorithm_original(input);
  const result2 = lis_algorithm_new(input);
  if (result1.length !== result2.length) {
    console.error(`result1.length (${result1.length}) !== result2.length (${result2.length})`);
  }
  for (let i = 0, ii = result1.length; i < ii; ++i) {
    if (result1[i] !== result2[i]) {
      console.error(`result1[${i}] (${result1[i]}) !== result2[${i}] (${result2[i]})`);
    }
  }
}

(edit: I had pasted the wrong snippet; I made it the same as the benchmark and removed the inline console perf measurement, which gets skewed when executed like that)

I can add this to your tests if you'd like; I'm not sure where precisely it would fit in your test suites.


Expand for jsbench results for Chrome, Edge, Firefox and IE11

[screenshots of the jsbench results for Chrome, Edge, Firefox and IE11 omitted]

Let me know if you have any concerns or questions about the code.

coveralls (Collaborator) commented Jan 25, 2019

Coverage Status

Coverage increased (+0.02%) to 94.967% when pulling f3db819 on fkleuver:master into fc0afd5 on infernojs:master.

localvoid commented Jan 26, 2019

> which we made sure to acknowledge.

I wrote this lis-based keyed algo even before Inferno existed [1] :) Inferno originally copy-pasted it from here [2]. And here [3] is the latest version that I am using.

I tried using TypedArrays in this algo a long time ago, and at that time it was definitely slower on microbenchmarks in V8 (I haven't tried using 3 different types). With TypedArrays it would also make sense to convert the input array to a TypedArray. Also, is it really worth using 3 different typed arrays (8/16/32)? Call sites that access different typed arrays will be polymorphic (see the small illustration after the links below).

  1. https://github.com/localvoid/vdom/blob/master/lib/src/utils/container.dart#L411
  2. https://github.com/localvoid/kivi/blob/master/lib/vnode.ts#L1283
  3. https://github.com/localvoid/ivi/blob/f96afea34e86c541963e4f5dec737c18f6fb3610/packages/ivi/src/vdom/reconciler.ts#L656
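
To illustrate the polymorphism point (a generic example, not taken from Inferno): once a single element-access site has seen several typed-array types, the engine's inline cache has to handle all of them.

function sumFirstTwo(arr) {
  return arr[0] + arr[1]; // this access site becomes polymorphic after the calls below
}

sumFirstTwo(new Uint8Array([1, 2]));  // monomorphic so far
sumFirstTwo(new Uint16Array([3, 4])); // two element kinds seen
sumFirstTwo(new Uint32Array([5, 6])); // three element kinds: the site stays polymorphic from here on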

fkleuver (Contributor, Author)

> [3] is the latest version that I am using.

Cool :)
We've got a somewhat similar approach for tracking added items. I actually needed to deviate for Aurelia because of a different mutation tracking mechanism, but your new approach got me thinking I might be doing some redundant stuff.

> it was definitely slower on microbenchmarks in V8

Not sure when this was, but there was a period where some of these built-ins just weren't optimized in V8 at all. Here's a pretty interesting read from not too long ago on some of those issues: https://dzone.com/articles/connecting-the-dots

> with TypedArrays it would also make sense to convert the input array to a TypedArray.

You could initialize sources as a TypedArray right away and not need to convert it. You might even be able to use a shared TypedArray there too (a rough sketch below).
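
Something like this, roughly (a hedged sketch with assumed names, not Inferno's actual code): keep one shared typed array for sources and only grow it, similar to $result/$p above.

var $sources = new Int32Array(0); // shared across patch calls

function getSources(length) {
  if (length > $sources.length) {
    $sources = new Int32Array(length); // grow only; new typed arrays are zero-filled
  } else {
    $sources.fill(0, 0, length); // reset just the slice about to be reused
  }
  return $sources;
}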

> Call sites that access different typed arrays will be polymorphic.

Doh! This explains why performance degraded so much in my console tests when starting at a smaller size. I should just lock it all to Uint32Array. I was trying to minimize memory pressure, but how much are we really talking about here... a couple of KB in the majority of scenarios.

localvoid

> I should just lock it all to Uint32Array

Not sure, but I think it would be better to use Int32Array, since it guarantees that all values fit into SMIs on 64-bit platforms.

fkleuver (Contributor, Author) commented Jan 26, 2019

I went ahead and changed sources to Uint32Array as well. It seems to hold up. For lis_algorithm itself it even seems to be a minor perf improvement: http://jsben.ch/QXPZJ

edit: you beat me to it by a few seconds :) Int32Array seems fine as well; nothing would ever go over that anyway. I'll adjust it.

thysultan (Contributor)

Another potential optimisation could be to hoist one typed array that is "virtually" partitioned into pages.

var size = 1000; // start smaller (possibly only grows)
var memory = new Int32Array(size); // "TypedArray" stands for any concrete constructor, e.g. Int32Array
var page1 = 0;        // offset of the first page (result)
var page2 = size / 2; // offset of the second page (p)

// ... within the function

if (len * 2 > size) {
  // grow: each page needs `len` slots, so the backing array needs len * 2
  size = (page2 = len) * 2;
  memory = new Int32Array(size);
} else if (size > len * 2) {
  // possibly downsize
}

Reading/writing result/seq would then start from the corresponding page's offset (a rough sketch below).
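
A rough sketch of what that offset-based access could look like, using the memory/page1/page2 names from the snippet above (assumed helpers, not actual Inferno code):

const readResult = (k) => memory[page1 + k];               // was: $result[k]
const writeResult = (k, value) => { memory[page1 + k] = value; };
const readP = (i) => memory[page2 + i];                    // was: $p[i]
const writeP = (i, value) => { memory[page2 + i] = value; };

// e.g. the PR's `$p[i] = $result[u - 1]` would become:
// writeP(i, readResult(u - 1));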

This relies on the heuristic that most apps normally have a single, unchanging ceiling on the average number of nodes within a single subtree namespace.

localvoid

In my opinion it isn't worth overcomplicating the lis algo by hoisting and reusing arrays; it is hardly noticeable in the profiler when rearranging DOM nodes. The DOM is the bottleneck :)

fkleuver (Contributor, Author)

It is a cool idea though, and it could possibly cut the time in half once more. But indeed, I bet it's already <1% of the total with the DOM in the mix.

localvoid

There is a small mistake in the benchmark (sort() and reverse() mutate arrays in place):

for (let i = 0; i < cnt; ++i) {
	inputs[i] = generateList(len);
	inputsSorted[i] = inputs[i].sort();
	inputsReversed[i] = inputs[i].sort().reverse();
}

Should be:

for (let i = 0; i < cnt; ++i) {
	inputs[i] = generateList(len);
	inputsSorted[i] = inputs[i].slice().sort();
	inputsReversed[i] = inputs[i].slice().sort().reverse();
}

fkleuver (Contributor, Author)

Good call! http://jsben.ch/MoAdv

Havunen (Member) commented Jan 27, 2019

Hi @fkleuver thanks for this awesome PR!

I remember testing typed arrays when I converted Inferno's LIS implementation to use 0s instead of -1 for detecting non-existent vNodes. At that time typed arrays didn't perform very well, but maybe that has changed.
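
For context on that convention (a hedged illustration with assumed names; the exact Inferno code may differ): with 0 as the "no matching old node" sentinel, matched old indices are typically stored offset by one so a legitimate old index of 0 cannot be mistaken for "missing".

const newChildrenCount = 4;                       // assumed example size
const sources = new Int32Array(newChildrenCount); // zero-filled: 0 means "no matching old child"

// on a key match during the diff, store the old index + 1:
const oldIndex = 0;
const newIndex = 2;
sources[newIndex] = oldIndex + 1;

// lis_algorithm then skips entries that are still 0 (the `if (arrI !== 0)` check above)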

All the tests pass, so let's merge this! :-)

Havunen merged commit f801efd into infernojs:master on Jan 27, 2019
brandonseydel

Let's see if Vue will steal this updated algo without credit... to be continued...
