perf: optimize lis_algorithm #1438

Merged
merged 2 commits into infernojs:master from fkleuver:master on Jan 27, 2019

Conversation

fkleuver (Contributor) commented Jan 25, 2019

Let me start off by saying thank you for providing such an impressive library to the open source community and raising the bar on performance. Inferno is a great source of inspiration for many libraries including Aurelia.

We hope you don't mind that we borrowed your excellent lis_algorithm and its usage for our repeater's keyed mode (which we made sure to acknowledge).

I personally had a lot of fun figuring out how it all worked and tried to make it even faster, even though it is already very fast and only a minor portion of the total CPU usage. I'd like to contribute back the end result of that exercise :)

Objective

This PR implements some perf tweaks for the lis_algorithm in patching.ts. There are three aspects to this:

  • Sharing of a single result array across executions
  • Usage of TypedArray (this matters less in Chrome than in other browsers, but it still saves V8 the overhead of doing the optimization in the first place)
  • A minor tweak in averaging the low + high index (a quick equivalence check is sketched just below this list)
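
As a quick, standalone check of that last tweak (not part of the PR itself): the midpoint of the binary search changes from a truncating division to a shift, and the two are equivalent for the non-negative index ranges involved here.

for (let u = 0; u < 1000; ++u) {
  for (let v = u; v < 1000; ++v) {
    const byDivision = ((u + v) / 2) | 0; // original: truncating float division
    const byShift = (u + v) >> 1;         // new: single shift
    if (byDivision !== byShift) console.error(`mismatch at u=${u}, v=${v}`);
  }
}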

Here is a simple benchmark: http://jsben.ch/puCMc (you can modify the cnt (iterations) and len (array size) variables at the bottom of the setup block to tweak the benchmark)

Unfortunately I was unable to fully follow the instructions for contributing. For example, I couldn't find a dev branch and the e2e tests failed to start on my machine.

Expand for a code snippet to verify there are no regressions (it can be pasted directly into the browser console)
function generateList(length) {
  const used = new Set([0]);
  const result = Array(length);
  let i = 0;
  let cur = 0;
  while (i < length) {
      while (used.has(cur)) {
        cur = (Math.random() * length + 10) | 0;
      }
      used.add(cur);
      result[i] = cur;
      ++i;
  }
  return result;
}

function lis_algorithm_original(arr) {
  const p = arr.slice();
  const result = [0];
  let i;
  let j;
  let u;
  let v;
  let c;
  const len = arr.length;

  for (i = 0; i < len; i++) {
    const arrI = arr[i];

    if (arrI !== 0) {
      j = result[result.length - 1];
      if (arr[j] < arrI) {
        p[i] = j;
        result.push(i);
        continue;
      }

      u = 0;
      v = result.length - 1;

      while (u < v) {
        c = ((u + v) / 2) | 0;
        if (arr[result[c]] < arrI) {
          u = c + 1;
        } else {
          v = c;
        }
      }

      if (arrI < arr[result[u]]) {
        if (u > 0) {
          p[i] = result[u - 1];
        }
        result[u] = i;
      }
    }
  }

  u = result.length;
  v = result[u - 1];

  while (u-- > 0) {
    result[u] = v;
    v = p[v];
  }

  return result;
}


var $result;
var $p;
var $maxLen = 0;

function lis_algorithm_new(arr) {
  let arrI = 0;
  let i = 0;
  let j = 0;
  let k = 0;
  let u = 0;
  let v = 0;
  let c = 0;
  const len = arr.length;
  const TArr = len < 0xFF ? Uint8Array : len < 0xFFFF ? Uint16Array : Uint32Array;
  
  if (len > $maxLen) {
    $maxLen = len;
    $result = new TArr(len);
    $p = new TArr(len);
  }


  for (; i < len; ++i) {
    arrI = arr[i];

    if (arrI !== 0) {
      j = $result[k];
      if (arr[j] < arrI) {
        $p[i] = j;
        $result[++k] = i;
        continue;
      }

      u = 0;
      v = k;

      while (u < v) {
        c = (u + v) >> 1;
        if (arr[$result[c]] < arrI) {
          u = c + 1;
        } else {
          v = c;
        }
      }

      if (arrI < arr[$result[u]]) {
        if (u > 0) {
          $p[i] = $result[u - 1];
        }
        $result[u] = i;
      }
    }
  }

  u = i = k + 1;
  const seq = new TArr(u);
  v = $result[u - 1];

  while (u-- > 0) {
    seq[u] = v;
    v = $p[v];
  }
  while (i-- > 0) $result[i] = 0;
  return seq;
}

for (const c of [1, 2, 3, 5, 10, 25, 500, 1000, 500, 50, 15, 5, 3, 2, 1, 2, 3, 5, 10, 25, 500, 1000, 500, 50, 15, 5, 3, 2, 1]) {
  const input = generateList(c);
  const result1 = lis_algorithm_original(input);
  const result2 = lis_algorithm_new(input);
  if (result1.length !== result2.length) {
    console.error(`result1.length (${result1.length}) !== result2.length (${result2.length})`);
  }
  for (let i = 0, ii = result1.length; i < ii; ++i) {
    if (result1[i] !== result2[i]) {
      console.error(`result1[${i}] (${result1[i]}) !== result2[${i}] (${result2[i]})`);
    }
  }
}

(edit: I had pasted the wrong snippet; I made it the same as the benchmark and removed the inline console perf measurement, which gets skewed when executed like that)

I can add this to your tests if you'd like; I'm not sure where precisely it would fit in your test suites.


Expand for jsbench results for Chrome, Edge, Firefox and IE11

[screenshots of the jsbench results for Chrome, Edge, Firefox and IE11 omitted]

Let me know if you have any concerns or questions about the code.

coveralls (Collaborator) commented Jan 25, 2019

Coverage Status

Coverage increased (+0.02%) to 94.967% when pulling f3db819 on fkleuver:master into fc0afd5 on infernojs:master.

localvoid commented Jan 26, 2019

> which we made sure to acknowledge.

I wrote this lis-based keyed algo even before Inferno existed [1] :) Inferno originally copy-pasted it from here [2]. And here [3] is the latest version that I am using.

I tried using TypedArrays in this algo a long time ago, and at that time it was definitely slower on microbenchmarks in V8 (I haven't tried using 3 different types). With TypedArrays it would also make sense to convert the input array to a TypedArray. Also, is it really worth using 3 different typed arrays (8/16/32)? Call sites that access different typed arrays will be polymorphic (see the small illustration after the links below).

  1. https://github.com/localvoid/vdom/blob/master/lib/src/utils/container.dart#L411
  2. https://github.com/localvoid/kivi/blob/master/lib/vnode.ts#L1283
  3. https://github.com/localvoid/ivi/blob/f96afea34e86c541963e4f5dec737c18f6fb3610/packages/ivi/src/vdom/reconciler.ts#L656
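
To illustrate the polymorphism point (a generic example, not taken from Inferno): once a single element-access site has seen several typed-array types, the engine's inline cache has to handle all of them.

function sumFirstTwo(arr) {
  return arr[0] + arr[1]; // this access site becomes polymorphic after the calls below
}

sumFirstTwo(new Uint8Array([1, 2]));  // monomorphic so far
sumFirstTwo(new Uint16Array([3, 4])); // two element kinds seen
sumFirstTwo(new Uint32Array([5, 6])); // three element kinds: the site stays polymorphic from here on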

fkleuver (Contributor, Author)

> [3] is the latest version that I am using.

Cool :)
We've got a somewhat similar approach for tracking added items. I actually needed to deviate for Aurelia because of a different mutation tracking mechanism, but your new approach got me thinking I might be doing some redundant stuff.

> it was definitely slower on microbenchmarks in V8

Not sure when this was, but there was a period where some of these built-ins just weren't optimized in V8 at all. Here's a pretty interesting read from not too long ago on some of those issues: https://dzone.com/articles/connecting-the-dots

> with TypedArrays it would also make sense to convert the input array to a TypedArray.

You could initialize sources as a TypedArray right away and not need to convert it. You might even be able to use a shared TypedArray there too (a rough sketch below).
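
Something like this, roughly (a hedged sketch with assumed names, not Inferno's actual code): keep one shared typed array for sources and only grow it, similar to $result/$p above.

var $sources = new Int32Array(0); // shared across patch calls

function getSources(length) {
  if (length > $sources.length) {
    $sources = new Int32Array(length); // grow only; new typed arrays are zero-filled
  } else {
    $sources.fill(0, 0, length); // reset just the slice about to be reused
  }
  return $sources;
}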

> Call sites that access different typed arrays will be polymorphic.

Doh! This explains why performance degraded so much in my console tests when starting at a smaller size. I should just lock it all to Uint32Array. I was trying to minimize memory pressure, but how much are we really talking about here... a couple of KB in the majority of scenarios.

localvoid

> I should just lock it all to Uint32Array

Not sure, but I think it would be better to use Int32Array, since it guarantees that all values fit into SMIs on 64-bit platforms.

fkleuver (Contributor, Author) commented Jan 26, 2019

I went ahead and changed sources to Uint32Array as well. It seems to hold up. For lis_algorithm itself it even seems to be a minor perf improvement: http://jsben.ch/QXPZJ

edit: you beat me to it by a few seconds :) Int32Array seems fine as well; nothing would ever go over that anyway. I'll adjust it.

thysultan (Contributor)

Another potential optimisation could be to hoist one typed array that is "virtually" partitioned into pages.

var size = 1000; // start smaller (possibly only grows)
var memory = new Int32Array(size); // "TypedArray" stands for any concrete constructor, e.g. Int32Array
var page1 = 0;        // offset of the first page (result)
var page2 = size / 2; // offset of the second page (p)

// ... within the function

if (len * 2 > size) {
  // grow: each page needs `len` slots, so the backing array needs len * 2
  size = (page2 = len) * 2;
  memory = new Int32Array(size);
} else if (size > len * 2) {
  // possibly downsize
}

Reading/writing result/seq would then start from the corresponding page's offset (a rough sketch below).
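
A rough sketch of what that offset-based access could look like, using the memory/page1/page2 names from the snippet above (assumed helpers, not actual Inferno code):

const readResult = (k) => memory[page1 + k];               // was: $result[k]
const writeResult = (k, value) => { memory[page1 + k] = value; };
const readP = (i) => memory[page2 + i];                    // was: $p[i]
const writeP = (i, value) => { memory[page2 + i] = value; };

// e.g. the PR's `$p[i] = $result[u - 1]` would become:
// writeP(i, readResult(u - 1));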

This relies on the heuristic that most apps normally have a single, unchanging ceiling on the average number of nodes within a single subtree namespace.

localvoid

In my opinion it isn't worth overcomplicating the lis algo by hoisting and reusing arrays; it is hardly noticeable in the profiler when rearranging DOM nodes. The DOM is the bottleneck :)

fkleuver (Contributor, Author)

It is a cool idea though, and it could possibly cut the time in half once more. But indeed, I bet it's already <1% of the total with the DOM in the mix.

localvoid

There is a small mistake in the benchmark (sort() and reverse() mutate arrays in place):

for (let i = 0; i < cnt; ++i) {
	inputs[i] = generateList(len);
	inputsSorted[i] = inputs[i].sort();
	inputsReversed[i] = inputs[i].sort().reverse();
}

Should be:

for (let i = 0; i < cnt; ++i) {
	inputs[i] = generateList(len);
	inputsSorted[i] = inputs[i].slice().sort();
	inputsReversed[i] = inputs[i].slice().sort().reverse();
}

fkleuver (Contributor, Author)

Good call! http://jsben.ch/MoAdv

Havunen (Member) commented Jan 27, 2019

Hi @fkleuver thanks for this awesome PR!

I remember testing typed arrays when I converted Inferno's LIS implementation to use 0s instead of -1 for detecting non-existent vNodes. At that time typed arrays didn't perform very well, but maybe that has changed.
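
For context on that convention (a hedged illustration with assumed names; the exact Inferno code may differ): with 0 as the "no matching old node" sentinel, matched old indices are typically stored offset by one so a legitimate old index of 0 cannot be mistaken for "missing".

const newChildrenCount = 4;                       // assumed example size
const sources = new Int32Array(newChildrenCount); // zero-filled: 0 means "no matching old child"

// on a key match during the diff, store the old index + 1:
const oldIndex = 0;
const newIndex = 2;
sources[newIndex] = oldIndex + 1;

// lis_algorithm then skips entries that are still 0 (the `if (arrI !== 0)` check above)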

All the tests pass, so let's merge this! :-)

Havunen merged commit f801efd into infernojs:master on Jan 27, 2019
brandonseydel

Let's see if Vue will steal this updated algo without credit... to be continued...
