Set/Object creation performance depends on input data. #5

22 changes: 22 additions & 0 deletions README.md
@@ -39,6 +39,8 @@ but is it really faster than plain old `Object`?

TL;DR **Set is almost two times faster than Object**.

**For `getUniqueElement`, the answer depends on the input data. Set is faster than Object when the duplication ratio is below roughly 0.3. Otherwise, a plain Object is faster.**

## Benchmark: Jaccard Similarity

[Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) is a simple
@@ -99,6 +101,26 @@ Compute jaccard similarity with objects (for in) x 600 ops/sec ±1.46% (82 runs
Set objects are almost two times faster than our old plain Object. The tests
were executed using v8 engine `4.6.85.31`.

## Benchmark: Emulating Set Creation

```
> node creation.js
Huge collision Obj (size: 1000) x 20,350 ops/sec ±1.00% (91 runs sampled)
Huge collision Set (size: 1000) x 12,017 ops/sec ±1.00% (90 runs sampled)
High collision Obj (size: 1000) x 19,266 ops/sec ±1.19% (91 runs sampled)
High collision Set (size: 1000) x 11,005 ops/sec ±0.94% (92 runs sampled)
Fare collision Obj (size: 1000) x 7,669 ops/sec ±0.97% (90 runs sampled)
Fare collision Set (size: 1000) x 10,974 ops/sec ±0.96% (93 runs sampled)
Rare collision Obj (size: 1000) x 2,689 ops/sec ±1.00% (93 runs sampled)
Rare collision Set (size: 1000) x 10,444 ops/sec ±1.42% (93 runs sampled)
Huge collision: 57.8% duplication
High collision: 36.8% duplication
Fare collision: 15% duplication
Rare collision: 4.8% duplication
```

Set is faster than a plain Object when only a small fraction of the elements are duplicates: about 1.4 times faster at 15% duplication and about 3.9 times faster at 4.8%. When extracting unique elements from an array with a high percentage of duplicates (36.8% and above in these runs), a plain Object is still faster, by about 1.7 times. The tests were executed using Node.js v8.4.0.
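The two approaches being compared can be sketched as follows. This is a minimal illustration, not code from the repository: the helper names `uniqueWithSet` and `uniqueWithObject` are hypothetical, and the object-based version assumes integer keys, as in the benchmark.

```javascript
// Hypothetical helpers; the names are illustrative, not taken from the repository.

// Deduplicate with a Set. First occurrences keep their insertion order.
function uniqueWithSet(items) {
  return Array.from(new Set(items));
}

// Deduplicate with a plain object used as a hash map.
// Keys are coerced to strings, so this sketch assumes integer input;
// integer-like keys are enumerated in ascending numeric order.
function uniqueWithObject(items) {
  var seen = Object.create(null);
  for (var i = 0; i < items.length; ++i) {
    seen[items[i]] = 1;
  }
  return Object.keys(seen).map(Number);
}

var input = [3, 1, 3, 2, 1];
console.log(uniqueWithSet(input));    // [ 3, 1, 2 ]
console.log(uniqueWithObject(input)); // [ 1, 2, 3 ]
```

Note that the two versions are not interchangeable: the object-based one loses the original key type and ordering, which may matter more than the speed difference.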

## Memory consideration

I compared RAM consumption by building 10,000,000 string keys and stored them
79 changes: 79 additions & 0 deletions creation.js
@@ -0,0 +1,79 @@
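var Benchmark = require('benchmark');
var randomAPI = require('ngraph.random');

var suite = new Benchmark.Suite();
var seed = 43;

// Number of keys inserted per benchmark run.
var setSize = 1000;

// Key-range multipliers: keys are drawn from a range of setSize * rate values,
// so a smaller value means more duplicates. Calling these "collision rates" is
// mathematically incorrect, but simple.
var hugeCollisionRate = 0.5;
var highCollisionRate = 1;
var fareCollisionRate = 3;
var rareCollisionRate = 10;

// Sets kept from the last run of each case, used to report duplication.
var actualUniqSetHuge = null,
    actualUniqSetHigh = null,
    actualUniqSetFare = null,
    actualUniqSetRare = null;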

suite
  .add(`Huge collision Obj (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    generateSetObjects(setSize, (setSize * hugeCollisionRate | 0), rnd);
  })
  .add(`Huge collision Set (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    actualUniqSetHuge = generateSet(setSize, (setSize * hugeCollisionRate | 0), rnd);
  })
  .add(`High collision Obj (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    generateSetObjects(setSize, (setSize * highCollisionRate | 0), rnd);
  })
  .add(`High collision Set (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    actualUniqSetHigh = generateSet(setSize, (setSize * highCollisionRate | 0), rnd);
  })
  .add(`Fare collision Obj (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    generateSetObjects(setSize, (setSize * fareCollisionRate | 0), rnd);
  })
  .add(`Fare collision Set (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    actualUniqSetFare = generateSet(setSize, (setSize * fareCollisionRate | 0), rnd);
  })
  .add(`Rare collision Obj (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    generateSetObjects(setSize, (setSize * rareCollisionRate | 0), rnd);
  })
  .add(`Rare collision Set (size: ${setSize})`, function() {
    var rnd = randomAPI.random(seed);
    actualUniqSetRare = generateSet(setSize, (setSize * rareCollisionRate | 0), rnd);
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log(`Huge collision: ${(setSize - actualUniqSetHuge.size) / setSize * 100}% duplication`);
    console.log(`High collision: ${(setSize - actualUniqSetHigh.size) / setSize * 100}% duplication`);
    console.log(`Fare collision: ${(setSize - actualUniqSetFare.size) / setSize * 100}% duplication`);
    console.log(`Rare collision: ${(setSize - actualUniqSetRare.size) / setSize * 100}% duplication`);
  })
  .run({ 'async': true });

// Inserts `count` random integer keys below `keyRange` into a native Set.
function generateSet(count, keyRange, rnd) {
  var set = new Set();
  for (var i = 0; i < count; ++i) {
    const key = rnd.next(keyRange);
    set.add(key);
  }
  return set;
}

// Same workload, emulating a set with a plain object (key -> 1).
function generateSetObjects(count, keyRange, rnd) {
  var set = {};
  for (var i = 0; i < count; ++i) {
    const key = rnd.next(keyRange);
    set[key] = 1;
  }
  return set;
}
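
The `*CollisionRate` values above are multipliers on the key range, not collision probabilities. As a rough cross-check, the sketch below estimates the duplication each multiplier should produce. It assumes that `rnd.next(keyRange)` draws keys uniformly and independently from `keyRange` distinct values, in which case the expected number of distinct keys after `count` draws is `keyRange * (1 - (1 - 1/keyRange)^count)`. The helper `expectedDuplication` is hypothetical and not part of `creation.js`.

```javascript
// Hypothetical helper, not part of creation.js.
// Assumes keys are drawn uniformly and independently from `keyRange` values.
function expectedDuplication(count, keyRange) {
  var expectedUnique = keyRange * (1 - Math.pow(1 - 1 / keyRange, count));
  return (count - expectedUnique) / count;
}

var setSize = 1000;
[0.5, 1, 3, 10].forEach(function(multiplier) {
  var keyRange = (setSize * multiplier) | 0;
  var pct = expectedDuplication(setSize, keyRange) * 100;
  console.log('x' + multiplier + ': ' + pct.toFixed(1) + '% expected duplication');
});
```

Under that assumption the estimates come out at roughly 57%, 37%, 15%, and 5%, close to the 57.8%, 36.8%, 15%, and 4.8% reported by the benchmark run.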