perf: improving performance #528
Conversation
@@ -190,6 +190,35 @@ class Range {
      return false
    }
  }

function buildMemoKeyFromOptions(options) {
You may also want to try a handwritten key, such as:
`${options.includePrerelease ? 1 : 0},${options.loose ? 1 : 0}...`
which is what we do in the TypeScript compiler. Not sure if it's faster than what's here, but it could be interesting to test.
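For reference, a completed version of that handwritten key might look like this (the rtl part is my guess at the truncated tail of the suggestion above):

function buildMemoKeyFromOptions (options) {
  // one comma-separated character per boolean option
  return `${options.includePrerelease ? 1 : 0},${options.loose ? 1 : 0},${options.rtl ? 1 : 0}`
}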
It doesn't hurt performance:
buildMemoKeyFromOptions({"includePrelease":true}) x 1,121,468,719 ops/sec ±0.63% (96 runs sampled)
buildMemoKeyFromOptions({"includePrelease":true,"loose":true}) x 1,132,354,133 ops/sec ±0.67% (95 runs sampled)
buildMemoKeyFromOptions({"includePrelease":true,"loose":true,"rtl":true}) x 1,122,788,698 ops/sec ±0.53% (96 runs sampled)
At over 1B ops/sec, the choice between one and the other has almost no effect.
The only drawback is that the lru-cache key grows to 6 characters instead of 1.
If the npm team prefers readability over a tiny cache key, I can change the implementation.
This is actually a good use case for a bit flag. It's a bunch of booleans, and bitwise operations are super fast. And while bit ops are often frowned upon as dense or obscure, if organized properly they're much more maintainable than a series of if/else statements and hoping that you captured each case.
let s = 1
const FLAG_includePrerelease = s
s <<= 1
const FLAG_loose = s
s <<= 1
const FLAG_rtl = s
s <<= 1
// ...

function buildMemoKeyFromOptions (options) {
  // now just bitwise-OR them all together as appropriate
  return (
    options.loose ? FLAG_loose : 0
  ) | (
    options.rtl ? FLAG_rtl : 0
  ) | (
    options.includePrerelease ? FLAG_includePrerelease : 0
  )
}
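With the flags laid out in powers of two like that, the returned number doubles as a compact memo key. A quick illustration (the values follow directly from the definitions above):

// FLAG_includePrerelease === 1, FLAG_loose === 2, FLAG_rtl === 4
buildMemoKeyFromOptions({ loose: true, rtl: true })   // 6
buildMemoKeyFromOptions({ includePrerelease: true })  // 1
buildMemoKeyFromOptions({})                           // 0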
I don't think the suggestion is to change Options, but to use a different way of constructing a key.
That being said, I think it should be possible to internally store flags for these and then reconstruct the options bag on whichever methods request it. Then, internally, things can use the fast check and pass numbers around. Spitballing, though.
Yes, I realized that a few seconds after writing it. And yes, using the flags internally and then exposing the options makes sense; I'll try using a getter to compute the options while internally using just the bit flag.
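A minimal sketch of that direction, reusing the FLAG_* constants from above (the class name and property here are illustrative, not the actual PR code):

class SemVerLike {
  constructor (options) {
    // store only the bits internally; cheap to copy and compare
    this.optionFlags = buildMemoKeyFromOptions(options || {})
  }

  // reconstruct the options bag lazily, only when a caller asks for it
  get options () {
    return {
      includePrerelease: Boolean(this.optionFlags & FLAG_includePrerelease),
      loose: Boolean(this.optionFlags & FLAG_loose),
      rtl: Boolean(this.optionFlags & FLAG_rtl),
    }
  }
}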
Thinking further, the user already has to pass the object in, so you probably can just keep that one and not reconstruct it.
@jakebailey I tried, but it would break a lot of tests if I did.
I put the version of parseOptions with bit flags here: h4ad-forks@6c7a68a
@isaacs Take a look and see if this PR is worth pushing; I made a lot of changes.
Also, I introduced one small breaking change: numbers passed to parseOptions are now considered valid options, so instead of re-parsing by passing the object around, I use the options as bit flags to construct the new objects inside the functions.
About performance, there are no penalties, and I assume it got a bit faster because I now parse the options as an object once and afterwards only parse them as a number, which saves some comparisons.
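For illustration, the "numbers are valid options" behavior could look roughly like this, reusing the FLAG_* constants from earlier (a sketch; the linked commit has the real implementation):

const parseOptions = options => {
  if (typeof options === 'number') return options      // already parsed: it's a bit flag
  if (!options) return 0
  if (typeof options !== 'object') return FLAG_loose   // legacy truthy value means loose
  return buildMemoKeyFromOptions(options)
}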
The only downside I noticed when using the bitmask is that the memory usage (ref: #528 (comment)) is the same as allocating the objects at runtime (8.7 MB).
I don't know why this happens, but I just want to point out that it's not as memory efficient as Object.freeze.
const var1 = Object.freeze({ includePrerelease: true, loose: true, rtl: true });
const var2 = Object.freeze({ includePrerelease: true, loose: true });
const var3 = Object.freeze({ includePrerelease: true, rtl: true });
const var4 = Object.freeze({ includePrerelease: true });
const var5 = Object.freeze({ loose: true, rtl: true });
const var6 = Object.freeze({ loose: true });
const var7 = Object.freeze({ rtl: true });
const emptyOpts = Object.freeze({});
Typically, freezing objects makes them much slower, so it's worth benchmarking this.
Also, all of these should have `__proto__: null` so lookups can be faster.
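Applied to the constants above, that suggestion would look like this (illustrative):

const var7 = Object.freeze({ __proto__: null, rtl: true })
// or, for the empty object:
const emptyOpts = Object.freeze(Object.create(null))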
+1 on trying out `Object.create(null)`, but I wonder if making this a `Map`, or maybe even using bitmasks, could speed it up further?
Yeah, freezing has never changed performance in anything I've ever tried, unfortunately.
Honestly, I think the way forward in optimizing things is not going to come down to tricks like freezing or looking for the fastest LRU cache; it's going to be overall algorithmic changes to the code.
The idea of freezing is just to prevent modification of the object; the goal is not to gain performance but to save memory allocations.
With the older version, my test ended with 8.7 MB of memory usage, and with this frozen version it was consistently 6.7 MB.
I think @ljharb's point was more that calling `Object.freeze` (the fn itself) makes the code slower, not faster, because the freezing itself is an operation.
@kurtextrem As in #528 (comment), doing it inline can be slower, so in order not to break the tests I think it's better not to include `__proto__: null`.
It would be fine to change the tests here, to be clear. I'm very surprised doing it inline is slower; in which engines is that true?
Note that we're not just concerned with Node here; this package is used in browsers as well.
@ljharb The benchmark site I've used (from the other thread) doesn't work for me in Firefox, so I can only say it's definitely much slower in Chromium, Safari, and Node (that caught me by surprise too; I assume `__proto__` gets special treatment during object creation when not done via `Object.create(null)`, making it slower than a standard object). It's even slower than `Object.freeze/seal`.
Just from my microbenchmark I'd stick to regular object creation (without freezing), but if we can benchmark all browser vendors plus Node and see that `Object.freeze` is not slower, we may as well pick up the small memory reduction.
@kurtextrem The main problem would be if we were calling it on a hot path, but since we run the freeze once, the ops/sec is not a problem.
It would only be a problem if it slowed down property access, and I don't think that is the case.
const emptyOpts = Object.freeze({});

const parseOptions = options => {
  if (!options) return emptyOpts;
There's no way this is faster than `return {}`; have you benchmarked this?
Why shouldn't it be? Object allocation (and cleanup later) isn't free.
Yeah, the idea is to reduce memory allocation.
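The allocation-saving effect in miniature, assuming the parseOptions from the diff above:

const a = parseOptions()
const b = parseOptions()
console.log(a === b)            // true: both are the shared emptyOpts, nothing new was allocated
console.log(Object.isFrozen(a)) // true: callers can't accidentally mutate the shared object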
Have you benchmarked it? There are a great many optimizations people attempt to make in JS that are counterintuitively slower than the idiomatic approach.
@ljharb Yes; here is benchmark-memory.js:
const satisfies = require('./functions/satisfies');

(() => {
  const versions = ['1.0.3', '2.2.2', '2.3.0'];
  const versionToCompare = '>=1.0.2';

  const option1 = { includePrelease: true };
  const option2 = { includePrelease: true, loose: true };
  const option3 = { includePrelease: true, loose: true, rtl: true };

  for (let i = 0; i < 1e5; i++) {
    for (const version of versions) {
      satisfies(versionToCompare, version);
      satisfies(versionToCompare, version, option1);
      satisfies(versionToCompare, version, option2);
      satisfies(versionToCompare, version, option3);
    }
  }
})();
Run the benchmark with the following command: node --trace-gc --trace-deopt benchmark-memory.js.
- Without Object.freeze, this is the output: without-freeze-object.txt
- With Object.freeze: with-freeze-object.txt
The difference shows up at the end of each text file: without Object.freeze we ended up with 8.7 MB of memory, and with Object.freeze we ended up with 6.7 MB.
About the raw performance of `satisfies`, I didn't notice any drawback.
Without `Object.freeze`:
satisfies(1.0.6, 1.0.3||^2.0.0) x 420,051 ops/sec ±1.14% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 429,208 ops/sec ±0.59% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 459,359 ops/sec ±0.67% (88 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 409,670 ops/sec ±1.02% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 428,150 ops/sec ±0.91% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 464,502 ops/sec ±0.78% (97 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 406,053 ops/sec ±0.83% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 429,494 ops/sec ±0.58% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 433,164 ops/sec ±1.11% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 385,354 ops/sec ±0.87% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 404,200 ops/sec ±0.72% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 434,975 ops/sec ±1.02% (97 runs sampled)
With `Object.freeze`:
satisfies(1.0.6, 1.0.3||^2.0.0) x 419,615 ops/sec ±1.38% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 438,583 ops/sec ±0.48% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 447,261 ops/sec ±0.92% (93 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 415,861 ops/sec ±1.10% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 433,068 ops/sec ±0.98% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 461,636 ops/sec ±0.92% (92 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 412,220 ops/sec ±0.06% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 428,966 ops/sec ±0.65% (97 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 448,587 ops/sec ±0.29% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 388,662 ops/sec ±0.48% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 412,242 ops/sec ±0.16% (93 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 432,172 ops/sec ±1.03% (96 runs sampled)
@ljharb As @kurtextrem mentioned in the other comment, the inline static null proto is slower, and we ended up with 8.7 MB.
Code:
const parseOptions = options => {
  if (!options) return Object.create(null);
  if (typeof options !== 'object') return { __proto__: null, loose: true };

  const opts = Object.create(null);
  if (options.includePrerelease) opts.includePrerelease = true;
  if (options.loose) opts.loose = true;
  if (options.rtl) opts.rtl = true;
  return opts;
};
Perf:
satisfies(1.0.6, 1.0.3||^2.0.0) x 341,924 ops/sec ±0.42% (93 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 357,034 ops/sec ±0.13% (97 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 384,200 ops/sec ±0.54% (92 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 328,873 ops/sec ±0.98% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 345,029 ops/sec ±0.67% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 367,659 ops/sec ±0.37% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 316,462 ops/sec ±0.43% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 323,678 ops/sec ±0.90% (93 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 339,436 ops/sec ±0.94% (96 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 287,875 ops/sec ±4.03% (88 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 314,779 ops/sec ±1.12% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 332,332 ops/sec ±0.25% (96 runs sampled)
Thanks for clarifying. What I meant was adding an inline `__proto__: null` to those static frozen object declarations.
As mentioned, it does not appear to be faster to initialize: https://esbench.com/bench/64271fec6c89f600a57022e8
Hm, I can't seem to run the benchmarks on iOS Safari. Have they been run in every browser, not just Chrome? What about in Node, where perf differs from Chrome at times?
@ljharb About freeze with `__proto__: null`:
satisfies(1.0.6, 1.0.3||^2.0.0) x 21,856 ops/sec ±0.60% (90 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 21,716 ops/sec ±0.79% (95 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 21,643 ops/sec ±0.76% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 21,216 ops/sec ±0.25% (93 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 21,171 ops/sec ±0.41% (86 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 21,113 ops/sec ±0.38% (92 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 310,400 ops/sec ±4.69% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 361,617 ops/sec ±0.58% (89 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 381,054 ops/sec ±0.07% (97 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 335,386 ops/sec ±0.44% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 350,006 ops/sec ±0.62% (97 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 370,139 ops/sec ±0.09% (94 runs sampled)
Soooo slow, I don't even know why.
Just so it doesn't get lost, someone from Tencent wrote this one: https://github.com/Zzzen/fast-semver (https://twitter.com/callmetsing/status/1641810237404614660), which is much faster (the original semver is 77% slower).
I mentioned this on the Twitter thread, but given it's "only" 4x faster, I really think that an optimized version can be written in JS and come very close. It will just take some benchmarks and profiling to get to the bottom of things. Things like #458 and so on are related here.
Given that the downstream users appear to usually compare strings, that's the worst-case path in the current implementation; it allocates loads of temporary objects to perform the comparisons, and then they are thrown away. yarn berry already caches these objects to try and help with this, and I'm considering sending a PR to pnpm to do something similar, as it results in a 33% speedup for the DT monorepo.
If there were a fast path for string-ly comparison, that might end up being good, though that may be challenging. But it's pretty straightforward to fuzz test two implementations for differences.
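A sketch of that kind of differential fuzz test; fastSatisfies here is a hypothetical alternative implementation being checked against semver:

const assert = require('assert')
const satisfies = require('semver/functions/satisfies')
const fastSatisfies = require('./fast-satisfies') // hypothetical alternative implementation

const rand = n => Math.floor(Math.random() * n)
const randomVersion = () => `${rand(10)}.${rand(10)}.${rand(10)}`

for (let i = 0; i < 1e4; i++) {
  const version = randomVersion()
  const range = `>=${randomVersion()}`
  // any mismatch between the two implementations fails loudly with the inputs
  assert.strictEqual(
    fastSatisfies(version, range),
    satisfies(version, range),
    `mismatch for satisfies(${version}, ${range})`
  )
}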
I tried to find more optimizations; the general ideas are:
Any of these ideas will take more time than I currently have, so I'll stop for now and do some more research another time. @jakebailey Can you monkey-patch the semver used by pnpm and see if some of these optimizations helped, both in terms of memory and speed?
I have a PR I'm going to send later today to pnpm which simply caches. But I can try and pull this branch in and see if anything changes.
Alright, so on pnpm main, this PR actually appears to be quite good! Comparing to pnpm/pnpm#6336:
Let me try the combo of both.
Of course, it's not as fast as blindly caching the world, but that's not the best idea either.
Also, if you're looking to improve performance and relying on caching in hot paths, it's probably a good idea to upgrade to the latest version of lru-cache. I haven't tried it in this particular use case, but that might pretty dramatically cut down the gc overhead, since v6 of lru-cache creates a ton of tiny objects that live just long enough to fall out of the short-lived gc generation.
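For illustration, the upgrade is mostly a change of import style; the get/set surface used for this kind of memoization stays the same (a sketch, assuming lru-cache v10+, which exports the class by name):

// v6 style (roughly what the code used at the time):
// const LRU = require('lru-cache')
// const cache = new LRU({ max: 1000 })

// v10+ style:
const { LRUCache } = require('lru-cache')
const cache = new LRUCache({ max: 1000 })

cache.set('1.2.3 >=1.0.0', true)
cache.get('1.2.3 >=1.0.0') // true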
My testing in general is pointing to the recreation of semver/range objects as "the thing" making everything slow. I had tried using an LRU cache, but found it didn't help because the repeated comparisons happened so far apart that their entries ended up being evicted. Caching the Range object (forever) is what pnpm and yarn now do, as those objects are seemingly the worst overall, and there are thankfully not enough distinct ranges in a typical program for this to cause a problem. I'm working on a recreation of the API in a way that avoids recreation as much as possible, but I think any short-term improvements like making parseOptions faster are definitely a good idea.
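Roughly what that cache-forever approach looks like on the consumer side (a sketch under my assumptions, not pnpm's or yarn's exact code):

const semver = require('semver')

// never evicted: there are usually few distinct range strings in a whole program
const rangeCache = new Map()

function parseRange (raw) {
  let range = rangeCache.get(raw)
  if (range === undefined) {
    range = new semver.Range(raw)
    rangeCache.set(raw, range)
  }
  return range
}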
Sorry that I missed this; catching up now. @H4ad has done some very helpful work over on ssri already, and I'm working through the last of it now. Thanks to all of you who are helping comment on this PR and doing other performance testing/improvement. I would humbly make the same suggestion here that I made there: that the PRs be broken up logically by optimization type or code area for easier isolation of reviews. By all means continue the discussion here if you want to solve problems or come up w/ other solutions before then.
@wraithgar PRs created. I think we had a really good conversation on this PR about the performance improvements. I'm going to close this PR now in favor of the other two, and I'd ask you to put any new suggestions in those PRs; I've tried to link almost all of the solutions and describe the benefits and drawbacks of each selected solution in the new PRs, so I think we can start a new conversation there about any other optimization or design decision.
In case someone wants to push more perf optimizations in this release, I found two interesting things. These objects are created every time this method is called, but the return is a boolean flag, so we could simply cache those values (a sketch follows below). Lines 76 to 92 in f4fa069
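A sketch of that caching idea; it assumes a Comparator-style method that allocates helper objects but only returns a boolean, and the cache key shape here is my assumption:

// memoize a boolean-returning, allocation-heavy method by its inputs
const intersectsCache = new Map()

function cachedIntersects (comp, other, options) {
  const key = `${comp.value}\u0000${other.value}\u0000${JSON.stringify(options || {})}`
  let result = intersectsCache.get(key)
  if (result === undefined) {
    result = comp.intersects(other, options) // the original, allocation-heavy path
    intersectsCache.set(key, result)
  }
  return result
}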
Also, a lot of the comparisons there could return early; see node-semver/classes/comparator.js, lines 100 to 125 in f4fa069.
I will probably investigate those changes as soon as I finish the two PRs that are open now, but in case someone wants to contribute, I'm happy to help with a code review.
If someone does end up looking at the …
I saw this tweet by @jakebailey, describing the time spent just parsing the options.
parsing-options.js
So I did a simple refactoring removing two loops from the options parsing (a hedged sketch of the direction follows below). Before:
After:
benchmark.js
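Since the before/after snippets are collapsed above, here is a hedged sketch of the loop-free direction (it mirrors the frozen-objects variant discussed earlier on this page, minus the freezing):

const parseOptions = options => {
  if (!options) return {}
  if (typeof options !== 'object') return { loose: true }

  const opts = {}
  if (options.includePrerelease) opts.includePrerelease = true
  if (options.loose) opts.loose = true
  if (options.rtl) opts.rtl = true
  return opts
}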
range.js
Inside range.js, I saw this piece of code:
node-semver/classes/range.js
Line 84 in da08e01
`Object.keys` is not the fastest method, so I created a function that simply returns a key depending on the options (a sketch follows below). Before:
After:
benchmark.js
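The before/after here is collapsed too, but the shape of such a key function is simple; a sketch (the real code is in the diff near the top of this page):

function buildMemoKeyFromOptions (options) {
  // map each combination of the three boolean options to a distinct 1-character key
  if (options.includePrerelease) {
    if (options.loose) return options.rtl ? '1' : '2'
    return options.rtl ? '3' : '4'
  }
  if (options.loose) return options.rtl ? '5' : '6'
  return options.rtl ? '7' : '8'
}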
When we compare the performance improvements on the `satisfies` function: Before:
After:
benchmark.js