Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Ensure stability of filename cache-keys #909

Merged
merged 5 commits into from
Jul 12, 2024

Conversation

stefanpenner
Copy link
Member

@stefanpenner stefanpenner commented Jun 9, 2021

[Bugfix] Ensure stability of filename cache-keys

JSON.stringify(structure) isn’t inherently stable as it relies on various
internal details of how structure was created.

As written, if a given babel configuration is create in an dynamic manner,
it is possible for babel-loader to have spurious cache misses.

To address this, we can use one of the many stable stringify alternatives.

For this PR I have selected fast-json-stable-stringify
for that task, as it appears both popular and it’s benchmarks look promising.

This PR does not explicitly include tests, as testing this is both tricky
to test in this context, and the important tests are contained within
fast-json-stable-stringify itself.

Please Read the CONTRIBUTING Guidelines
In particular the portion on Commit Message Formatting

Please check if the PR fulfills these requirements

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

  • Bugfix
  • Breaking change: cache filenames are updated
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Other... Please describe:

What is the current behavior? (You can also link to an open issue here)

cache keys are unstable, causing occasional cache misses

What is the new behavior?

cache keys should be stable

Does this PR introduce a breaking change?

  • Yes
  • No
  • Impact: improved cache hit frequency
  • Migration path for existing applications: none
  • Github Issue(s) this is regarding: none

@nicolo-ribaudo
Copy link
Member

nicolo-ribaudo commented Jun 9, 2021

Thanks! Is it possible that the CI failure is related to this change?

@stefanpenner
Copy link
Member Author

stefanpenner commented Jun 10, 2021

@nicolo-ribaudo failure is occurring locally too, I will investigate & address (most likely later tonight, or tomorrow).

@stefanpenner
Copy link
Member Author

@nicolo-ribaudo quick-update I'll do some additional debugging this afternoon, but my initial debugging pass suggests this test failure may actually be an example of an erroneous cache miss now corrected with this change. I'm not yet 100% convinced, but intend to continue investigation this afternoon.

@@ -376,7 +376,7 @@ test.cb("should allow to specify the .babelrc file", t => {

fs.readdir(t.context.cacheDirectory, (err, files) => {
t.is(err, null);
t.true(files.length === 2);
t.true(files.length === 1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes because the cached files now have an identical key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's good that we can actually see the improvement caused by this change 😄

@stefanpenner
Copy link
Member Author

stefanpenner commented Jun 10, 2021

@nicolo-ribaudo I believe in-fact the test failures where an example of an unstable cache key.

Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@nicolo-ribaudo nicolo-ribaudo requested a review from JLHwung June 10, 2021 21:24
@stefanpenner
Copy link
Member Author

@hzoo / @nicolo-ribaudo what are the next steps? Anything I can help with?

@JLHwung
Copy link
Contributor

JLHwung commented Jun 11, 2021

fast-json-stable-stringify is a new dependency. I am auditing the source of fast-json-stable-stringify. The serialization is applied on every compile file so performance is crucial here.

I clone the repo and here the benchmark result. It seems that fast-stable-stringify is fastest.

fast-json-stable-stringify x 21,781 ops/sec ±1.09% (87 runs sampled)
json-stable-stringify x 16,813 ops/sec ±0.92% (92 runs sampled)
fast-stable-stringify x 25,183 ops/sec ±0.74% (93 runs sampled)
faster-stable-stringify x 20,960 ops/sec ±1.21% (93 runs sampled)

@stefanpenner
Copy link
Member Author

@JLHwung I can switch to that 1, give me a sec.

@JLHwung
Copy link
Contributor

JLHwung commented Jun 11, 2021

fast-stable-stringify seems not actively maintained: https://github.com/nickyout/fast-stable-stringify I lean to host our own fast-stable-stringify for babel libraries only.

@nicolo-ribaudo
Copy link
Member

There haven't been new commits, but there are no reported bugs. It could be "maintianed, but working".

We should only self-host it if we find a bug that we need to fix, and the maintainer doesn't accept PRs.

@stefanpenner
Copy link
Member Author

I agree with @nicolo-ribaudo's POV.

@JLHwung
Copy link
Contributor

JLHwung commented Jun 11, 2021

Designing general stable JSON.stringify is more difficult than designing specific stable serializer for Babel options. Yes I use the term serializer because we don't have to generate a valid JSON. After all we don't read the serialized results and parse them back. We want to make sure for most different options, the hash is different. So we can perform better than most stable JSON stringify. For example,

  1. We can stringify any object key v with "\"" + v + "\"".
  2. If value is a string, we use it without escaping " and \, they are required by JSON but we don't have to output json.

Combining these two tricks I see 150% improvement (named as fast-stable-stringify-2) on the benchmark and the new serializer is now 20% slower than the native JSON.stringify:

fast-json-stable-stringify x 22,088 ops/sec ±1.16% (87 runs sampled)
fast-stable-stringify x 23,703 ops/sec ±1.59% (85 runs sampled)
fast-stable-stringify-2 x 61,288 ops/sec ±1.91% (83 runs sampled)
Native JSON.stringify x 69,951 ops/sec ±1.15% (93 runs sampled)
The fastest is Native JSON.stringify

The serializer could be faster than the native JSON.stringify because we don't scan strings. We just make sure every object can map to a string.

I have pushed the branch in https://github.com/JLHwung/fast-stable-stringify/tree/babel

@stefanpenner
Copy link
Member Author

@JLHwung additional performance here sounds great.

@stefanpenner
Copy link
Member Author

@JLHwung et.al what are the next steps?

@JLHwung
Copy link
Contributor

JLHwung commented Jun 14, 2021

My personal opinion here: Since we don't have to use JSON stringify, I think we can host our own property stable config serializer. Now that the cache key is updated and the old cache should probably be invalidated, I think we should inform users on the next release note that we recommend running rm -rf ./node_modules/.cache/babel-loader before upgrading babel-loader.

@stefanpenner
Copy link
Member Author

@JLHwung let me know when your variant of stringification is published, and I can include it here...

@JLHwung
Copy link
Contributor

JLHwung commented Jun 14, 2021

@stefanpenner We don't have to publish a new package, since it is not JSON and meant to be used in babel-loader only, we can inline it in this repo.

@hzoo
Copy link
Member

hzoo commented Jun 15, 2021

I'm down with in-lining our changes as in JLHwung/fast-stable-stringify@f5b0ca9 in the repo

@stefanpenner
Copy link
Member Author

ok

@JLHwung JLHwung requested a review from liuxingbaoyu July 11, 2024 17:51
@JLHwung
Copy link
Contributor

JLHwung commented Jul 11, 2024

Sorry for the super late reply. I have rebased this PR and switched to our in-house config serializer. Because this PR will change the cache identification, babel-loader can not reused cache generated by previous versions, therefore I marked this PR as a breaking change.

stefanpenner and others added 3 commits July 11, 2024 14:02
`JSON.stringify(structure)` isn’t inherently stable as it relies on
various internal details of how `structure` was created.

As written, if a given babel configuration is create in an dynamic
manner, it is possible for babel-loader to have spurious cache misses.

To address this, we can use one of the many stable stringify
alternatives.

For this PR I have selected
[fast-stable-stringify](https://www.npmjs.com/package/fast-stable-stringify)
for that task, as it appears both popular and it’s benchmarks look
  promising.

This PR does not explicitly include tests, as testing this is both
tricky to test in this context, and the important tests are contained
within fast-stable-stringify itself.
The options will be serialized in the cache#filename function with the cache identifier, so we don't have to include options in the cache identifier.
@JLHwung JLHwung merged commit 3c63b4e into babel:main Jul 12, 2024
20 checks passed
@stefanpenner
Copy link
Member Author

Nice, great to see this land!

@JLHwung JLHwung mentioned this pull request Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants