Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: resolve format for all situations with auto module detection on #53044

Closed

Conversation

dygabo
Copy link
Member

@dygabo dygabo commented May 18, 2024

triggered by #53015
solves: #53016

this should be a consistent fix to always resolve the module format correctly.
Enabling module detection by default made a few other tests need some adjustments because in this case they don't generate errors anymore. e.g. test-esm-cjs-exports.js instead of error becasue a .mjs imports a .js with ESM syntax it now successfully imports it and generates the warning that this should be fixed to avoid the performance penalty.

Kindly please review and let me know what you think (if changes are necessary).

make test && make lint => green

Co-authored-by: @GeoffreyBooth

@nodejs/loaders

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels May 18, 2024
@dygabo
Copy link
Member Author

dygabo commented May 18, 2024

if the solution is feasible and finds approval, the --experimental-detect-module could already be removed in this PR. wdyt?

@RedYetiDev
Copy link
Member

RedYetiDev commented May 18, 2024

If you set detect_module to true, do all the tests (except the expected failures) pass?

330984554-a8fbc807-8c45-408c-a62e-3f87cb6ee3b8

@RedYetiDev
Copy link
Member

Also,
🎉 Thank you for tackling this :-)

@RedYetiDev RedYetiDev added the loaders Issues and PRs related to ES module loaders label May 18, 2024
Copy link
Member

@RedYetiDev RedYetiDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Tip

While my review shows my support, I am not a core collaborator, and this review has no power / place in the approval process

Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks for getting to the bottom of this.

src/node_options.h Outdated Show resolved Hide resolved
test/es-module/test-esm-cjs-exports.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/get_format.js Outdated Show resolved Hide resolved
@dygabo
Copy link
Member Author

dygabo commented May 18, 2024

Great work! Thanks for getting to the bottom of this.

Will check your comments and split it into two PRs, will happen maybe beginning of next week.

@RedYetiDev
Copy link
Member

Great work! Thanks for getting to the bottom of this.

Will check your comments and split it into two PRs, will happen maybe beginning of next week.

Lovely!

@dygabo dygabo force-pushed the fix-for-unflagging-module-format-detection branch 2 times, most recently from 1d7b4de to b7ec307 Compare May 20, 2024 16:42
@dygabo
Copy link
Member Author

dygabo commented May 20, 2024

split done, this fixes the --experimental-detect-module edge cases, no unflagging of the option.
The unflagging is prepared and I can submit a PR after this one lands

test/es-module/test-esm-named-exports-detect-module.js Outdated Show resolved Hide resolved
lib/internal/modules/esm/get_format.js Outdated Show resolved Hide resolved
test/es-module/test-esm-detect-ambiguous.mjs Outdated Show resolved Hide resolved
lib/internal/modules/esm/get_format.js Show resolved Hide resolved
@dygabo dygabo marked this pull request as draft May 21, 2024 12:15
@dygabo dygabo force-pushed the fix-for-unflagging-module-format-detection branch from afb82c8 to b2f221f Compare May 26, 2024 17:35
@dygabo dygabo marked this pull request as ready for review May 27, 2024 12:40
@dygabo dygabo requested a review from GeoffreyBooth May 27, 2024 12:41
@dygabo dygabo force-pushed the fix-for-unflagging-module-format-detection branch from 33d3bed to f4d6382 Compare June 24, 2024 13:25
@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Jun 24, 2024
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still LGTM

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jun 24, 2024
@nodejs-github-bot
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth force-pushed the fix-for-unflagging-module-format-detection branch from 266c499 to 6445e1a Compare June 25, 2024 05:10
@GeoffreyBooth GeoffreyBooth force-pushed the fix-for-unflagging-module-format-detection branch from 6445e1a to 6132129 Compare June 25, 2024 05:15
@nodejs-github-bot
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth force-pushed the fix-for-unflagging-module-format-detection branch from 6132129 to 5753afc Compare June 25, 2024 05:18
@GeoffreyBooth GeoffreyBooth force-pushed the fix-for-unflagging-module-format-detection branch from 5753afc to bd945b6 Compare June 25, 2024 05:22
@nodejs-github-bot
Copy link
Collaborator

Local<String> code;
if (args[0]->IsUndefined()) {
CHECK(!filename.IsEmpty());
const char* filename_str = Utf8Value(isolate, filename).out();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Utf8Value here can’t be temporary, it needs to outlive the const char* or otherwise it is use-after-free.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

CHECK(!filename.IsEmpty());
const char* filename_str = Utf8Value(isolate, filename).out();
std::string contents;
int result = ReadFileSync(&contents, filename_str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading from disk from the native layer here doesn’t seem right - it’s not guaranteed that this is on disk and at a location pointed to by filename (the filename can be some artificial ID). The JS land should be refactored to only call this after source code is confirmed instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comes as part of the trials to cover the module type resolution for the resolve. I would expect if any of that happens (correct me if I'm wrong) that ReadFileSync will return != 0 here and this would cause the early exit with an exception.

Additionally we call this from managed code here. I am guessing that at this stage the fileUrl was already validated but that is at this stage just an assumption. This test made me think that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception would be surprising for a hook - resolve shouldn't be poking into the file system, that should be load's job.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is changed now, the comment above is obsolete. The function would return undefined on file access errors. But I am still looking for a way to remove this file access without losing the test.

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved
@joyeecheung
Copy link
Member

joyeecheung commented Jun 25, 2024

IMO it doesn’t make sense to eagerly detect the format at resolve, the format should only be confirmed during loading, when the source becomes available. Until then it should be allowed to be undefined. For one it’s not guaranteed that the filename is actually readable from the fs at resolve time. Also for libraries like tsx that uses the defaultResolve to probe possible default entry points, doing the detection at resolve incurs unnecessary overhead on formats that are not recognizable by Node.js. That should be left to default load when the final source code is available.

@dygabo
Copy link
Member Author

dygabo commented Jun 25, 2024

Until then it should be allowed to be undefined

With current version it still is allowed and it might happen that it is undefined (if one of the cases occur where it cannot be determined during resolve, like the typescript case, or file not accessible). The more confusing part is IMO where like with the last test being added, it can be determined but the load changes that in the scenario that transforms ESM to cjs during a load hook execution.
But the current version on this branch covers correctly more cases than before (probably most of the cases except the esm/cjs conversion or the other way around).

@dygabo
Copy link
Member Author

dygabo commented Jun 26, 2024

@joyeecheung I tried today to modfiy this part according to your suggestion to have resolve in detect-module mode not setting anything in format if source is undefined (i.e. not try to read from file contained in url). Generally speaking, eliminating the resolve time "hint" detection everywhere would be a breaking change and it is IMO out of scope for this PR.

With current implementation the hint quality is better than it was before because of the detectModuleType call. There are still cases where it cannot be determined, like mentioned before, but no currently existing tests are affeced by those.

One example: tests like this one are not passing because they are based on loaders like this that expect the format to be determined by resolve. I did not find a quick solution for this. Do you or @GeoffreyBooth, @aduh95 have an idea on how to solve it?

@joyeecheung
Copy link
Member

joyeecheung commented Jun 26, 2024

Generally speaking, eliminating the resolve time "hint" detection everywhere would be a breaking change and it is IMO out of scope for this PR.

Doesn't this PR "improves" the format detection in resolve by trying harder (even though reading from the file system at resolve phase might be the wrong way to try it), and hence make the breaking change necessary for correcting this design flaw even more breaking than it already is? Format detection in resolve should be a best effort (tentative based on file extensions and package.json info) IMO, and we shouldn't go out of our ways to detect the format by reading the file and even parsing it at resolve phase. That leads to unnecessary overhead to the overall design especially if user hooks point to a non-existent URL or a file with unrecognizable format at resolve phase (for better error stack traces) and only override the format detection at load.

@dygabo
Copy link
Member Author

dygabo commented Jun 26, 2024

Format detection in resolve should be a best effort

I agree with this and also that the file read has an impact on the running time. But I am slightly out of ideas on how to solve the one failing test that I mentioned. I will give it one more try, see what solution I can find. Any suggestions on how to solve it are appreciated.

@joyeecheung
Copy link
Member

joyeecheung commented Jun 26, 2024

Is the failing test something like https://github.com/dygabo/node/blob/fix-for-unflagging-module-format-detection/test/fixtures/es-module-loaders/builtin-named-exports-loader.mjs#L31 ? In that case I think it's the test that's making wrong assumptions - the loader hook should not count on the context.format being ready - it could count on the result of nextLoad() containing a valid format, but not in the context that's provided before nextLoad is called. The doc also says that context.format can be undefined in load too.

@joyeecheung
Copy link
Member

joyeecheung commented Jun 26, 2024

In terms of format, I think there are actually two concepts:

  1. Assumed format
  2. Confirmed format

e.g. if it's a .js file not bound by a type field in package.json, it's assumed format is "commonjs" while confirmed format is "undefined" after resolve. Internally, the format that gets passed around is the confirmed format and is also what gets surfaced to user hooks. The problem is that this PR is going along with the direction that mixes the two into one in the load hook, then it's going to lead to detection with too many assumptions.

@GeoffreyBooth
Copy link
Member

e.g. if it’s a .js file not bound by a type field in package.json, it’s assumed format is “commonjs” while confirmed format is “undefined” after resolve.

I’ve been referring to “a .js file not bound by a type field in package.json” as an “ambiguous” file. I think what we have now is that resolve sometimes returns a format hint of commonjs for ambiguous files, whereas with detection enabled it should probably be returning no hint at all; and then we determine the format during load. I don’t think certain cases changing from resolve returning format: 'commonjs' to format: undefined should be considered a breaking change, because per our docs format is optional and hooks shouldn’t rely on it being present or set to a particular value. Do others agree with that? (And the hooks are still experimental, after all.)

Because it does feel like the right approach here is to update the test so that it no longer expects format: 'commonjs' for ambiguous files, and we do the detection during load rather than potentially reading the source twice or reading it too early.

@dygabo
Copy link
Member Author

dygabo commented Jun 27, 2024

Thinking about this a bit more over the past hours, I understand the concerns and am looking into alternative solution to propose.

  • "Module type detection" has somehow performance penalty in the title. Because it's an additional detection that needs to be done, that must cost something, otherwise I would not need it in the first place. We also have a warning in place that tells the user: ${url} parsed as an ES module because module syntax was detected; to avoid the performance penalty of syntax detection, add "type": "module" to ${pjsonPath}
  • additionally, I would expect projects that make such transformations during module loading that will change the module type detected during resolve tend to care less about the loading time, otherwise most of these could be done as a build/packaging step (but that is just an assumption, I'm sure that there are project that might choose that for convenience)
  • if we end up having less reliable "module type detection" during resolve with "module type detection" turned on than we had without "module type detection", seems to me as somehow missed goal of the feature.
  • not being allowed to look into the file even if we have the file URL reduces the real content of this PR to almost zero. The goal was to find a way to minimize the number of failed tests when detect_module is true. If we have to change the tests, it means the enablement of the flag is a breaking change. I am not in the position to implement breaking changes because I have no power of defending these decisions. I think this is breaking as soon as we have to modify the tests, independent of what is written in the documentation. Because probably examples from the node.js tests are copied just as much as official documentation.

I hope this makes sense. Having said that, I will continue to look for alternatives but this might take some time.

@GeoffreyBooth
Copy link
Member

Talking to @dygabo and @joyeecheung, I think we need to explore a solution that doesn’t involve reading from disk during the resolve step. I’m going to mark this as draft for now so that it doesn’t get merged in the meantime.

It could be that we simply change the hinted format returned by resolve when detection is enabled: for ambiguous files, where detection will eventually determine the format, instead of resolve returning commonjs we return undefined, and this is just a change (possibly breaking, depending on your point of view). If enabling detection in general is considered semver-major, because things that used to error would no longer do so, then we can add this to the list of semver-major changes that happen as part of unflagging detection. I think in a detection-enabled world, we would want a hint of commonjs only for unambiguous CommonJS—.cjs extension or "type": "commonjs"—just as I think the module hint is currently only for unambiguous ESM. It would feel wrong otherwise.

I think the next step is to open a PR that unflags detection and updates all the tests accordingly, including making this format change. In that PR we can analyze the tests that needed changing and why, to gauge the scale of the breaking changes. We can mark it as don’t-backport so that it’s released in 23.0.0 only, and after we see the reaction to Node 23 we might consider backporting it if we ultimately decide that the changes aren’t truly semver-major and that the changes to experimental features aren’t too drastic.

@GeoffreyBooth GeoffreyBooth marked this pull request as draft June 28, 2024 00:38
@dygabo
Copy link
Member Author

dygabo commented Aug 1, 2024

superseded and solved by #53619.

@dygabo dygabo closed this Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants