Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve DX of importing ESM-compiled-to-CJS from native ESM #50981

Open
nicolo-ribaudo opened this issue Nov 30, 2023 · 25 comments
Open

Improve DX of importing ESM-compiled-to-CJS from native ESM #50981

nicolo-ribaudo opened this issue Nov 30, 2023 · 25 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem. never-stale Mark issue so that it is never considered stale

Comments

@nicolo-ribaudo
Copy link
Contributor

nicolo-ribaudo commented Nov 30, 2023

What is the problem this feature will solve?

The current ESM-CJS interop has two problems when it comes to files authored as ESM and then compiled to CJS

Problem 1

Due to how ESM works, Node.js needs to statically detect the list of bindings exported form CJS modules. This static analysis is performed by cjs-module-lexer, which recognizes patterns produced by popular tools at the time that this helper library was authored.

This has two main limitations:

In addition to these limitations, cjs-module-lexer has to perform a full lexing pass on the imported CJS file, which has a non-zero cost.

Problem 2

The default export exposed by Node.js for CJS files is always module.exports. This was a good choice at the beginning, to give a way of importing CJS from ESM, but now that Node.js detects named exports it's increasing friction when using the two modules systems together.

Consider this library, authored as ESM and published as CJS:

// as ESM
export default function circ(radius) {
  return radius * PI;
}
export const PI = 3.14;
// compiled to CJS and published
exports.__esModule = true;
exports.default = function circ(radius) {
  return radius * PI;
};
const PI = exports.PI = 3.14;

When importing this library from ESM, the named exports PI "just works":

import { PI } from "lib";

console.log(PI);

However, to use the default export you need to first import the library and then, separately, grab the default export from it:

import _lib from "lib";
const circ = _lib.default;

console.log(circ);

Developers sometimes try to rewrite it to use the "destructuring import" syntax with default as if it was a named import, but it obviously still points to module.exports:

import { default as circ } from "lib";

Unfortunately Node.js cannot change it's behavior and use exports.default as the default export (when exports.__esModule is defined to signal that the CJS file was originally an ES module, as every single other tool does) for backwards compatibility.

What is the feature you are proposing to solve the problem?

Node.js should have some sort of comment/directive at the beginning of the file to let build tools communicate what is the list of exports that a file should expose.

For example, the library example above could be compiled by tools to the following:

"exports:default,PI";

exports.__esModule = true;
exports.default = function circ(radius) {
  return radius * PI;
};
const PI = exports.PI = 3.14;

Then Node.js would know that when imported as ESM this file should have a default export (pointing to exports.default) and a PI export (pointing to exports.PI). Tools would be free to change their output code, and the "exports:..." directive would give an unambiguous and easy-to-parse signal to Node.js about the original intention of the code author.

From a tool author perspective, I have a few opinion about how this should work more in details:

  • "exports:foo,bar"; should cause the module to only have foo and bar exports, and not a default export pointing to module.exports. This is so that adding a default export to the module will not be a breaking change.
  • the list of exports need a separator (I'm using , just because it looks nice), but , could also appear in an export name. For this reason, all \s and ,s need to be escaped:
    // input
    const foo = 3;
    export { foo as "abc \\ , def", foo };
    
    // CJS output
    "exports:abc \\\\ \\, def,foo"
    const foo = exports["abc \\ ,"] = exports.foo = 3;
  • it would be great if there was still a way to say "expose module.exports as the default export", so that tools could start emitting the new directive without breaking changes (and then they would stop enabling that flag when the compiled library is ready for a major release). For example:
    "exports:abc,def"
    // means
    export const abc = exports.abc;
    export const def = exports.def;
    "exports!:abc,def"
    // means
    export const abc = exports.abc;
    export const def = exports.def;
    export default module.exports;

Additionally, I noticed that cjs-module-exports also returns a list of modules re-exported with export * from. This is necessary for Node.js to get the list of re-exported bindings. We would need to have a separate directive for that. For example:

export const foo = 1, bar = 2;
export * from "./dep.js";
export * from "mod";

// becomes

"exports:foo,bar";
"exports*:./dep.js,mod";

What alternatives have you considered?

Instead of a directive this could be a comment, like for linking source maps. However, it is important that this comment/directive must be at the beginning of the file, so that Node.js doesn't need to scan/parse the whole file to find it.

Somebody was already thinking about this in the past (I vaguely remember a discussion about it in a Babel issue with a Node.js collaborator), but I cannot find any references to it.


Some people that might be interested in the discussion:
@guybedford, @lukastaegert (Rollup), @kdy1 (SWC), @TheLarkInn (Webpack), @patak-dev (Vite), @evanw (esbuild), @andrewbranch (TypeScript) me (Babel)

@nicolo-ribaudo nicolo-ribaudo added the feature request Issues that request new features to be added to Node.js. label Nov 30, 2023
@dnalborczyk
Copy link
Contributor

Somebody was already thinking about this in the past (I vaguely remember a discussion about it in a Babel issue with a Node.js collaborator), but I cannot find any references to it.

@nicolo-ribaudo is this possibly what you are looking for?

babel/babel#7294
babel/babel#6455

@lukastaegert
Copy link

I really like this.

If we want to make it a directive, though, it should probably start with "use ..." and be limited to the first lines in a module, similar to other directives like "use strict" or "use asm" (acorn has special handling for directives as long as they are only separated by white-space and comments from the module start).

@GeoffreyBooth
Copy link
Member

@nodejs/loaders

@GeoffreyBooth GeoffreyBooth added module Issues and PRs related to the module subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. labels Dec 1, 2023
@ljharb
Copy link
Member

ljharb commented Dec 1, 2023

This is interesting - i assume i could also use it in normal CJS to ensure that the ESM-exposed exports are what i intend?

@nicolo-ribaudo
Copy link
Contributor Author

@dnalborczyk Yes, thanks for the links! It looks like one of the main problems was that reading the pragma was slower than just doing export default module.exports;. We are not in the opposite situation, where Node.js is by default doing much more work than just "reading a pragma" for CJS imports :)


@lukastaegert

and be limited to the first lines in a module [...] (acorn has special handling for directives as long as they are only separated by white-space and comments from the module start)

Yes, that's exactly the definition of a directive :) I'm not attached to the specific wording, but I proposed it without use just because exports: is probably already unambiguous enough.


@ljharb Yes, obviously there would be no way for Node.js to distinguish between a hand-written file and a tool-generated file. However, I would expect it to be much easier to have this directive auto-generated by tools than having to manually keep it in sync with your exports.

@Jarred-Sumner
Copy link

I don’t have an opinion on the specifics of how this should work but I like the idea of an explicit way for build tools to specify this behavior. Bun uses __esModule and this is good and bad. It’s bad for node compatibility. It’s good for “I want this code to work”.

@aduh95
Copy link
Contributor

aduh95 commented Dec 1, 2023

Quoting cjs-module-lexer README:

Detection patterns for this project are frozen. This is because adding any new export detection patterns would result in fragmented backwards-compatibility. Specifically, it would be very difficult to figure out why an ES module named export for CommonJS might work in newer Node.js versions but not older versions. This problem would only be discovered downstream of module authors, with the fix for module authors being to then have to understand which patterns in this project provide full backwards-compatibily. Rather, by fully freezing the detected patterns, if it works in any Node.js version it will work in any other. Build tools can also reliably treat the supported syntax for this project as a part of their output target for ensuring syntax support.

I think this concerns are still valid, if we made a change on how we parse CJS exports, it could land on Node.js v18.20.0 and not any further, which would certainly cause confusion in the ecosystem. I'm not saying this is an absolute show stopper, but we need to address this problem and come up with a plan to avoid ecosystem fragmentation.

@nicolo-ribaudo
Copy link
Contributor Author

@aduh95 The main problem with changing the detection patterns in cjs-module-lexer is that they are very subtle changes that affect code that is expected to behave identically.

For example, you might have this code that works both in Node.js 16 and Node.js 18:

Object.defineProperty(exports, "foo", {
  get() { return 2 }
});

let's say you decide to rewrite it to

Object.defineProperty(exports, "foo", {
  get: () => 2
});

and you are lucky because Node.js 18 updated their detection patterns to also detect arrow functions in getters: it keeps working, and given that the old and new code are expected to have the same behavior there is nothing that tells you "hey, you are using a new feature that doesn't work in older versions, be careful".

This new marker would be explicit, and not based on which patterns you follow when writing new code, and thus it's discoverability (and incompatibility risk) is similar to any other new feature that does not throw an error when used in older Node.js versions. Think, for example, about introducing new event names, new parameters to existing functions, or new properties to option bags.

In addition, a directive that can explicitly describe the expected shape of the module would never need to be updated to detect different patterns (unless ESM gets new features, in which case the directive can be updated at the same time as when introducing those new features in Node.js). Thus, there won't be need for a "this detection is frozen even though it doesn't support X" anymore just because there won't be a need to change it anymore.

@guybedford
Copy link
Contributor

guybedford commented Dec 1, 2023

Note that es-build already does this hinting itself, but it hints to cjs-module-lexer directly.

We even have a test for this pattern in cjs-module-lexer here - https://github.com/nodejs/cjs-module-lexer/blob/main/test/_unit.js#L19:

0 && (module.exports = {a, b, c}) && __exportStar(require('fs'));

Effectively this provides an existing way to achieve the same result as what is asked for in this PR.

I've experimented with string systems a lot in the past for metadata, and I'm all for it fwiw.

@GeoffreyBooth
Copy link
Member

Effectively this provides an existing way to achieve the same result as what is asked for in this PR.

With one big benefit, in that it works in maintained all versions of Node (and probably several EOL ones, as far back as whenever this was added). Though there’s no reason not to include both the new directive and the 0 && style. That’s what I would expect most bundlers would do, to target as many old versions of Node as possible.

The proposed directive has two relative benefits that I can see:

  1. It’s more human-readable than the 0 && approach, and feels more intentional/standardized as opposed to looking hacky.

  2. The proposal instructs Node to return early as soon as the directive is parsed, which should yield performance benefits for large files.

With regard to the second point, about the relative performance improvements:

  • We should measure to see just how much gain we’re talking about here. This could be approximated by creating a set of test files where the 0 && version is in the first line, and measure parsing time in current Node and in a version of Node where cjs-module-lexer stops after the first line.

  • The “return early” aspect introduces potential incorrectness or inconsistencies. For example, what if the directive lists one set of exports but either a 0 && line or the actual source of the file provide a different set of exports? If we stop parsing as soon as the directive is found, we might end up with exports that don’t actually exist in the file if the directive was written incorrectly (or maliciously). This also introduces the possibility of different exports in different versions of Node.

Does the proposed directive allow defining a module as having no default export? That would introduce the possibility of a user importing the default export in current Node, and then after a Node upgrade their code stops working because the new directive is read instead and it says that a particular module has no default export. This also goes against the error message you get when you try to import a named export that cjs-module-lexer didn’t find, where we provide example code that shows importing the default export and destructuring off of it.

If Node needs to parse the entire file to ensure correctness and consistency, then there wouldn’t be any performance benefit and what remains is the human-readability improvement. But if bundlers output both this new directive and the 0 && one anyway, then Node doesn’t really need to parse the new directive because it already supports the 0 && version; and bundlers are free to output an unused directive if they choose, like a comment or metadata that might be useful to other tools.

@ljharb
Copy link
Member

ljharb commented Dec 1, 2023

This also introduces the possibility of different exports in different versions of Node.

note that this is already somewhat possible by using various "exports" field formats, but this would certainly make it easier.

@nicolo-ribaudo
Copy link
Contributor Author

@guybedford

Effectively this provides an existing way to achieve the same result as what is asked for in this PR.

No, that hinting provides no way to make this code work when lib is compiled to CommonJS:

// app.js
import one, { two } from "lib";
console.log(one, two);
// lib
export default 1;
export const two = 2;

One of the two goals of this proposal, as described in the original post, is to provide an opt-in way of making the ESM-CJS interop work with default exports.


@GeoffreyBooth

With regard to the second point, about the relative performance improvements:

cjs-module-lexer already includes benchmarks, I could re-run them with support for the directive and share the data I get.

If we stop parsing as soon as the directive is found, we might end up with exports that don’t actually exist in the file if the directive was written incorrectly (or maliciously).

This is already the case, in current Node.js. If you have this CJS module:

exports.foo = 2;

function f(exports) { exports.bar = 3 }

then Node.js will expose foo and bar exports, with the bar export being undefined (because module.exports.bar is undefined)

Does the proposed directive allow defining a module as having no default export? That would introduce the possibility of a user importing the default export in current Node, and then after a Node upgrade their code stops working because the new directive is read instead and it says that a particular module has no default export.

Yes, as explained one of the goals of this proposal if to allow having the default import be exports.default, and to make it forward-compatible to introduce a default export it provides a mode to have no default export.

Generating a directive that disables the export default module.exports is a breaking change for tools (which means that most tools would enable it behind an option for the time being), and enabling that mode is a breaking change for libraries. It would be good for Node.js to also only start accepting by default this directive in a major version, to make sure that the default behavior can only change due to a major upgrade of something.

For the error message, we could update to only suggest that when using the pattern-matching detection and not the directive-based one :)

@dnalborczyk
Copy link
Contributor

dnalborczyk commented Dec 1, 2023

If we stop parsing as soon as the directive is found, we might end up with exports that don’t actually exist in the file if the directive was written incorrectly (or maliciously).

This is already the case, in current Node.js. If you have this CJS module:

exports.foo = 2;

function f(exports) { exports.bar = 3 }

@nicolo-ribaudo the way I understand the comment is in case the exports don't have any symmetry, e.g.

"exports:default,PI,foo";

exports.__esModule = true;
exports.default = function circ(radius) {
  return radius * PI;
};
const PI = exports.PI = 3.14;

but I guess foo would be just undefined in this case?

@GeoffreyBooth
Copy link
Member

@nicolo-ribaudo the way I understand the comment is in case the exports don’t have any symmetry, e.g.

Yes, this was my intent. And I guess the case of “directive has exports that don’t exist” is less problematic than the inverse. If the directive defines foo that doesn’t exist, sure, it can be undefined. But if the directive lacks foo but foo is exported, then we have the situation where a version of Node that’s reading the directive doesn’t see foo but older versions of Node potentially do see foo, unless Node parses the full file looking for exports regardless of whether or not it found a directive.

@nicolo-ribaudo
Copy link
Contributor Author

nicolo-ribaudo commented Dec 1, 2023

Oh I see. I think that case is very similar to a CommonJS file that is being detected as exporting foo, bar and baz adding an ESM entry point (through package.json#exports) that only exports foo and bar. This directive would basically be an inline way equivalent to writing a separate ESM file manually re-exporting values form the CommonJS version. When package.json#exports was introduced, we didn't have too many problems with people using it in newer Node.js version surprised that it wasn't working in older ones :)

I would expect this directive to be introduced by tools, and as such it would always match the expected list of exports. In case where it's manually written and somebody forgets to add a value, it's similar to any other accidental bug that prevents some code from working in some Node.js versions.

@andrewbranch
Copy link

As a way to help identify named exports that cjs-module-lexer can’t find, this sounds great.

Regarding the way it affects default import binding, my primary concern for TypeScript is it provides another esoteric way that a .d.ts file can fail to represent a .js file.

Suppose this proposal lands, and TypeScript updates its own CommonJS emit to enable this directive by default. We would also update our declaration file emit to contain the very same directive, however it ends up looking:

// dist/index.d.cts
"exports:default,PI";

declare function circ(radius: number): number;
export default circ;
export const PI: 3.14;

Now, when someone imports this library from an ES module under settings for Node.js, we would see the directive and know to type their usage accordingly:

import circ from "geometry-utils";
//     ^?
// function circ(radius: number): number
//
// (not: { default: function circ(radius: number): number })

So this is no problem at all for libraries that generate their JS and type declarations with tsc. But there are lots of libraries that use tsc to emit declarations and some other tool to emit JavaScript—not to mention the ~9k libraries that are typed by hand on DefinitelyTyped, or the non-tsc declaration emitters we may see pop up after --isolatedDeclarations lands.

The problem becomes what to do when we see the same declaration file without the directive. Can we be confident that the lack of a directive in the declaration file indicates a lack of directive in the JavaScript file? In practice, no. At least for a while, this will be just as likely to mean that somebody used a new version of Rollup and an old version of tsc, or the DefinitelyTyped typings haven’t yet been updated to include a directive that exists upstream, or any other way a library author with a TypeScript build pipeline more complicated than tsc --declaration can subtly mess it up.

While I think it’s really regrettable that transpiled CJS libraries behave one way in Node.js and another everywhere else, I’m not sure that further fragmentation of interop strategies is going to be a net positive.

@guybedford
Copy link
Contributor

guybedford commented Dec 2, 2023

@nicolo-ribaudo your proposal would be breaking the invariant that a CJS module is always represented by the default export. import x from 'cjs' <=> x = require('x') <=> x = module.exports kind of thing.

We didn't support __esModule in Node.js by default because CJS exports objects can not exhaustively be represented by namespaces - they are objects that can be proxies, have getters and setters, have dynamic or computed properties, etc etc a whole bunch of edge cases that we couldn't assume at the time to perfectly align with static export analysis, due to differences from the dynamic runtime exports population expected by bundlers (and as you have pointed out, the static analysis can both under and over classify).

I'm sympathetic to the approach and solution, I just wish you'd have brought this up 5 years ago, or been involved in the discussions then. Because what we have now is a fairly strong reliable invariant at this point and to change that is to introduce a new edge case, that will have new interop concerns like the ones @andrewbranch brings up, throughout the ecosystem.

My question specifically is how we would deal with the following:

  • Any new Node.js major with this new interop will provide divergent behaviour with previous versions of Node.js, so that adopting the new syntax will be a compat footgun from the start.
  • All tooling in the ecosystem will have to adapt to this new syntax.

Any tool that doesn't, or any old application that doesn't will just break. I don't see how this can be done in a way that will not cause more confusion for users.

We worked very hard in the ESM integration in Node.js to ensure that for the most part, if code works in modern versions, it will work back to older versions (apart from the one cjs-module-lexer change you described).

For Babel at least, I'd really hope it can finally move to transpiling the case of

export default 'x';

into:

module.exports = 'x';

This way, it would be possible to mark mixing named and default exports as a linter style warning, and use one or the other pattern. That Babel still uses __esModule for default-only module exports causes unnecessary friction in the single default export case.

@Jack-Works
Copy link

this is interesting but you still need a hint for export * from 'x' (unless you have cross-file knowledge)

Copy link
Contributor

There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the never-stale Mark issue so that it is never considered stale label or close this issue if it should be closed. If not, the issue will be automatically closed 6 months after the last non-automated comment.
For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label May 31, 2024
Copy link
Contributor

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 30, 2024
@legendecas legendecas reopened this Jul 2, 2024
@legendecas legendecas added never-stale Mark issue so that it is never considered stale and removed stale labels Jul 2, 2024
@guybedford
Copy link
Contributor

Perhaps we could define a package.json field for this feature:

{
  "cjsNamedExports": {
    "./path/to/module.cjs": ["exportName"]
  }
}

@nicolo-ribaudo would be interested to hear your thoughts further here.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 23, 2024

A counterpart of problem 2 was being discussed in #53848 (comment) and I wonder if we should introduce a marker for both transpiled ESM and real ESM to indicate the intent for Node.js to unwrap a single default export when the module is loaded natively by require(esm) or import cjs.

Suppose the module is authored as:

// dist/mod.mjs
export default { foo: 'bar' };
export const unwrapSingleDefaultExport = true;

Which gets transpiled to:

// dist/mod.cjs
exports.default =  { foo: 'bar' };
exports.unwrapSingleDefaultExport = true;
exports.__esModule = true;

For users using the native loaders:

// We should probably consider unwrapping for `require(cjs)` in the presence of unwrapSingleDefaultExport too,
// so that all of them are the same.
require('./dist/mod.cjs');  // { default: { foo: 'bar' }, unwrapSingleDefaultExport: true, __esModule: true }
require('./dist/mod.mjs');  // { foo: 'bar' }

import mod from './dist/mod.cjs';  // { foo: 'bar' }
import mod2 from './dist/mod.mjs';   // { foo: 'bar' }

import * as mod3 from './dist/mod.cjs';  // { default: { foo: 'bar' }, unwrapSingleDefaultExport: true, __esModule: true }
import * as mod4 from './dist/mod.mjs';  // { default: { foo: 'bar' }, unwrapSingleDefaultExport: true }

For users transpiling ESM to CJS code:

import mod from './dist/mod.cjs'; 
import mod2 from './dist/mod.mjs'; 

Would be:

function wrapper(e) { return e && e.__esModule ? e : { default: e }; }
const mod = wrapper(required('./dist/mod.cjs')).default;  // { foo: 'bar' }
const mod2 = wrapper(required('./dist/mod.mjs')).default;  // { foo: 'bar' }

The configuration of unwrapSingleDefaultExport (can be other names) can be done either by the library author or by the transpiler (not by default, but optional). If the author is using an older version of the transpiler, they can just export it themselves, otherwise they can configure the transpiler to emit this for them. For ESM -> CJS transpiled packages with a single default export, this would be a breaking change for the library - but on the other hand they can also ship native ESM for newer versions of Node.js with this unwrapping, and have a unified experience for import cjs/require(esm)/import esm/require(cjs) across different Node.js versions. Otherwise shipping native ESM may be a more breaking change for them already due to not being able to unwrap the single default export for require(esm).

@joyeecheung
Copy link
Member

joyeecheung commented Jul 23, 2024

Actually, unwrapSingleDefaultExport should always be declared by the author, and there is no need for the transpiler to get involved other than faithfully converting that as exports.unwrapSingleDefaultExport = true. This is probably fine as the library author is usually in control of how they want their single default export to be returned, no matter it's a in-file declaration, or a transpiler config, or a pakcage.json field - though the in-file declaration would probably be the most performant and simplest option. If we could go back in time I think Node.js should've made unwrapSingleDefaultExport the default for ESM (skip when there are other named exports) and always implied by the presence of __esModule, but as OP said, ship has sailed due to backward compat, so this now needs to be opt-in by the library author.

@guybedford
Copy link
Contributor

This issue is specifically about importing ESM compiled to CJS in ESM. I think if we combine this with the issue of CJS requiring ESM unwrapping we could over-constrain the problem space here.

Specifically it's not clear what unwrapSingleDefaultExport does for an import statement - since an import * as ns from 'x' cannot bind ns to anything other than an ES module namespace without breaking the ecma-262 spec.

I definitely agree that reconsidering a flag for the CJS unwrapping of ESM on its own merit makes a good companion discussion to this issue, and I've posted a new issue to discuss this explicitly for require(esm) in #54085.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 29, 2024

Specifically it's not clear what unwrapSingleDefaultExport does for an import statement - since an import * as ns from 'x' cannot bind ns to anything other than an ES module namespace without breaking the ecma-262 spec.

The name already makes it clear that it unwraps the default export, not the star export. So with import * as ns from 'x', ns.default is module.exports.default in the transpiled CJS module (that is, export default in the original ESM) if the marker is set. Previously, export default in the original ESM i.e. module.exports.default in the transpiled CJS can only be ns.default.default which is what the problem is IIUC. The ESM importer have to do two unwrapping to get the thing orignally authored as export default if it comes from transpiled ESM, whereas to import real ESM they only need to do one, creating a confusing disparity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem. never-stale Mark issue so that it is never considered stale
Projects
None yet
Development

No branches or pull requests