Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: Private internal exports subpaths #33780

Closed

Conversation

guybedford
Copy link
Contributor

@guybedford guybedford commented Jun 7, 2020

This PR defines private exports subpaths to be those exports starting with ./#, and then throws an ERR_PRIVATE_PACKAGE_PATH when attempting to resolve these exports from outside of the package, so that they can only be resolved via package self-resolution.

A package that defines these mappings today, for example:

{
  "exports": {
    ".": "./main.js",
    "./#internal": "./internal.js"
  }
}

will already work correctly under the current exports implementation in Node.js 12 such that pkg/#internal can be imported and required from outside of the package and from within the package using package own name self-resolution.

This PR only adds a new validation phase to these resolutions that throws the error when importing from outside of the package, thereby creating a new private encapsulation layer. This keeps the implementation simple and backwards compatible as opposed to defining a whole new type of private resolution.

Motivation

Package exports provide a number of features both to the external API of the package imported by consumers of the package (import 'pkg/x') and also to the internal resolutions used by the package author themselves when using self-resolution (import 'pkg/x' from within modules of the package itself).

The issue is that any time a package author might want to take advantage of exports features using own-name self-resolution such as for internal directory mappings, avoiding the need for file extensions, or providing internal conditional resolutions, by using the "exports" field they must make these modules public by default.

For these use cases, it can be useful to allow package authors to have private internal resolutions that only apply for internal self-resolution but aren't defined as part of the public exports interface of the package.

The # symbol has been chosen as its meaning matches that of private fields in classes.

This effectively then also provides a solution to the browser field "internal file mappings" use case for the exports field, allowing packages to swap out internal modules based on environment conditions, thus fully replacing the functionality of the browser field for arbitrary condition handling in exports.

Prior art:

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added errors Issues and PRs related to JavaScript errors originated in Node.js core. esm Issues and PRs related to the ECMAScript Modules implementation. labels Jun 7, 2020
@guybedford
Copy link
Contributor Author

//cc @nodejs/modules-active-members

@ljharb
Copy link
Member

ljharb commented Jun 7, 2020

We still haven't discussed this in the modules group; it continues to feel rushed when the work is done on a PR before the semantics have been discussed. Perhaps this could be marked as a draft until that happens?

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Jun 7, 2020

feel rushed when the work is done on a PR

The PR is only 58 lines added, 10 removed. That's almost trivial. I think we can review both the code and the design here; on the design side it's also not that big of a change.

@bmeck
Copy link
Member

bmeck commented Jun 7, 2020

Could we get a test around import of import "pkg#foo" since that is a technically different URL from pkg/#foo.

@sokra
Copy link
Contributor

sokra commented Jun 7, 2020

Giving my 2 cents here for inspiration on the discussion:

Even while there seem like no technical issues with using # as symbol, I want to note that # in URLs and ESM is usually used for URL fragments.

Regarding dx ergonomics I dislike that the developer has to repeat the package name everytime they use a private reference. This makes the source code package name dependent.

It feels a bit off to me that private things also use the exports field, which is intended to expose the public api of a package.

As alternative I propose using an "imports" field and use import "@/some/private/thing" as syntax (without the own package name).

"imports": {
  "@/some/private/thing": {
    "condition": "./x"
  }
}

@DerekNonGeneric
Copy link
Contributor

I want to note that # in URLs and ESM is usually used for URL fragments.

Specifiers are URLs, so that is a problem!

@GeoffreyBooth
Copy link
Member

Specifiers are URLs, so that is a problem!

Yes and no. Relative and absolute specifiers are URLs, but “bare” specifiers ('pkg') and specifiers that start with bare specifiers ('pkg/something') are not URLs, I think. @jkrems?

@jkrems
Copy link
Contributor

jkrems commented Jun 8, 2020

Correct. Specifiers are generally not URLs. They are specifiers. Specifiers that start with specific characters (e.g. “./“) or can be parsed as valid URL strings will resolve “trivially” to URLs. But the specifiers themselves aren’t URLs.

E.g. bare specifiers are allowed to have characters that wouldn’t be valid in URLs. They just need to go away during resolution to a module URL.

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented Jun 8, 2020

I thought we established that specifiers can be URLs towards the bottom of #49448.

  • file://, node://, data://

Additionally, when web browsers resolve a module specifier, the first thing they do is …

  1. Apply the URL parser to specifier. If the result is not failure, return the result.

Refs: https://html.spec.whatwg.org/multipage/webappapis.html#resolve-a-module-specifier

Something perhaps worth considering in regards to the application of the URL parser: will it cause complications when resolving module specifiers w/ URL fragments (i.e., interpret the specifier as an invalid URL and thus result in failure)?

@ljharb
Copy link
Member

ljharb commented Jun 8, 2020

“exports” is about public exports; it doesn’t make sense to me to put internal/private things in there (and doing so seems like it will dramatically complicate resolution tooling) Why wouldn’t we want another field for this, one self-reference also respects?

@jkrems
Copy link
Contributor

jkrems commented Jun 8, 2020

I maybe should've highlighted that part more but your quote from the HTML spec is what I was referring to with the bolded bit here:

Specifiers that start with specific characters (e.g. “./“) or can be parsed as valid URL strings will resolve “trivially” to URLs.

Bare specifiers (which is the only position where exports applies) are - by definition - not valid URLs. If they were, then the URL parsing step would catch them and prevent them from being interpreted using exports. The order is roughly:

  1. Does the specifier start with special relative-URL markers (/, ./)? Interpret as relative URL. Abort.
  2. Does the URL parser accept the specifier? Interpret as absolute URL. Abort.
  3. At this point, URL characters no longer matter. It's a "bare specifier". In node, we interpret this as a "package reference".

will it cause complications when resolving module specifiers w/ URL fragments

At least as spec'd, there's no such thing as a bare specifier with URL fragments (or query arguments). foo#bar, foo?bar and foo are three entirely separate bare specifiers with no connection to each other. This is a position shared by import maps and exports (at least afaik the only specs that are relevant to node at this point):

@jkrems
Copy link
Contributor

jkrems commented Jun 8, 2020

Why wouldn’t we want another field for this, one self-reference also respects?

I think that would be possible. The downsides are:

  1. Awkward to document/teach. Both fields would likely share the same syntax and would have to prevent collision. So one would either have to duplicate all the definitions from the other or refer to the other, making reading the docs a bit more confusing imo.
  2. Higher implementation complexity, for similar reasons. This could be alleviated by merging the two fields on load but then we'd have to be careful not to accidentally allow the "wrong" kind of mapping in each of them.
  3. One more name to (potentially) bikeshed, in addition to the specifier syntax which we'd still need.

I don't have a strong opinion either way but I think the approach in this PR ("maximally incremental") is viable and has one of the lowest possible teaching overheads - which I like. "If you want the exports entry to only be visible to self-reference, start it with ./#" is really short and succinct.

@ljharb
Copy link
Member

ljharb commented Jun 8, 2020

What about someone trying to migrate seamlessly from "exports", but they have a file that starts with #? This already works today.

@jkrems
Copy link
Contributor

jkrems commented Jun 8, 2020

What about someone trying to migrate seamlessly from "exports", but they have a file that starts with #? This already works today.

Yes, that would break. But I think giving them the choice between renaming the file to not start with # or not using exports would be fair. Given how likely that kind of filename is (at least from what I've seen) and the effort required to stop using the filename, that feels like a reasonable constraint.

@bmeck
Copy link
Member

bmeck commented Jun 8, 2020

Do "exports" explicitly support fragments on the left hand side during resolution or is that a bit of undefined behavior? In the docs it doesn't seem clear and it would be surprising if it picked up fragments given that require does not work with anything except a path. If the left hand side is not provided fragments it seems this might not be a breaking change, but might be a problem of defining encoding. To rephrase via example:

require('./#foo'); // resolves in "." -> path "./#foo"
import('./#foo'); // resolves in "." -> URL pathname "./" hash fragment "#foo"
import(encodeURIComponent('./#foo')); // resolves in "." -> URL pathname "./#foo"

With this PR, the semantics of # get a little confusing as bare URLs are not really URLs that I can tell by replacing . with pkg:

require('pkg/#foo'); // resolves in pkg, but # segment now is something new (internal export fragment?), not a hash fragment
import('pkg/#foo'); // resolves in pkg, but # segment now is something new, not a hash fragment
import(encodeURIComponent('pkg/#foo')); // this still gets the file starting with #?

I'm a bit concerned about the overlap here being a bit unclear. ./,/,../ all have well defined meanings and share the fragment syntax with other URL string types (note: not all URL strings are full URLs). This PR would decouple the "bare URL specifier" into something different and I think that without clarity on how to do things like include a hash fragment in the bare URL specifier it needs some heavy explanation of what is going on.

@sokra
Copy link
Contributor

sokra commented Jun 8, 2020

In webpack import "pkg/module?abc" resolves to /.../node_modukes/pkg/module.js?abc.
I would expect this to behave similar when there is an exports field in the package.json. So intuitively I would not pass the query to the exports field.

We didn't implement fragments at all, but I would expect them to behave similar to query string.

So I think it a little bit weird to use # as char here. We have the free choice, why not just use a different one?
I see the intended alignment with private class fields, but don't think this makes it worth it.

@jkrems
Copy link
Contributor

jkrems commented Jun 8, 2020

I created this gist to illustrate the expected behavior for URL-style specifiers when using exports: https://gist.github.com/jkrems/42ffe551614a2c41200f1e9055efcac5

When using exports, exact match is just that: An exact match. Any change to the specifier (like appending ? or #) means that it no longer matches that exports entry. If webpack currently omits parts of the specifier in its exports lookup, that would be a bug relative to how exports is specified and implemented in node. There shouldn't be any preprocessing or normalization for the strings that appear in the exports field itself.

We have the free choice, why not just use a different one?

We could pick a different character but I'm not sure there's one that expresses "this is private" more clearly than #. Most characters that can easily be typed have established meaning in the web/bundler ecosystem, in unix shell environments, and/or in URLs. We could go with "./🤫" or similar options but I'm not sure people would like that any better.

@bmeck
Copy link
Member

bmeck commented Jun 8, 2020

I'd also lean on wanting to avoid making the syntax more complex with characters having multiple meanings. I think supporting this feature might also not be worth its weight if the complexity in explaining how exactly it overlaps is too high. Maybe we could get a better understanding of if we could make an invalid URL or overlap that is highly defined like when import maps tried to use | in their specifiers. Right now you can import things that are not publicly exposed, but this PR is about aiding when you have a complex package that is large enough that you want relatively easy to understand specifiers to access internals. Another solution is to do things like put a nested node_modules entry which is not publicly exposed and wouldn't make export maps more complex. Understanding the need to make specifier syntax more complex and why alternatives are insufficient would help me to understand the need for the feature. Right now it seems to be lacking on some of that explanation. I do think an "imports" like feature that does internal redirecting would be more desirable currently.

@guybedford
Copy link
Contributor Author

The "imports" field always was the original design for this use case, which grew out of an original proposal posted in March 2019 to the exports spec repo at jkrems/proposal-pkg-exports#30, and finally leading to jkrems/proposal-pkg-exports#40 in August 2019.

The concern we have with the "imports" proposal is the weight of having yet another package.json field here and an associated implementation. This PR as you can see is incredibly minimal and adds no new logic apart from a validation step.

There is certainly middle ground to be explored here though and I could get behind the "imports" field if there is interest from Node.js and tools - I agree it is a cleaner semantic separation.

My concern was just if it would be realistically possible to ship a whole new "imports" field at this stage in the game. But if people are feeling optimistic I'm all for that.

@guybedford
Copy link
Contributor Author

The problem we are dealing with here is that tools like Webpack are now supporting both internal rewriting of modules with the "browser" field while also supporting the "exports" field. In my mind that is a clear failure that the exports field is not supporting all the use cases that are necessary for package authors to get the features they need from conditional loading.

For this reason I think it is important that we move forward on a solution here soon, but I can't really tell where consensus might lie at this point since.

I suggested this PR as it is simple and straightforward and gets us there. But I do not know how to progress with the objections further - if I work on a PR to support "imports" it seems that is likely to get objection too?

Can the objectors please suggest a path forward here?

@ljharb
Copy link
Member

ljharb commented Jun 19, 2020

My personal objection/concern is largely around conflating the "exports" field with internal-only usage; for me a separate field would be the only path forward.

The "browser" field has always supported internal rewriting of modules, so I'm not sure why it's surprising that the "browser" condition is being used that way.

@jkrems
Copy link
Contributor

jkrems commented Jun 19, 2020

My personal objection/concern is largely around conflating the "exports" field with internal-only usage; for me a separate field would be the only path forward.

I think it depends on how abstractly exports is understood. It already is true that not all exports mappings are always used. It's potential paths behind the package name that may or may not be visible in a certain importing context. This just adds another importing context (beyond "it used require vs. it used import" etc.) with mappings that are only visible in this new context. I personally like that all possible mappings of the package are in a single field and not spread out across the package.json.

But I don't have strong opinions on imports vs. extending exports.

@guybedford
Copy link
Contributor Author

The "browser" field has always supported internal rewriting of modules, so I'm not sure why it's surprising that the "browser" condition is being used that way.

The problem is that this means the browser field provides useful features the exports field does not - there the exports field is simply not feature-complete, and cannot be considered a replacement for the browser field.

My personal objection/concern is largely around conflating the "exports" field with internal-only usage; for me a separate field would be the only path forward.

Can I ask then explicitly if anyone objects to an "imports" field implementation? @sokra yourself included too here :)

@bmeck
Copy link
Member

bmeck commented Jun 19, 2020

To reiterate since it seems there is some silence on stances. I'd heavily support "imports" rather than this.

@sokra
Copy link
Contributor

sokra commented Jun 19, 2020

"imports" 👍

It's like an "imports map" that affects only this package.

@ljharb
Copy link
Member

ljharb commented Jun 19, 2020

Could even be internalImports or internals to emphasize that it only affects the package itself?

@guybedford
Copy link
Contributor Author

@ljharb for the initial implementation I would like to stick with exactly what was originally proposed in https://github.com/jkrems/proposal-pkg-exports/blob/master/README.md#3-imports-field. I do personally "imports" for its relation to "exports" and I think the nature of imports as internal and exports as external is somewhat captured by those terms already as well.

@guybedford
Copy link
Contributor Author

Closing as the imports field has been merged in #34117.

@guybedford guybedford closed this Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
errors Issues and PRs related to JavaScript errors originated in Node.js core. esm Issues and PRs related to the ECMAScript Modules implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants