esm: treat modules with no extension as .js #34177

Open. Wants to merge 2 commits into base: main

srolel commented Jul 2, 2020

With this change, extensionless modules loaded with the ESM loader will be
loaded with the same behavior as .js files:

  1. if "type": "module" is specified, load as an esm module
  2. otherwise, load as cjs module
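
The rule above can be sketched as follows. This is a minimal illustration and not the actual patch: the helper name `formatForFile` and its `packageType` argument (standing in for the `"type"` value of the nearest package.json) are assumptions made for the example.

```javascript
'use strict';
const path = require('path');

// Hypothetical helper: files with no extension are resolved exactly like
// .js files, based on the enclosing package's "type" field.
function formatForFile(filename, packageType) {
  const ext = path.extname(filename);
  if (ext === '.js' || ext === '') {
    return packageType === 'module' ? 'module' : 'commonjs';
  }
  if (ext === '.mjs') return 'module';
  if (ext === '.cjs') return 'commonjs';
  return undefined; // any other extension is out of scope for this sketch
}
```

Under this sketch, `formatForFile('bin/cli', 'module')` yields `'module'`, while the same file in a package without `"type": "module"` yields `'commonjs'`.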

Fixes: #34049
Fixes: #33226

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot added the esm label (Issues and PRs related to the ECMAScript Modules implementation) on Jul 2, 2020
@guybedford
Contributor

Note this PR should also include a change to the docs / spec in the esm docs as well.

This seems like a good compromise on the problem of extensionless binary files to me, although won't support binaries like foo.component as JS files of course.

//cc @nodejs/modules-active-members

@GeoffreyBooth
Member

GeoffreyBooth commented Jul 3, 2020

This was prevented (at the moment) because @bmeck and others wanted a way to preserve extensionless main entry points as Wasm, and potentially other formats in the future.

In the case of Wasm, at least, that file type has a magic string we could detect. So there could be an algorithm like:

if extensionless
  has recognized magic string?
    load as that type
  else
    if type: module
      load as esm javascript
    else
      load as cjs javascript

The downside of this algorithm is that any future format that we want to support as an extensionless file must include a detectable magic string. But if people are okay with that limitation, then I think this algorithm could work (and therefore this PR).
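
As a rough sketch of the algorithm above, assuming only the WebAssembly magic bytes (`\0asm`) are recognized: the function name `formatForExtensionless` and its parameters are hypothetical and do not correspond to Node.js internals.

```javascript
'use strict';

// WebAssembly binaries begin with the four bytes "\0asm".
const WASM_MAGIC = Buffer.from([0x00, 0x61, 0x73, 0x6d]);

// Hypothetical helper. `bytes` is the beginning of the file's contents and
// `packageType` is the "type" value of the nearest package.json.
function formatForExtensionless(bytes, packageType) {
  if (bytes.length >= 4 && WASM_MAGIC.equals(bytes.subarray(0, 4))) {
    return 'wasm'; // recognized magic string: load as that type
  }
  // No recognized magic string: fall back to the package type.
  return packageType === 'module' ? 'module' : 'commonjs';
}
```

Note that, unlike extension-based resolution, this version needs the file contents in addition to the path and package metadata.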

@ljharb
Member

ljharb commented Jul 3, 2020

There’s a third option, which would be a “type” value that only controlled extensionless files - either a “wasm” value, or a separate key entirely.

It seems a little concerning to me to have “type” ever determine anything besides “what a .js file is”.

@bmeck
Member

bmeck commented Jul 3, 2020

@ljharb that approach was taken in #31388 which is what caused the revert to drop support for extension-less files. There was objection to adding more "type" field values resulting in #31415 .

I will state that I am personally not comfortable with magic byte detection, some file types have trailing forms of detection so you could have a dual match e.g. files that are both valid Zip archives and JPEG as a classic example. In fact things like HTTP Signed Exchanges also support trailing detection. I think adding a new "type" would be preferable and seems a fine path. To me, "type" was never about just determining a single file extension type (including the empty extension for extension-less files).
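
To illustrate the dual-match concern, here is a small sketch with stand-in bytes (not a real image or archive). JPEG data is identified by its leading bytes, while zip readers locate the end-of-central-directory record by searching from the end of the file, so one buffer can satisfy both checks. The `sniff` function is made up for this example.

```javascript
'use strict';

const JPEG_START = Buffer.from([0xff, 0xd8, 0xff]);
const ZIP_EOCD = Buffer.from([0x50, 0x4b, 0x05, 0x06]); // "PK\x05\x06"

// Returns every format whose signature matches; illustrative only.
function sniff(bytes) {
  const matches = [];
  if (bytes.subarray(0, 3).equals(JPEG_START)) matches.push('jpeg');
  // The zip end-of-central-directory record sits near the end of the file,
  // so it is searched for from the back rather than the front.
  if (bytes.lastIndexOf(ZIP_EOCD) !== -1) matches.push('zip');
  return matches;
}

// Stand-in bytes: JPEG-like start, zip-like end.
const polyglot = Buffer.concat([
  JPEG_START,
  Buffer.from('...image data...'),
  ZIP_EOCD,
  Buffer.alloc(18), // remaining record fields, zeroed for the sketch
]);

console.log(sniff(polyglot)); // [ 'jpeg', 'zip' ]
```

A detector that stops at the first leading match would report only one of the two, which is the ambiguity being described.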

@ljharb
Member

ljharb commented Jul 3, 2020

Right - what I'm saying is, type isn't meant to be a binary for esm/cjs, it's supposed to be a field that can support all the module formats we want, and will determine how .js is parsed. (I'm also not comfortable with any form of "magic" detection)

This PR is totally viable - if what we want is to make type determine not just "how .js is parsed", but also "how extensionless files are parsed". It seems concerning to me, however, to conflate the two. Would there never be a use case where someone wants an extensionless file and a .js file in the same directory parsed differently?

@bmeck
Member

bmeck commented Jul 3, 2020

@ljharb often WASM includes a JS wrapper around it for glue and things like WASM feature detection to have different builds for more modern WASM engines. I can imagine people wanting to use .js for packages acting both as a library and wanting the glue but having a standard extension-less entry point that is just a WASM resource. E.G.

- cli // binary WASM, no hashbang allowed
- cli.js // can instantiate `cli` via global, can include a hashbang
- api.js // command module / JS API glue
- package.json // exports: "./api.js" , bin: {"cli": "cli.js"} // use cli.js because it is more robust, api.js might directly spawn `cli` without it
- ...

You get into all sorts of situations so I think trying to enumerate them all would be difficult.

@devsnek
Member

devsnek commented Jul 3, 2020

Regardless of whatever horrible things we do with type, node should support wasm files without extensions.

@ljharb
Member

ljharb commented Jul 3, 2020

@devsnek yes, i agree! that's why i'm suggesting that we should support extensionless files in any module format, and that it perhaps deserves a distinct mechanism than "type", which is only for .js files.

@devsnek
Member

devsnek commented Jul 3, 2020

@ljharb I mean it should support it in the absence of configuration, using wasm bytecode magic

@bmeck
Member

bmeck commented Jul 3, 2020

@ljharb why can't we use "type"? nothing in the key's name implies it affects .js files at all, and a union type of string | SomeKindOfMap would match the similar string shorthands we have for things like exports.

@ljharb
Member

ljharb commented Jul 3, 2020

@bmeck i agree "type"'s naming doesn't box it in - but it seems concerning to me to conflate two kinds of files in the same key (extensionless, and .js).

@guybedford
Contributor

@bmeck, to try and summarize your concern here, it seems you are saying that "type" is not suitable because there could be a need to have a divergence between .js extensions and files without an extension.

And your concern with checking the wasm header for the magic string seems to be that not all possible binary formats in future that Node.js might want to support such as web bundles will support header detection? I was not aware that web bundles were undetectable from their binary format. Do you have further information here?

@bmeck
Member

bmeck commented Jul 6, 2020

it seems you are saying that "type" is not suitable because there could be a need to have a divergence between .js extensions and files without an extension.

I was intending to state quite the opposite. I think "type" is fine to overload; right now it is a string which does not have any direct naming related to a specific file extension (or that it even relates to a file extension / a singular one at that) in either the key or the values, and we could easily transform it into an object of some kind similar to how "exports" works.
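
One way to picture the string-or-object idea is the sketch below. It is purely hypothetical: Node.js has no object form of `"type"`, and the keys (`'.js'` and `''` for extensionless), the defaults, and the name `normalizeType` are all assumptions invented for illustration.

```javascript
'use strict';

// Hypothetical normalization of a "type" field that may be a string
// shorthand or an object mapping extensions to formats. The empty-string
// key stands for files with no extension.
function normalizeType(type = 'commonjs') {
  if (typeof type === 'string') {
    // String shorthand: one value covers both .js and extensionless files.
    return { '.js': type, '': type };
  }
  // Object form: unspecified entries fall back to CommonJS in this sketch.
  return { '.js': 'commonjs', '': 'commonjs', ...type };
}
```

For instance, `normalizeType({ '': 'wasm' })` would keep `.js` as CommonJS while treating extensionless files as WASM, which is the kind of divergence raised earlier in the thread.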

@bmeck
Member

bmeck commented Jul 6, 2020

And your concern with checking the wasm header for the magic string seems to be that not all possible binary formats in future that Node.js might want to support such as web bundles will support header detection? I was not aware that web bundles were undetectable from their binary format. Do you have further information here?

  1. A variety of formats support being trailing files to be read from end to start, Zip, Web bundles, generally archives of some type to allow for append only structuring (though not necessarily without duplication).
  2. It would add a new dimension to the question "how do I know what type of file this is?"
  3. Security concerns, re things like:

It is critical that the rules for distinguishing if a resource is text or binary never determine the computed MIME type to be a scriptable MIME type, as this could allow a privilege escalation attack.

From https://mimesniff.spec.whatwg.org/#sniffing-a-mislabeled-binary-resource

@ljharb
Member

ljharb commented Jul 6, 2020

@bmeck to be clear, I think having it be an object is totally fine; my concern is overloading the string form to mean two things.

@bmeck
Member

bmeck commented Jul 6, 2020

@ljharb to me the string form mapping multiple values seems fine personally. I don't understand why the value in a string form should be limited only to mapping 1 extension but the object form could map many. If the value of the field can map many in 1 form, why not the other?

@ljharb
Member

ljharb commented Jul 6, 2020

To be fair, if the object form exists, then it's not as important to me what the string form does.

@guybedford
Contributor

Thanks @bmeck for the clarifications. To dig in a little more here...

  1. A variety of formats support being trailing files to be read from end to start, Zip, Web bundles, generally archives of some type to allow for append only structuring (though not necessarily without duplication).

Surely even zip files and web bundles have a well-defined detectable file header? And if we know we have one of these file types, then interpreting the internals appropriately seems like a fine approach to me (eg using an index file mechanism, where that index file can in turn be sniffed).

In all of the above sniffing continues to work out fine, even in these hypothetical future use cases IMO. I still don't see where anything might break down here around header sniffing.

  2. It would add a new dimension to the question "how do I know what type of file this is?"

Binary files have standard headers. Sniffing works.

  3. Security concerns, re things like:

I really do not understand this argument. Running node file is by definition an executable action.

@bmeck
Member

bmeck commented Jul 13, 2020

Surely even zip files and web bundles have a well-defined detectable file header? And if we know we have one of these file types, then interpreting the internals appropriately seems like a fine approach to me (eg using an index file mechanism, where that index file can in turn be sniffed).

Yes, but that "header" is not guaranteed to be at the start of the file (E.G. in the case of things like https://en.wikipedia.org/wiki/Self-extracting_archive) and sometimes is even towards the end of the archive format (E.G. Zip archives).

In all of the above sniffing continues to work out fine, even in these hypothetical future use cases IMO. I still don't see where anything might break down here around header sniffing.

I do not know what you mean "work out fine".

Binary files have standard headers. Sniffing works.

It does something, qualifying that it does something isn't really a note about if it is a good choice or not. In the current implementation, files are always associated with meta-data somehow and not content. If we choose sniffing, and only sniffing some file extensions we are changing quite a few designs in ways that are not simply about "working".

We now need to have all the relevant content body parts to perform sniffing rather than just URL + metadata. I think having sniffing only for some files also complicates the issue, instead of asking about "what is the current metadata, and what is url" it becomes "what is the current metadata, what urls do sniffing, what is the url, and if the url is sniffed what is the relevant body for the url".

This all seems to be much more complicated than just stating that you can have the metadata define what the URL pattern is. We already do this for file extensions. We also would not have a good way to state for data: urls that they should be sniffed.

I really do not understand this argument. Running node file is by definition an executable action.

The idea put forth that caused that callout is a threat of privilege escalation, a similar escalation concern is put forth as the driving factor of https://github.com/tc39/proposal-import-conditions , and my concern matches that. The reliance on sniffing leads towards a path that could become a collision where a file is valid for multiple sniffing patterns.


Overall, I remain steadfast that we should not do sniffing. It seems to be much more complicated for questionable gain versus alternatives:

  1. We can do something like #31388 (esm: add package type wasm) and keep "type":"module" to define extension-less files to be ESM.
  2. We can do something that expands "type" to map to multiple configurable file formats.

Both of those alternatives satisfy the use case of this PR without causing either a new dimension required to analyze what format a resource is in and without concerns of collision. We explicitly knew that "type" may grow to have new values and that was the reason we chose not to use a boolean. Such alternative solutions seem to match that design decision.

@bmeck
Member

bmeck commented Jul 13, 2020

Alternatively we could look at something like X-Content-Type-Options and defaulting it to a nosniff equivalent, that would prevent my blocking concerns regarding WASM but not solve how to use WASM without sniffing. This likely would need to be enabled on a per context level and would cause incompatibility in the ecosystem depending on if the flag was enabled or not.

@guybedford
Contributor

Yes, but that "header" is not guaranteed to be at the start of the file (E.G. in the case of things like https://en.wikipedia.org/wiki/Self-extracting_archive) and sometimes is even towards the end of the archive format (E.G. Zip archives).

I think that using the example of Node.js executing a self-extracting archive in order to argue against a feature which has historically been supported is quite a low-leverage argument personally.

We also would not have a good way to state for data: urls that they should be sniffed.

We are talking about file paths here specifically. Nothing about this format detection applies to URLs.

The idea put forth that caused that callout is a threat of privilege escalation, a similar escalation concern is put forth as the driving factor of https://github.com/tc39/proposal-import-conditions , and my concern matches that. The reliance on sniffing leads towards a path that could become a collision where a file is valid for multiple sniffing patterns.

You can only have a privilege escalation when you start from a lower level privilege. node app is the highest level of privilege there is in Node.js to indicate which file to execute. Thus there exists no escalation since it is already "escalated".

Overall, I remain steadfast that we should not do sniffing. It seems to be much more complicated for questionable gain versus alternatives:

We can do something like #31388 and keep "type":"module" to define extension less files to be ESM.
We can do something that expands "type" to map to multiple configurable file formats.

I am strongly against both alternatives unfortunately.

@bmeck
Member

bmeck commented Jul 14, 2020

I think that using the example of Node.js executing a self-extracting archive in order to argue against a feature which has historically been supported is quite a low-leverage argument personally.

It is merely an example of how archives are not necessarily read from start of file. This kind of stuff about archives being trailing rather than leading in the body of a file shows up all over with security concerns and can be seen when you try to upload archives to various email providers that they reject them. I'm not convinced that mime sniffing is a good idea as well considering that similar methods are not used by browser and/or even operating systems (in totality, some detection does occur E.G. https://en.wikipedia.org/wiki/File_format#Magic_number but limited even then regarding things like execve )

We are talking about file paths here specifically. Nothing about this format detection applies to URLs.

I think we should have some solution that allows resources to be generated for non-file formats. data: and potentially https:. I do not think stating it is out of scope is convincing since using sniffing requires actually sniffing the body which generally is never done for remote files that might evaluate.

You can only have a privilege escalation when you start from a lower level privilege. node app is the highest level of privilege there is in Node.js to indicate which file to execute. Thus there exists no escalation since it is already "escalated".

WASM touts that it is designed to execute with limited and sandboxed privileges by only using injected capabilities unlike JS, that alone appears to make there be different levels of privilege.

I am strongly against both alternatives unfortunately.

Are you also against the nosniff compromise 3rd option? It would be good to understand what design constraints we have to work with given that those do not.


Alternatively we could allow files without extensions to be treated as ESM in "type":"module" and CJS in "type":"commonjs" and state that we have no intent to allow WASM without extensions at this time due to conflicting concerns.

@ljharb
Member

ljharb commented Jul 14, 2020

@guybedford can you elaborate on why you're strongly against both alternatives? (i'm mildly against a "type" string also applying to extensionless, ftr, so i'm in particular asking about the map)

@guybedford
Contributor

I'm not saying sniffing should be used in all cases - we are specifically talking about extensionless files and the "Node.js main runpath" which is the node x path.

Let's define somewhat the security constraints though. There are two cases node x and import 'x'. I personally see no problem special casing the first as the higher privilege. I can see that import 'x' for extensionless doing sniffing may be an escalation.

This was why my original solution to this problem was always to explicitly special case node x and not the import x case, as discussed before.

I think we should have some solution that allows resources to be generated for non-file formats

MIME types for http/https URLs seems the sensible way to do this. This can be implemented with loaders and the getFormat hook. I would like to focus the discussion here to file URLs specifically which do use file extensions as the source of truth.
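
As an illustration of the MIME-based idea for http/https resources, the sketch below shows only the mapping step that a loader hook might apply to a response's Content-Type header. The table contents and the name `formatFromContentType` are assumptions; this is not the loader API itself.

```javascript
'use strict';

// Assumed mapping from MIME essence to module format names.
const FORMAT_BY_MIME = new Map([
  ['text/javascript', 'module'],
  ['application/json', 'json'],
  ['application/wasm', 'wasm'],
]);

// Hypothetical helper: returns a format name, or undefined when the
// content type is not one this sketch knows about.
function formatFromContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const essence = contentType.split(';')[0].trim().toLowerCase();
  return FORMAT_BY_MIME.get(essence);
}
```

Here the server-supplied metadata plays the role that the file extension plays for file URLs.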

WASM touts that it is designed to execute with limited and sandboxed privileges by only using injected capabilities unlike JS, that alone appears to make there be different levels of privilege.

It just feels a little bit unfair to expect a full discussion over a comprehensive WASM security model in order to consider progress on this use case.

Alternatively we could allow files without extensions to be treated as ESM in "type":"module" and CJS in "type":"commonjs" and state that we have no intent to allow WASM without extensions at this time due to conflicting concerns.

I could get behind this. It still seems to leave behind the use case of node app where app is WASM, and I would still recommend entry point sniffing specifically as the non-escalation for that use case in future.

@guybedford can you elaborate on why you're strongly against both alternatives? (i'm mildly against a "type" string also applying to extensionless, ftr, so i'm in particular asking about the map)

@ljharb "type": "wasm" should not change .js interpretations so it feels like a somewhat overloaded exception to me. Having a map means defining a special type or mime database, which then itself needs to be defined into the loader API. I am wary of new namespaces without well-defined semantics, and a lot of work would need to be done to prove any well-defined semantics. Eg what if different loaders have different interpretations for the same strings in the database, who manages the database, and if it does use MIME types how it handles the differences between the MIME type database use cases and the Node.js loader use cases which aren't necessarily defined in the same ways and may have many to one or one to many relations where ambiguities or lack of meaning exists such as what content type to use for TypeScript or CoffeeScript.

@devsnek
Member

devsnek commented Jul 16, 2020

People distribute wasm cli binaries without an extension. It can be the only type we sniff by default, and nothing else is allowed to lack an extension, but it really needs to be there.

@bmeck
Member

bmeck commented Jul 16, 2020

There are two cases node x and import 'x'.

There is a 3rd case with worker_threads spawning x as the main entrypoint which sits somewhere in the middle and after user code has started executing. Additionally we have --require etc in a similar input level. I think it is not clear to me how we are intending to treat these as separate when modification can occur in the same level as import for a variety of these.

I would like to focus the discussion here to file URLs specifically which do use file extensions as the source of truth.

I am not convinced that this should only apply to files. The extension is not even the source of truth given package types altering formats for a given extension.

It just feels a little bit unfair to expect a full discussion over a comprehensive WASM security model in order to consider progress on this use case.

I am just pointing out the claims of different levels of trust as a counterpoint to the claim that loading everything is of the same authority.

I would still recommend entry point sniffing specifically as the non-escalation for that use case in future.

This would make me uncomfortable as I do not want sniffing per multiple reasons listed above. If the goal is to keep either solution as the future I do not think we should take that path and we must resolve not to pursue either path in the near or even mid term future unless something changes.

which then itself needs to be defined into the loader API

We currently don't even expose the package type, so this doesn't seem to match up with the state of determining formats.

work would need to be done to prove any well-defined semantics

what are you seeking to see / what work can be done? It seems if formats do not have well defined semantics with our current internal database adding sniffing also would make things more complex semantically.

what if different loaders have different interpretations for the same strings in the database,

This is an already existing problem with our loader designs. I do not see how exposing the database and/or allowing alternative databases changes the existing issue with loaders returning non-builtin formats.

who manages the database

Node, likely referencing WHATWG and/or IANA. I don't see why this is different from the fact that getFormat is not controlled by a single entity currently.

and if it does use MIME types how it handles the differences between the MIME type database use cases and the Node.js loader use cases which aren't necessarily defined in the same ways and may have many to one or one to many relations ...

I do not understand this sentence. MIME is not actually associated with any 1-1 mapping and only lists common file extensions. If this is about the ambiguity of looking up a format for a given extension, we already do not expose this capability, and the lookup is already ambiguous if you do not read package.json files.

ambiguities or lack of meaning exists such as what content type to use for TypeScript or CoffeeScript.

This already is a problem with getFormat and is unrelated to sniffing and/or exposing a way to configure the default getFormat's values to my knowledge.

@guybedford
Contributor

My counter arguments are simply to try and shift this in the direction of debating solutions over debating problems.

We already support extensionless mains today for CommonJS - node x will execute CommonJS, but you cannot import that same file through the ES module system.

Unfortunately I don't see any proposal which offers a valid counter here.

@guybedford
Contributor

To be very clear - extensionless bin files should not have to rely on out-of-band metadata since they are "file-system-portable" unlike packages which comprise a number of files and hence can contain their own metadata in the package.json.

We either solve this use case for Node.js or we do not.

@bmeck
Member

bmeck commented Jul 16, 2020

Unfortunately I don't see any proposal which offers a valid counter here.

What makes an argument valid? I don't understand what is being sought. I think all 5 mentioned solutions allow node x to evaluate under ESM, some don't allow x to be WASM though.

To be very clear - extensionless bin files should not have to rely on out-of-band metadata since they are "file-system-portable" unlike packages which comprise a number of files and hence can contain their own metadata in the package.json.

Extension-less files are not currently portable for a variety of reasons. If we do use "type" to alter how they are seen they would also remain non-universally portable due to needing to know the package scope in which they reside. Regardless of which path we move forward with, package.json metadata is seeking to be used for "type".

@guybedford
Contributor

guybedford commented Jul 16, 2020 via email

@bmeck
Member

bmeck commented Jul 16, 2020

@guybedford I would be uncomfortable with that as it means CLI determines the format of something rather than it being static; with something like #34177 (comment) , even though there is a flag it could be dynamically checked and doesn't apply to a single exception point within the system but rather configures the system as a whole. That has fewer overall possible permutations for a graph's interpretation than a single node being treated specially.

@guybedford
Contributor

guybedford commented Jul 16, 2020 via email

@bmeck
Member

bmeck commented Jul 16, 2020

@guybedford we dropped support for extension-less imported files due to WASM, so it seems like this PR shouldn't land unless we come to some agreement about a direction for WASM. I've stated my concerns with sniffing, but remain unclear likewise about exposing a MIME database since we already use one internally and loaders already suffer from the arguments about standardizing/coordinating formats. If we wanted to try and state that standardization and coordination of formats is a necessity for our loaders, we should move towards altering our loader API; I don't see the same pushback about our existing APIs that I'm seeing in this thread regarding lack of those guarantees.

@guybedford
Contributor

@bmeck my primary concern with a MIME database is that it doesn't solve the issue of being able to execute a file directly. The concept of a universal executor is that you can apply it to a binary. This is what operating systems do. If we require that files must always be located alongside appropriate metadata that places a direct constraint on the possibilities of the future of Node.js in not supporting a simple node x workflow. By ruling out sniffing we rule out Node.js as a universal configurationless executor.

@guybedford
Contributor

I still continue to not understand the arguments against sniffing.

The two cases which you claim are possible escalations are:

  1. node x executing as JS upgrading to executing as WASM could be a possible escalation because WASM has WASI-level capabilities.
  2. import 'x' executing as JS upgrading to executing as WASM could be a possible escalation for the same reason.

In both cases, JS has the ability to import fs functions, and has full permissions to everything in Node.js. WASI does not have any permissions over and above these permissions. Thus there is no escalation.

Where am I wrong?

@targos
Member

targos commented Jul 17, 2020

People distribute wasm cli binaries without an extension. It can be the only type we sniff by default, and nothing else is allowed to lack an extension, but it really needs to be there.

@devsnek do you have an example at hand?

@bmeck
Member

bmeck commented Jul 17, 2020

This is what operating systems do.

This is not what other runtimes do and I remain concerned about type confusion in this space. Node is not the same as an operating system. I agree that using things like binfmt_misc can allow magic bytes to select an executable for Linux, but that is not the case for Windows or OSX.

Java has --source as a CLI flag as a means of executing single file apps like you describe.

Python has zipapp and to drop the .pyz more closely follows the self executing archive approach.

Ruby has Traveling Ruby as the most common thing these days. Also the executing archive approach.

Node has pkg and the like as well generating bundles.

I even gave a talk about this idea years ago for node.

To me, it appears this need to have a runtime able to execute a standalone is solved either by making a full binary or providing metadata.

A big thing for me here is that bundling methods ensure that the runtime is compatible with the code since it was bundled at the time of build. Even if Node were to be considered a "universal executor" it wouldn't necessarily be safe to update node if it breaks these non-runtime bearing executables.

I think the idea that we are trying to solve a problem of single file executables is false and we do need to take into account existing solutions that other ecosystems have and are not trying to change and that we don't even have a way to do this due to using metadata anyway (see how make test fails in node core if we have a package.json with the wrong type above it / or above the global npm install prefix).

If we require that files must always be located alongside appropriate metadata that places a direct constraint on the possibilities of the future of Node.js in not supporting a simple node x workflow. By ruling out sniffing we rule out Node.js as a universal configurationless executor.

I would state the fact that node x by default must execute as CJS instead of ESM and this PR is about allowing node x to execute as ESM is a red flag to this argument. I do not believe our claimed first class support for ESM would fulfill your "configurationless executor" desires here.

On that note, I think that moving to need not just the metadata of the CJS/ESM split but also to actually read the file is a much larger concern for me than you. By using file bodies for format determination, static tools must now analyze all those files' bodies and we still don't have a clean way to deal with the fact that we also need the metadata still.

I still continue to not understand the arguments against sniffing.

  1. Other runtimes don't do it, which makes me suspicious.
  2. Type confusion seems a potential problem even if we don't see one immediately due to no longer having single source of truth about how to determine a format.
  3. Sniffing doesn't fully satisfy all our formats.

Where am I wrong?

The problem is you are setting up exact claims and I have a general unease given that I do see things exploited by mime sniffing regarding executables. My claims are that this is not as simple as the first file ever loaded. You could import 'bin/foo' and that is even the intent for command Modules of WASM, a sane usage of worker_threads, etc. On the web for example mime sniffing is explicitly used for things that do not have execution privileges and you see CVEs regarding things like the JPEG parser still. Claiming that sniffing is used in such a context would be an extreme red flag because it appears they have no desire to do so. This also leads me to think that we should not do sniffing when executable privileges are involved. It isn't that you are wrong about your given claims, it is me being conservative and not wanting to increase complexity without good cause.


I do believe that the fact that we have forced ourselves to rely on metadata isn't necessarily terrible, though there appears to be some disagreement about that here. However, a lot of things simply are not installed as a single file and are generally in a directory with a package.json; they often also include resources like HTML template files, translation files, etc. If the goal is to load a single file, we could also look towards something like what the other languages do, or even ship a node-exec-bundle which doesn't allow loading code outside of a bundle; that does seem sane to me.

I truly do think that if the problem is about executing single files in a configuration-less manner, it likely needs to be a different issue and not part of this one.

@bmeck
Member

bmeck commented Jul 17, 2020

@targos I know https://github.com/wasmerio/wasmer will execute files without extensions as WASM, but I do not see it doing any sniffing; it just runs and errors on wrong file formats.

@devsnek
Member

devsnek commented Jul 17, 2020

@targos you can find some stuff on wapm.io. the idea is that you throw the binaries in your PATH like any other binary, and register them to run with wasmtime/lucet/etc using binfmt_misc.

@bmeck
Member

bmeck commented Jul 17, 2020

@devsnek that works for linux but not for Windows or OSX though.

@devsnek
Member

devsnek commented Jul 17, 2020

@bmeck there is a binfmt compat thing people use on macos (i used to use it but i don't own a mac anymore), i don't know about windows.

@GeoffreyBooth
Member

I still continue to not understand the arguments against sniffing.

  1. Other runtimes don't do it, which makes me suspicious.

I'm trying to think of a relevant parallel to node file.js / node file.wasm, where the first file is a string and the second is binary bytecode. The only thing I can come up with is Java, but I don't think you can do both java file.java and java file.class (and even then, a .class is a compiled form of .java, not an entirely different format). Are there any examples of other runtimes that have anything similar to .js / .wasm?

  2. Type confusion seems a potential problem even if we don't see one immediately due to no longer having single source of truth about how to determine a format.

How do you define type confusion? Sniffing not being able to determine the format?

I think the algorithm proposed above is simple enough: if extensionless, sniff for known magic strings and run if detected as a known type; else run as JavaScript, either as CommonJS or as ESM per package.json. Yes, this limits support for future extensionless types to those that have magic strings or are otherwise somehow sniffable; but I think this is an acceptable tradeoff.
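
To make the proposed algorithm concrete, here is a minimal sketch of it. It is not Node's loader code: the function name `formatForExtensionless`, the `MAGIC_FORMATS` table, and the returned format strings are all made up for illustration. The only hard fact it relies on is that a WebAssembly binary starts with the four bytes `\0asm`.

```javascript
// Hypothetical sketch of the proposed algorithm; not Node's actual loader.
// A WebAssembly binary begins with the 4-byte magic "\0asm".
const WASM_MAGIC = Buffer.from([0x00, 0x61, 0x73, 0x6d]);

// Table of sniffable formats (assumed shape). Only Wasm is listed here;
// any future extensionless format would need its own detectable magic.
const MAGIC_FORMATS = [{ format: 'wasm', magic: WASM_MAGIC }];

/**
 * Decide how to load an extensionless file.
 * @param {Buffer} source - raw bytes of the file
 * @param {string|undefined} packageType - "type" from the nearest package.json
 * @returns {string} 'wasm', 'module', or 'commonjs'
 */
function formatForExtensionless(source, packageType) {
  // Step 1: a recognized magic string wins, regardless of package scope.
  for (const { format, magic } of MAGIC_FORMATS) {
    if (source.length >= magic.length &&
        source.subarray(0, magic.length).equals(magic)) {
      return format;
    }
  }
  // Step 2: otherwise treat it as JavaScript, like a .js file would be.
  return packageType === 'module' ? 'module' : 'commonjs';
}
```

Note that the sniffing step only ever overrides the package.json result; it never replaces the metadata lookup, which is still needed to choose between ESM and CommonJS.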

  3. Sniffing doesn't fully satisfy all our formats.

Which formats?

And I guess the question is, what's the alternative? I think I'd rather have sniffing for formats that are detectable over not sniffing at all, or requiring a package.json nearby with additional metadata. If the whole point is to provide support for extensionless binaries, we shouldn't require them to be in a folder with other files nearby; if a folder is necessary, then one could easily make an extensionless symlink to a file with an extension, or make an extensionless shell script that runs exec node file.wasm or whatever.

@bmeck
Member

bmeck commented Jul 19, 2020

Are there any examples of other runtimes that have anything similar to .js / .wasm?

Python has .pyc, .py, and .pyz.
Java has .class, .java, and .jar.
Ruby has .rb.yarb and .rb.
Some WASM runtimes have .wasm (binary encoded) and .wat (text encoded).

In general most VM based runtimes have some form of compiled vs interpreted format.

How do you define type confusion? Sniffing not being able to determine the format?

Well... we can declare that sniffing works any way we want, even if it doesn't support every file of a specific type. E.g., https://drive.google.com/file/d/15gY2pB_kmlI8SXFV0tPjtnVaIldmluWQ/view?usp=sharing is both a valid JPEG and a valid ZIP file. I uploaded this with a .both extension, but MIME sniffing chose the trailing ZIP as the correct format, for example. My fear is that we are inviting these kinds of confusion that other runtimes seem to have been avoiding. Web bundles are intended to allow the same kind of trailing data, like other archive formats.
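
The kind of confusion I mean can be shown with a toy example. This is only a sketch: the two sniffers and their names are hypothetical, and the buffer is not a real image or archive. It relies on two facts: JPEG files start with the bytes FF D8 FF, and ZIP readers locate the archive through the end-of-central-directory record (signature `PK\x05\x06`) near the end of the file.

```javascript
// Hypothetical sniffers for illustration only; real detectors are more involved.
const JPEG_START = Buffer.from([0xff, 0xd8, 0xff]);     // JPEG start-of-image, then the next marker's 0xFF
const ZIP_EOCD = Buffer.from([0x50, 0x4b, 0x05, 0x06]); // ZIP end-of-central-directory signature

// Sniffer A: decides from the leading bytes.
function sniffFromStart(bytes) {
  return bytes.subarray(0, JPEG_START.length).equals(JPEG_START) ? 'jpeg' : 'unknown';
}

// Sniffer B: decides from trailing data, the way ZIP readers find an archive.
function sniffFromEnd(bytes) {
  return bytes.lastIndexOf(ZIP_EOCD) !== -1 ? 'zip' : 'unknown';
}

// A toy polyglot: JPEG-looking head, ZIP-looking tail.
const polyglot = Buffer.concat([
  JPEG_START,
  Buffer.from('...image data...'),
  ZIP_EOCD,
  Buffer.alloc(18), // remaining bytes of a minimal (empty) 22-byte EOCD record
]);

console.log(sniffFromStart(polyglot)); // 'jpeg'
console.log(sniffFromEnd(polyglot));   // 'zip'
```

Both answers are defensible for the same bytes, so whichever sniffer runs first silently becomes the source of truth.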

Which formats?

If we sniff WASM, we can state that we have WASM regardless of the package scope. However, we don't use sniffing to determine ESM vs CJS, for example. If we sniff WASM, it allows for WASM plus either CJS or ESM in a given situation, but you cannot have CJS and ESM in the same package scope. If we did things like allow configuring the default package type in the CLI (rejected in #32394), we could still serve both, but we would once again be using metadata.

And I guess the question is, what's the alternative?

I think we should first of all re-evaluate what we are trying to solve. The goal of distributing Node applications as single files can be satisfied without assuming that an extensionless file is sniffed to determine its format. See the examples above.

I think I'd rather have sniffing for formats that are detectable over not sniffing at all, or requiring a package.json nearby with additional metadata.

I simply don't think sniffing for WASM is a good idea regardless. I also don't think sniffing for WASM is a use case that usually falls into the single-file-as-a-full-binary situation. In some cases a WASM file will be the entire application, but there are a lot of cases where it won't match up, since many applications use multiple files.

If the whole point is to provide support for extensionless binaries, we shouldn't require them to be in a folder with other files nearby;

I'd agree! I just don't think a single file works for the majority of actual use cases to distribute a binary! Usually you want an application in a bundle of some kind, such as a zip archive approach used by all those places I linked above.

if a folder is necessary, then one could easily make an extensionless symlink to a file with an extension,

This is the most common form of distributing Node applications; most Node applications have multiple files, which means they are using a folder somehow. In fact, if the application uses "type": "module", it must use a folder.

or make an extensionless shell script that runs exec node file.wasm or whatever.

This is actually what npm does on Windows instead of a symlink (look for .cmd files).

@GeoffreyBooth
Member

In general most VM based runtimes have some form of compiled vs interpreted format.

Yes, but all of those examples are a runtime taking as input the text, compiled or compressed version of its own source code (like plaintext Java source, compiled Java, or zipped Java). I'm asking if there's an example of a runtime taking as input different types of sources, like if the java command could also take as input a .py file. That's what wasm feels like to me.

I think the original ask of this PR is valid: since we allow Node to execute extensionless JavaScript as CommonJS, we should allow the same as ESM via "type": "module". I think doing so is expected and overdue, and doesn't foreclose someday supporting extensionless wasm via sniffing. I assume that currently, extensionless wasm errors, as Node would try to run it as CommonJS JavaScript; and that would continue to be the case after this PR. So even if someday we find some other solution for extensionless wasm besides sniffing, I would think that we could land this PR now and worry about the wasm case later.
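
As a rough sketch of what "the same as .js files" means here: the format follows the "type" field of the nearest package.json above the file. The code below is a simplification for illustration, not Node's resolver. `packageTypeFor` and the `readPackageJson` callback are hypothetical names, POSIX-style paths are assumed, and details of the real package scope lookup are left out.

```javascript
// Hypothetical sketch of a package-scope lookup; not Node's actual resolver.
// `readPackageJson(dir)` is an assumed callback that returns the parsed
// package.json found in `dir`, or undefined when there is none.
function packageTypeFor(filePath, readPackageJson) {
  let dir = filePath;
  while (dir.includes('/')) {
    // Step up to the parent directory (POSIX-style paths assumed).
    dir = dir.slice(0, dir.lastIndexOf('/'));
    const pkg = readPackageJson(dir === '' ? '/' : dir);
    if (pkg !== undefined) {
      // The nearest package.json decides; anything but "module" means CommonJS.
      return pkg.type === 'module' ? 'module' : 'commonjs';
    }
  }
  // No package.json anywhere above the file: fall back to CommonJS.
  return 'commonjs';
}

// Usage with an in-memory stand-in for the file system.
const packages = new Map([
  ['/app', { type: 'module' }],
  ['/legacy', {}],
]);
const read = (dir) => packages.get(dir);

console.log(packageTypeFor('/app/bin/cli', read));    // 'module'
console.log(packageTypeFor('/legacy/bin/cli', read)); // 'commonjs'
```

An extensionless wasm file goes through the same lookup and would still be treated as JavaScript, so it would keep erroring just as it does today.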

@ljharb
Member

ljharb commented Jul 19, 2020

since we allow Node to execute extensionless JavaScript as CommonJS, we should allow the same as ESM

is, i think, unarguably valid! the

via "type": "module"

part i don't think is quite as straightforward or obvious.

@bmeck
Member

bmeck commented Jul 19, 2020

I assume that currently, extensionless wasm errors, as Node would try to run it as CommonJS JavaScript; and that would continue to be the case after this PR. So even if someday we find some other solution for extensionless wasm besides sniffing, I would think that we could land this PR now and worry about the wasm case later.

  1. We had this feature revoked due to trying to support WASM in the past.

I don't think we should land this feature until the plans for WASM, or the use case itself, are clearer. This smells to me like a mistaken quick fix, made without understanding some things that don't match up in the arguments. For example, the claim is that:

we should allow the same as ESM via "type": "module"

However, for some, the intent of such support is to allow so-called configuration-less executables / single-file applications; for others, the intent is merely to allow files in a package context (thus not single-file by nature) to be extensionless and still load first-class module formats. These are two very different use cases being conflated, and I think this is starting to show as a main point of contention. A single-file application is by nature a file without a package.json, so it is not relevant to this PR's style of approach; it is also not common in the wild except for one-off scripts, and it does not support a variety of things. I think we need to separate out what we want to allow to be the "same". This PR is about configuring the package.json so that ESM holds a certain format within a package scope. In the past, the exact PR that was rejected, and that caused extensionless ESM support to be removed, was the same as this PR but for WASM.

  2. People are against using the same approach to support WASM as what this PR implements.

In fact, people are stating that metadata is undesirable for such single-file applications. This conflict needs to be teased out: like @ljharb, I am not convinced by the argument that package.json is good for loading ESM, since for WASM the same kind of workflow is seen as undesirable. This leads to questions about other formats for which metadata might be more desirable than sniffing (people seem to disagree on whether WASM should be sniffed or configured), questions about what the problems with sniffing are, etc.

Labels
esm Issues and PRs related to the ECMAScript Modules implementation.
8 participants