module: add hook for global preload code #32068

jkrems · 2020-03-03T17:16:44Z

This work is meant to unblock progress on moving hook execution out of the main thread (and global scope). It creates an official place where hooks may interact with the global scope, hopefully allowing us to remove the need for random global side effects elsewhere.

So far this only applies the preload once before the initial root module is executed. It felt a bit too opinionated to also apply it to all vm-created contexts etc.. I assume that if somebody wants that for their hook, they could patch vm.createContext and friends to do it. This matches the power of the top-level code in today's loaders that also won't run for every newly created context.

Add test and/or document support in worker threads.
Remove unused context argument.
Change preload code to run in an implicit function scope instead using the last expression.

See: #31229

/cc @nodejs/modules-active-members

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

devsnek · 2020-03-03T17:27:19Z

this seems very hacky. can you go more into detail about use cases and why we need to pass around stings of code that need to end with functions? As far as I can tell, one could, for example, just import VM and run code in the top level of their loader, and that would do the same thing as this, without node having to expose such an odd API.

jkrems · 2020-03-03T17:53:34Z

One fundamental design decision implied here is: there are multiple people who want to move loader hooks to run on a separate thread. See the linked PR for the progress on that front. In that context I’m not sure how your suggestion of using the vm module inside of the loader would work. It would be a vm module in a different isolate without clear access to the global scope that the app is running in.

There are two known alternatives for the current approach:

Instead of providing a source string, provide a file name or URL. It gets a bit wonky with how that would interact with the require cache and/or module map.
Make the function wrapper implicit, just like with CommonJS. It would remove the ability to declare certain kinds of globals but that may not be a deal breaker.

The use cases are instrumenting core modules and setting up any kind of global state required by code generated by other hooks. E.g. if transformSource assumes the existence on certain global runtime functions. Another use case is global attenuation / locking down the runtime before user code runs.

Right now loader hooks may just rely on the fact that they happen to run in the same global scope as all other code but when moving to threads, that is no longer true.

devsnek · 2020-03-03T18:01:18Z

could we just allow hooks to explicitly load a module? we couldn't return the namespace in the threaded design but it would let them evaluate code in the main context in a much more natural way.

jkrems · 2020-03-03T18:16:11Z

could we just allow hooks to explicitly load a module? we couldn't return the namespace in the threaded design but it would let them evaluate code in the main context in a much more natural way.

Let's say I have a loader that wants to remove globalThis.process (for whatever reason). How would that work? Would it add a noop resolve hook that then as a side effect loads a module to delete the property? If the hook thread is shared across isolates, would it keep track of which thread/context the resolve request is part of to not run it multiple times (and block resolution)? I don't think that's really natural for a lot of these cases.

The point here is to allow running code before any normal application code runs. And in many cases in a way that doesn't leak into what the application sees. So an import is very unlikely to work because it would also leak into the import map. Unless it's a magical import. Similar for a require because it shouldn't be the default that the author needs to remember to clean up the cache.

Yes, the code could be loaded from a file instead of from a string. But that wouldn't change the fact that it would have to be reliably triggered once before the user code runs.

Another alternative that was discussed at some point was to say that we'll expect any loader that requires access to the global scope to use two separate flags: --require preload.js --loader hooks.mjs. But that would be pretty unfortunate from a usability perspective.

devsnek · 2020-03-03T18:17:57Z

@jkrems ~~with TLA its just await MainContext.import('delete-process') in the top of your loader~~ (missed shared isolate case)

export async function init(main) {
  await main.import('delete-process');
}

So an import is very unlikely to work because it would also leak into the import map.

this shouldn't ever be explicitly observable. only export const firstImportedAt = Date.now() or things in that vein should be able to expose that.

jkrems · 2020-03-03T18:30:55Z

with TLA its just await MainContext.import('delete-process') in the top of your loader

Not sure what MainContext is and also not sure why it even has to be async and require TLA. And I assume it wouldn't actually import (as in: use the module map etc.) because then we'd be back to leaking information into user code.

So, your suggestion would be:

Add a new API that can be imported only by loaders, e.g.import MainContext from '<magic loader string>';. This would work because loaders already run in their own isolated module map.
Add a method to it that can be used to run an arbitrary file at an arbitrary point in time. Hopefully sync or we are blocked on TLA before we can actually ship loaders in any version of node.
The file itself can't be a module because otherwise it would be forced into strict mode, making it a lot harder to write instrumentation code. Also, making it a module risks messing with bootstrap timing of the app.

So it would be something like this?

import MainContext from 'node-loader:main-context';
import { fileURLToPath } from 'url';

MainContext.runFileSync(fileURLToPath(new URL(import.meta.url, './preload.js')));

Where preload.js would contain the same content as above. Now for the next issue: If the hook thread is shared, the module body doesn't execute each time a new thread starts and would need to run the preload. How would that be handled?

devsnek · 2020-03-03T18:33:18Z

@jkrems i edited my comment a bit ago to address the multi-isolate case

jkrems · 2020-03-03T18:34:12Z

this shouldn't ever be explicitly observable.

If during normal execution import('<random string>') fails but it succeeds when the loader is active, it is observable. As soon as the loader or its preload is in the module map, it is observable.

devsnek · 2020-03-03T18:36:12Z

@jkrems i don't understand how you're observing that something was imported, esm is explicitly designed to preclude such a thing. (if our impl is leaking that's a side channel attack and we should fix it)

jkrems · 2020-03-03T18:43:51Z

Okay, so your code becomes more fully:

export async function init(main) {
  await main.runFileSync(fileURLToPath(new URL(import.meta.url, './preload.js')));
}

This feels like a more complicated version of one of the alternatives I mentioned at the top:

export async function getGlobalPreloadPath() {
  return fileURLToPath(new URL(import.meta.url, './preload.js'));
}

Which in turn is a slightly more verbose version of just inlining the preload code.

i don't understand how you're observing that something was imported

Since it was provided by the loader and would otherwise - generally - not exist, the fact that I can import the URL at all is information.

import('file:///guessed/path/of/secret/agent/code.js').catch(err => {
  // run evil code because the agent won't catch us.
});

Also, when trying to mess with global state, there was an explicit request for having something that can run sloppy mode code. Which makes modules a problematic choice. The design goal was that having the preload has no visible impact on user code until very explicitly opting into it. This rules out leaking into the module map or even having a true require by default that may leak into the require cache.

devsnek · 2020-03-03T18:49:32Z

Since it was provided by the loader and would otherwise - generally - not exist, the fact that I can import the URL at all is information.

This is irrelevant, because the loader itself exists (i can just fs.stat for node_modules/loader). The thing you actually want to make sure of is that user code can't check whether a module has already been imported.

Sloppy code

import vm in your delete-process module?

devsnek · 2020-03-03T19:00:36Z

I want to really explicitly address this "module map" case. The problem you seem to be having is not that the file would be in the module cache (since that can't be observed) but rather that the file can be imported at all. That would seem to motivate inlining the code in your loader as a string, which is what you suggest up at the top, which is what prompts me to say "ok i'll just check if your loader exists then". This is what further motivates me to say the entire case is irrelevant, because then a loader author would just not make "the loader exists" part of their security assumptions (indeed, user code that wants to exploit the globals being mutable doesn't gain anything from knowing the make-everything-immutable loader exists)

jkrems · 2020-03-03T19:45:50Z

This is irrelevant, because the loader itself exists (i can just fs.stat for node_modules/loader).

No you can't because the preload code ran and can do all kinds of things, including hiding certain APIs or patching them to prevent access to certain directories (or replace them with virtual/in-memory locations only visible to the user code). We can play through the cat-and-mouse here but the point will always remain: There are cases where the fact that a certain key in the module map is "blocked" leaks and no amount of trickery can completely hide that fact.

I personally don't care about that leakage but it exists and afaik @bmeck and possibly others care.

import vm in your delete-process module?

Sure, that's possible. But it optimizes the API for... something. And I'm not sure what that something is. The current API is optimized for "patches and instrumentation that have minimal visible impact on what userland apps may see". Making it a module doesn't seem to offer any advantage, just making everything trickier to manage and harder to use..?

devsnek · 2020-03-03T19:47:43Z

There are cases where the fact that a certain key in the module map is "blocked" leaks and no amount of trickery can completely hide that fact.

As i've said multiple times now, this is false. There is no way to check if something is in the module map.

jkrems · 2020-03-03T19:55:02Z

As i've said like multiple times, now this is false. There is no way to check if something is in the module map.

Right, I disagreed and gave a pretty explicit example for doing it. If you want something more complete:

// loader
fs.writeFileSync('/tmp/preload.mjs', PRELOAD);
MainContext.import('file:///tmp/preload.mjs');
fs.unlinkFileSync('/tmp/preload.mjs');

// how do I prevent the user code from checking that my preload code ran?
// how do I ensure that arbitrary code still works, even if it would want to load a different
// generated source text from /tmp/preload.mjs?

devsnek · 2020-03-03T19:59:07Z

@jkrems your loader can just route the module trying to import /tmp/preload.mjs to ERR_MODULE_NOT_FOUND. Modules in the cache are keyed by (referrer, specifier), not resolved url.

In any case like i said, even if you delete preload.mjs, you can't delete your loader, and i can just check that the file exists, which leads to the conclusion i posted like an hour ago (which is don't base your security on the consumer not knowing you exist, trivially you can load up the inspector and iterate over every single script and module)

jkrems · 2020-03-03T20:02:16Z

your loader can just route the module trying to import /tmp/preload.mjs to ERR_MODULE_NOT_FOUND.

Which would be... observable. That's the whole point. If the app writes its own /tmp/preload.mjs, it would break the app (cause an observable error or difference in behavior) if it doesn't get a new module with fresh evaluation and those new contents.

Keep in mind that this is about the ideal of preventing observable effects on arbitrary code. I agree that in likely or frequent cases, it wouldn't be observed in practice.

devsnek · 2020-03-03T20:02:53Z

@jkrems its ERR_MODULE_NOT_FOUND because you deleted it, if they created their own it would resolve to that.

devsnek · 2020-03-03T20:07:19Z

If you somehow hide the fact that your loader does actually exist (which is a really dumb thing to base your security on, please don't), you can do import('data:text/javascript,${MY_SOURCE}') if you really want.

jkrems · 2020-03-03T20:08:06Z

if they created their own it would resolve to that.

How? If there already is an entry in the module map for file:///tmp/preload.mjs due to the loader code running first, what would the behavior be? Afaik there's only three options:

Prevent access by rerouting resolution (observable).
Put it into the module map under a generated different URL (likely observable due to import.meta.url and/or relative resolution unless we make other changes).
Grant access to the loader-injected module (observable).

This isn't just about security. It's also about breaking random code which is something an APM-case for example would care about.

devsnek · 2020-03-03T20:09:03Z

How? If there already is an entry in the module map for file:///tmp/preload.mjs due to the loader code running first, what would the behavior be? Afaik there's only three options:

the cache is not keyed by resolved url, it is keyed by (referrer, specifier) pairs. i just said this like two messages ago.

bmeck · 2020-03-03T20:35:33Z

I think the argument about loading vm.* isn't appealing since the intent of polyfills/--require mutations is to modify the main context.

I think the argument about wanting it to be a CJS or ESM module / anything on disk is likely not appealing as I cannot tell where the constraint that it needs to come off disk comes from. You can just use import() to get a hold of what you want/need and access disk if that is desirable.

I need to think on the function completion value and what happens if async work does occur. I did have serious problems getting timing to work for async behaviors with the bootstrap while trying to make a --import flag but would be curious if we can find some way to allow async behavior in this hook eventually.

I don't find the Module Map to be an issue as a loader can completely overwrite the behavior of loading stuff via import() and a simple prefix would fix things so that you know where via/when it was imported then.

Overall, I like this design as it enables a lot of things that are hard to manage while not being cluttered. The sloppy mode behavior is generally to enable the lexical contour of globals (welcome to hell) that some APIs do want to use and/or are specified as how standards define how things work. Some things also use it for various behavior that isn't present in strict mode such as globalThis polyfills.

As experimental, I think this API is fine and don't see any immediate changes needed as this is behind a flag and we can experiment with it in the wild some.

jkrems · 2020-03-03T20:40:01Z

the cache is not keyed by resolved url, it is keyed by (referrer, specifier) pairs.

This isn’t about the spec. It’s about the module map as it exists in node. It is keyed by URL unless there was a major change I missed recently. So as far as I know the cache is not keyed by these pairs. But happy to be proven wrong with links to nodejs source code showing otherwise.

bmeck · 2020-03-03T20:44:32Z

There are 2 levels of module mappings (I often call them global (node, per context) and local (ecma262, per source text)) please be specific so I don't accidentally get confused T_T.

doc/api/esm.md

lib/internal/process/esm_loader.js

guybedford

Thanks for working on this, looks good in principle.

lib/internal/modules/esm/loader.js

test/es-module/test-esm-loader-side-effect.mjs

doc/api/esm.md

lib/internal/modules/esm/loader.js

jkrems · 2020-03-09T18:23:51Z

Updated:

Uses a wrapper function instead of immediate global code.
Test includes running a worker thread.
The unused context argument was removed.

lib/internal/modules/esm/loader.js

bmeck

looks ready to me

nodejs-github-bot · 2020-03-16T17:52:52Z

CI: https://ci.nodejs.org/job/node-test-pull-request/29870/

bmeck · 2020-03-17T18:14:53Z

CI failure appears unrelated.

If there are no objections I'd like to land this tomorrow and get back to #31229 which is pending this capability.

doc/api/esm.md

lib/internal/modules/esm/loader.js

nodejs-github-bot · 2020-03-20T20:27:00Z

CI: https://ci.nodejs.org/job/node-test-pull-request/29958/

PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

bmeck · 2020-03-23T13:40:01Z

Landed in 07a1fb9

PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

PR-URL: nodejs#32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

Backport-PR-URL: #32610 PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

jkrems requested review from bmeck and guybedford March 3, 2020 17:16

nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. process Issues and PRs related to the process subsystem. labels Mar 3, 2020

GeoffreyBooth reviewed Mar 3, 2020

View reviewed changes

doc/api/esm.md Show resolved Hide resolved

bmeck reviewed Mar 5, 2020

View reviewed changes

lib/internal/process/esm_loader.js Show resolved Hide resolved

guybedford reviewed Mar 9, 2020

View reviewed changes

jkrems added the modules-agenda Issues and PRs to discuss during the meetings of the Modules team. label Mar 9, 2020

mhdawson mentioned this pull request Mar 9, 2020

Node.js Modules Team Meeting 2020-03-11 nodejs/modules#494

Closed

jkrems removed the modules-agenda Issues and PRs to discuss during the meetings of the Modules team. label Mar 11, 2020

bmeck reviewed Mar 16, 2020

View reviewed changes

lib/internal/modules/esm/loader.js Outdated Show resolved Hide resolved

bmeck approved these changes Mar 16, 2020

View reviewed changes

jkrems force-pushed the bootstrap-hook branch from b2a6a6b to 9ea3e8b Compare March 16, 2020 16:13

jkrems added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 16, 2020

GeoffreyBooth reviewed Mar 17, 2020

View reviewed changes

doc/api/esm.md Outdated Show resolved Hide resolved

lib/internal/modules/esm/loader.js Show resolved Hide resolved

module: add hook for global preload code

30f243f

jkrems force-pushed the bootstrap-hook branch from 9ea3e8b to 30f243f Compare March 17, 2020 21:15

GeoffreyBooth approved these changes Mar 17, 2020

View reviewed changes

bmeck pushed a commit that referenced this pull request Mar 23, 2020

module: add hook for global preload code

07a1fb9

PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

bmeck closed this Mar 23, 2020

MylesBorins pushed a commit that referenced this pull request Mar 24, 2020

module: add hook for global preload code

a3eaf39

PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

MylesBorins mentioned this pull request Mar 24, 2020

v13.12.0 proposal #32376

Merged

MylesBorins pushed a commit that referenced this pull request Mar 24, 2020

module: add hook for global preload code

7356c43

PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Apr 3, 2020

module: add hook for global preload code

07b77a3

PR-URL: nodejs#32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

MylesBorins pushed a commit that referenced this pull request Apr 3, 2020

module: add hook for global preload code

62b5bd4

Backport-PR-URL: #32610 PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>

codebytere mentioned this pull request Apr 4, 2020

v12.16.2 proposal #32313

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

module: add hook for global preload code #32068

module: add hook for global preload code #32068

jkrems commented Mar 3, 2020 •

edited

Loading

devsnek commented Mar 3, 2020

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020 •

edited by bmeck

Loading

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020 •

edited

Loading

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

bmeck commented Mar 3, 2020

jkrems commented Mar 3, 2020

bmeck commented Mar 3, 2020

guybedford left a comment

jkrems commented Mar 9, 2020

bmeck left a comment

nodejs-github-bot commented Mar 16, 2020

bmeck commented Mar 17, 2020

nodejs-github-bot commented Mar 20, 2020

bmeck commented Mar 23, 2020

module: add hook for global preload code #32068

module: add hook for global preload code #32068

Conversation

jkrems commented Mar 3, 2020 • edited Loading

Checklist

devsnek commented Mar 3, 2020

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020 • edited Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020 • edited Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

devsnek commented Mar 3, 2020 • edited Loading

jkrems commented Mar 3, 2020 • edited by bmeck Loading

devsnek commented Mar 3, 2020 • edited Loading

jkrems commented Mar 3, 2020 • edited Loading

devsnek commented Mar 3, 2020 • edited Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

devsnek commented Mar 3, 2020 • edited Loading

jkrems commented Mar 3, 2020

devsnek commented Mar 3, 2020

bmeck commented Mar 3, 2020

jkrems commented Mar 3, 2020

bmeck commented Mar 3, 2020

guybedford left a comment

Choose a reason for hiding this comment

jkrems commented Mar 9, 2020

bmeck left a comment

Choose a reason for hiding this comment

nodejs-github-bot commented Mar 16, 2020

bmeck commented Mar 17, 2020

nodejs-github-bot commented Mar 20, 2020

bmeck commented Mar 23, 2020

jkrems commented Mar 3, 2020 •

edited

Loading

devsnek commented Mar 3, 2020 •

edited

Loading

devsnek commented Mar 3, 2020 •

edited

Loading

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020 •

edited by bmeck

Loading

devsnek commented Mar 3, 2020 •

edited

Loading

jkrems commented Mar 3, 2020 •

edited

Loading

devsnek commented Mar 3, 2020 •

edited

Loading

devsnek commented Mar 3, 2020 •

edited

Loading