module: add hook for global preload code #32068
this seems very hacky. can you go more into detail about use cases and why we need to pass around strings of code that need to end with functions? As far as I can tell, one could, for example, just import VM and run code in the top level of their loader, and that would do the same thing as this, without node having to expose such an odd API.
One fundamental design decision implied here is: there are multiple people who want to move loader hooks to run on a separate thread. See the linked PR for the progress on that front. In that context I’m not sure how your suggestion of using the vm module inside of the loader would work. It would be a vm module in a different isolate without clear access to the global scope that the app is running in. There are two known alternatives for the current approach:
The use cases are instrumenting core modules and setting up any kind of global state required by code generated by other hooks, e.g. if transformSource assumes the existence of certain global runtime functions. Another use case is global attenuation / locking down the runtime before user code runs. Right now loader hooks may just rely on the fact that they happen to run in the same global scope as all other code, but when moving to threads, that is no longer true.
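To make the "global state required by code generated by other hooks" use case concrete, here is a minimal sketch. It does not use the hook API from this PR; it only simulates the idea with vm.runInThisContext. The names PRELOAD_SOURCE, TRANSFORMED_USER_CODE, runPreloadOnce, __traceCall and __traceLog are hypothetical and exist only for illustration.

```javascript
const vm = require('node:vm');

// Hypothetical preload source: plain script code provided as a string.
// It installs a runtime helper that generated code is assumed to call.
const PRELOAD_SOURCE = `
  globalThis.__traceLog = [];
  globalThis.__traceCall = function (name) {
    globalThis.__traceLog.push(name);
  };
`;

// Stand-in for what a hook such as transformSource might emit (assumption).
const TRANSFORMED_USER_CODE = `
  __traceCall('main');
  globalThis.answer = 6 * 7;
`;

// Run the preload once, in the current global scope, before any "user" code.
// The source is wrapped in a function so its local declarations do not leak.
function runPreloadOnce(source) {
  const fn = vm.runInThisContext(`(function () {${source}\n})`, {
    filename: '<preload>',
  });
  fn();
}

runPreloadOnce(PRELOAD_SOURCE);
vm.runInThisContext(TRANSFORMED_USER_CODE, { filename: '<user-code>' });

console.log(globalThis.__traceLog, globalThis.answer);
```

If the hooks ran in a separate thread or isolate, the helper would have to be installed in the application's global scope by some mechanism like this, because top-level side effects in the loader would no longer reach it.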
could we just allow hooks to explicitly load a module? we couldn't return the namespace in the threaded design but it would let them evaluate code in the main context in a much more natural way. |
Let's say I have a loader that wants to remove The point here is to allow running code before any normal application code runs. And in many cases in a way that doesn't leak into what the application sees. So an Yes, the code could be loaded from a file instead of from a string. But that wouldn't change the fact that it would have to be reliably triggered once before the user code runs. Another alternative that was discussed at some point was to say that we'll expect any loader that requires access to the global scope to use two separate flags: |
@jkrems export async function init(main) {
await main.import('delete-process');
}
this shouldn't ever be explicitly observable. only |
Not sure what So, your suggestion would be:
So it would be something like this? import MainContext from 'node-loader:main-context';
import { fileURLToPath } from 'url';
MainContext.runFileSync(fileURLToPath(new URL('./preload.js', import.meta.url))); Where |
@jkrems i edited my comment a bit ago to address the multi-isolate case |
If during normal execution |
@jkrems i don't understand how you're observing that something was imported, esm is explicitly designed to preclude such a thing. (if our impl is leaking that's a side channel attack and we should fix it) |
Okay, so your code becomes more fully: export async function init(main) {
await main.runFileSync(fileURLToPath(new URL('./preload.js', import.meta.url)));
} This feels like a more complicated version of one of the alternatives I mentioned at the top: export async function getGlobalPreloadPath() {
return fileURLToPath(new URL('./preload.js', import.meta.url));
} Which in turn is a slightly more verbose version of just inlining the preload code.
Since it was provided by the loader and would otherwise - generally - not exist, the fact that I can import the URL at all is information. import('file:///guessed/path/of/secret/agent/code.js').catch(err => {
// run evil code because the agent won't catch us.
}); Also, when trying to mess with global state, there was an explicit request for having something that can run sloppy mode code. Which makes modules a problematic choice. The design goal was that having the preload has no visible impact on user code until very explicitly opting into it. This rules out leaking into the module map or even having a true |
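As a small illustration of why the sloppy mode request makes modules an awkward fit: module code is always strict, while script code evaluated from a string can be sloppy. The sketch below uses vm.runInThisContext only to show that difference; the variable names are made up for the example.

```javascript
const vm = require('node:vm');

// Sloppy-mode script code: assigning to an undeclared name creates a global.
vm.runInThisContext('implicitGlobal = "set in sloppy mode";');

// The same kind of assignment under strict-mode rules (which modules always
// use) throws a ReferenceError instead of creating a global.
let strictError = null;
try {
  vm.runInThisContext('"use strict"; anotherImplicitGlobal = 1;');
} catch (err) {
  strictError = err;
}

console.log(globalThis.implicitGlobal);
console.log(strictError instanceof ReferenceError);
```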
This is irrelevant, because the loader itself exists (i can just fs.stat for
import vm in your |
I want to really explicitly address this "module map" case. The problem you seem to be having is not that the file would be in the module cache (since that can't be observed) but rather that the file can be imported at all. That would seem to motivate inlining the code in your loader as a string, which is what you suggest up at the top, which is what prompts me to say "ok i'll just check if your loader exists then". This is what further motivates me to say the entire case is irrelevant, because then a loader author would just not make "the loader exists" part of their security assumptions (indeed, user code that wants to exploit the globals being mutable doesn't gain anything from knowing the make-everything-immutable loader exists) |
No you can't because the preload code ran and can do all kinds of things, including hiding certain APIs or patching them to prevent access to certain directories (or replace them with virtual/in-memory locations only visible to the user code). We can play through the cat-and-mouse here but the point will always remain: There are cases where the fact that a certain key in the module map is "blocked" leaks and no amount of trickery can completely hide that fact. I personally don't care about that leakage but it exists and afaik @bmeck and possibly others care.
Sure, that's possible. But it optimizes the API for... something. And I'm not sure what that something is. The current API is optimized for "patches and instrumentation that have minimal visible impact on what userland apps may see". Making it a module doesn't seem to offer any advantage, just making everything trickier to manage and harder to use..? |
As i've said multiple times now, this is false. There is no way to check if something is in the module map. |
Right, I disagreed and gave a pretty explicit example for doing it. If you want something more complete: // loader
fs.writeFileSync('/tmp/preload.mjs', PRELOAD);
MainContext.import('file:///tmp/preload.mjs');
fs.unlinkSync('/tmp/preload.mjs');
// how do I prevent the user code from checking that my preload code ran?
// how do I ensure that arbitrary code still works, even if it would want to load a different
// generated source text from /tmp/preload.mjs? |
@jkrems your loader can just route the module trying to import /tmp/preload.mjs to ERR_MODULE_NOT_FOUND. Modules in the cache are keyed by (referrer, specifier), not resolved url. In any case like i said, even if you delete preload.mjs, you can't delete your loader, and i can just check that the file exists, which leads to the conclusion i posted like an hour ago (which is don't base your security on the consumer not knowing you exist, trivially you can load up the inspector and iterate over every single script and module) |
Which would be... observable. That's the whole point. If the app writes its own Keep in mind that this is about the ideal of preventing observable effects on arbitrary code. I agree that in likely or frequent cases, it wouldn't be observed in practice. |
@jkrems it's ERR_MODULE_NOT_FOUND because you deleted it; if they created their own, it would resolve to that.
If you somehow hide the fact that your loader does actually exist (which is a really dumb thing to base your security on, please don't), you can do |
How? If there already is an entry in the module map for
This isn't just about security. It's also about breaking random code which is something an APM-case for example would care about. |
the cache is not keyed by resolved url, it is keyed by (referrer, specifier) pairs. i just said this like two messages ago. |
I think the argument about loading
I think the argument about wanting it to be a CJS or ESM module / anything on disk is likely not appealing as I cannot tell where the constraint that it needs to come off disk comes from. You can just use
I need to think on the function completion value and what happens if async work does occur. I did have serious problems getting timing to work for async behaviors with the bootstrap while trying to make a
I don't find the Module Map to be an issue as a loader can completely overwrite the behavior of loading stuff via
Overall, I like this design as it enables a lot of things that are hard to manage while not being cluttered. The sloppy mode behavior is generally to enable the lexical contour of globals (welcome to hell) that some APIs do want to use and/or are specified as how standards define how things work. Some things also use it for various behavior that isn't present in strict mode such as
As experimental, I think this API is fine and don't see any immediate changes needed as this is behind a flag and we can experiment with it in the wild some.
This isn’t about the spec. It’s about the module map as it exists in node. It is keyed by URL unless there was a major change I missed recently. So as far as I know the cache is not keyed by these pairs. But happy to be proven wrong with links to nodejs source code showing otherwise. |
There are 2 levels of module mappings (I often call them global (node, per context) and local (ecma262, per source text)) please be specific so I don't accidentally get confused T_T. |
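For readers following the "keyed by URL" versus "keyed by (referrer, specifier)" disagreement, the two levels can be pictured with a toy model like the one below. This is only a sketch of the distinction, not Node.js internals: GlobalModuleMap, importFrom and resolve are hypothetical names, and resolution is simplified to plain relative URL resolution.

```javascript
// Toy model only. "Global" level: one map per context, keyed by resolved URL.
class GlobalModuleMap {
  constructor() {
    this.byUrl = new Map();
  }
  getOrCreate(url) {
    if (!this.byUrl.has(url)) {
      // "Local" level: each module record keeps its own specifier map.
      this.byUrl.set(url, { url, localMap: new Map() });
    }
    return this.byUrl.get(url);
  }
}

// Simplified resolver (assumption): resolve relative to the referrer's URL.
function resolve(specifier, referrerUrl) {
  return new URL(specifier, referrerUrl).href;
}

// Look up (referrer, specifier) locally first, then fall back to the URL map.
function importFrom(globalMap, referrer, specifier) {
  if (!referrer.localMap.has(specifier)) {
    const url = resolve(specifier, referrer.url);
    referrer.localMap.set(specifier, globalMap.getOrCreate(url));
  }
  return referrer.localMap.get(specifier);
}

const globalMap = new GlobalModuleMap();
const a = globalMap.getOrCreate('file:///app/a.mjs');
const b = globalMap.getOrCreate('file:///app/lib/b.mjs');

// Different (referrer, specifier) pairs can land on the same URL-keyed record.
const fromA = importFrom(globalMap, a, './shared.mjs');
const fromB = importFrom(globalMap, b, '../shared.mjs');

console.log(fromA === fromB);
console.log(a.localMap.has('./shared.mjs'), b.localMap.has('./shared.mjs'));
```

In this model both statements in the thread can hold at once: the local lookups are keyed by referrer and specifier, while the shared record underneath is keyed by URL.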
Thanks for working on this, looks good in principle.
Updated:
|
looks ready to me
CI failure appears unrelated. If there are no objections I'd like to land this tomorrow and get back to #31229 which is pending this capability. |
PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>
Landed in 07a1fb9 |
Backport-PR-URL: #32610 PR-URL: #32068 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Geoffrey Booth <[email protected]>
This work is meant to unblock progress on moving hook execution out of the main thread (and global scope). It creates an official place where hooks may interact with the global scope, hopefully allowing us to remove the need for random global side effects elsewhere.
So far this only applies the preload once before the initial root module is executed. It felt a bit too opinionated to also apply it to all vm-created contexts etc. I assume that if somebody wants that for their hook, they could patch vm.createContext and friends to do it. This matches the power of the top-level code in today's loaders that also won't run for every newly created context.
context argument.
See: #31229
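A rough sketch of what patching vm.createContext for this purpose could look like, assuming the preload is available as a source string. PRELOAD_SOURCE and patchedCreateContext are hypothetical names, and this only covers direct calls to vm.createContext; other entry points ("and friends") would need similar treatment.

```javascript
const vm = require('node:vm');

// Hypothetical preload source a hook might want applied to every new context.
const PRELOAD_SOURCE = 'globalThis.preloaded = true;';

const originalCreateContext = vm.createContext;

// Wrap vm.createContext so each newly contextified object runs the preload
// before any other code gets a chance to run in it.
vm.createContext = function patchedCreateContext(...args) {
  const context = originalCreateContext.apply(this, args);
  vm.runInContext(PRELOAD_SOURCE, context, { filename: '<preload>' });
  return context;
};

const sandbox = vm.createContext({});
console.log(vm.runInContext('preloaded', sandbox));
```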
/cc @nodejs/modules-active-members
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes