-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should internal modules be affected by monkey-patching? #18795
Comments
I think one of the most important points that you made in #18773 is:
That is, every time we move code from/to C++ we break any patching userland has for the said call. Now, another good thing to consider is that there have been cases where overriding built-ins has been tremendously useful when polyfilling - for example https://github.com/drses/weak-map/blob/master/weak-map.js (the WeakMap polyfill). |
if users want to patch fs.readFile or something that's cool (although I'm personally apposed to it without hooks) but changing Object.prototype.toString shouldnt change how fs.readFile works |
I'm in favor of Node core's internals being resistant to monkey patching. I don't think we need to expose the untampered functions to users though. |
Probably, but it's more subtle than that - going back to the the WeakMap example case - users changed Note: Another thing we can do is to explicitly specify that changing built ins is dangerous and doing so is not supported in Node.js . I'm not sure that's a good idea as it violates "least surprise" and the current behavior is unspecified anyway. Note that fixing this and ensuring builtins are used should be a pretty big code change and will create overhead in maintaining the code since basically built-in instance methods won't be used. If we do this we need to make sure it doesn't impact performance (or improves it, since technically it's provable which function it is now), and that it's not problematic maintenance wise. |
I feel like this is a deep rabbit hole. Anyone could override anything, so now any and all object/prototype methods need to be saved up front, which is crazy because there are so many and they're used throughout the codebase. For example: Even if it was decided it's a good idea to do this, we'd need to also maintain a giant list of methods we use anywhere in node core. |
@mscdex are you concerned about memory use or time spent upon startup? One way to do it is to simply shallow clone these prototype objects. |
@hashseed I'm not even considering performance implications at this point. Just having to touch almost every line of code in node core, getting new collaborators to understand the change, maintaining these methods/prototypes, etc. |
I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box. |
I don't have a problem with that if it's for the greater good. Most of it is rote search-and-replace, perhaps even mechanical replace. |
Could make for a good Code-and-Learn exercise :-) |
My position on this aligns with @mscdex. There's some parts that are not completely unintuitive because people have likely seen Also, it's very likely that if we're going down this route, my preference is to try & find a path that leads to the core being able to lock or have our own versions of these primitives, as mentioned by @benjamingr.
I know this is a more complicated route (that might not be feasible with what we have available right now) but at least it would keep the code maintainable and easy to understand for new contributors. Having to write (And if we're comparing this to the C++ layer, having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions.) Also, to further elaborate, this would still need to go hand in hand with being more defensive on anything being passed in from the users. For example, if a user is allowed to pass in a settings object then we shouldn't assume that it has This aligns with the recent transition to using |
This does not work, if you take user-defined objects as arguments and leak internal versions via return values. |
As I mention above, we would still need to be more defensive for user-provided objects but that's a much lesser level of change than having to be defensive about the entirety of internal Object, String, etc. usage. User-provided callbacks were one of the main culprits and then we have some infrequent
Hence me saying this might not be possible currently. That said, if our internal versions could be frozen but without the currently present performance impact, then it would technically be possible, no? The fact is that it's hard to know where to draw the line. What about Buffer? It's something that's provided by node itself but it's crucial to the inner workings of at least half of core. And if Buffer is included, what else is? |
in my mind i kind of split this into two example cases: a) util.inspect getting property keys for A i would use unsaved prototype methods to allow users to heck with it, but for B there is no benefit to the repl (in fact it can become impossible to use) if someone overrides String.prototype.replace for some reason, so we would use a saved proto method here. like hashseed said its ok to continue to have planned monkey patching but it shouldn't just randomly happen. |
I think @hashseed's point is that this is generally underspecified in our whole API and that we should probably consider specifying this. (feel free to comment if I misrepresented your intention) That is, that an effort to go over Node's API in its entirety and specify (at least for internal use) what methods may impact it would be beneficial. I agree with that. I think we should also discuss how we should do this (split work?) and get some initial data.
I'm really not sure, I honestly can't think of a case where I'd want to override this but I wouldn't forsee the WeakMap for The issue is that this effects older versions of Node (new versions of Node rarely need polyfills). |
One other thing to keep in mind here is the non-trivial burden that going all in with this will creating on backporting changes to LTS lines if this mechanism is not also backported. |
I don't know how the shallow cloning of prototypes would work, but currently even if you save |
@mscdex |
Yes, this was kind of what I was hinting at earlier with having to touch the entire codebase. |
Right, and having to code like that would be pretty absurd IMHO, even if you wrote a wrapper for it. |
as a reference I created https://gist.github.com/devsnek/76aab1dcd169c96b952fbe8c74404475. around 350 functions are saved. L27 exposes it to users. |
We would also need to store the constructors to avoid |
Another way I can think of is to store copies of builtin methods under symbols. In internal modules you would invoke builtins via symbols. |
One possible approach that carries along a whole range of it's own complexities is to use a separate context (e.g. as in |
Unfortunately that does not work. When you call |
I just checked with @bmeurer that TurboFan handles const ReflectApply = Reflect.apply;
function uncurryThis(func) {
return (thisArg, ...args) => ReflectApply(func, thisArg, args);
}
var StringStartsWith = uncurryThis(String.prototype.startsWith);
var s = "Hi Matteo";
function test1() {
var r = 0;
console.time("monkey-patchable");
for (var i = 0; i < 1E7; i++) {
if (s.startsWith(i)) r++;
}
console.timeEnd("monkey-patchable");
return r;
}
function test2() {
var r = 0;
console.time("cached original");
for (var i = 0; i < 1E7; i++) {
if (StringStartsWith(s, i)) r++;
}
console.timeEnd("cached original");
return r;
}
test1();
test2(); Is it really that much more unreadable to rename the function and move the receiver to the first argument? You don't have to use some
I'd argue that you should be writing async/await in most cases anyways. |
I'm referring to primordials in the context of the Node helpers written to handle handle methods from unknown origins. The term used for those helpers was primordials. I pointed to the docs for the source of the name generics, not using the actual |
I think I'll pick this up at the next open source event we're doing and get 10 or so community members messing with functions and reporting how Node behaves in different scenarios - as well as searching GitHub and NPM for initial data. I think that should get us the initial data we need to present to the TSC in order to make a push for a more consistent API in terms of overriding builtins. This will likely be mid-late March (and sounds like a fun activity people unfamiliar with our code base can help with), if anyone has an earlier time frame feel free to "steal" this. |
I'm going to self assign, I don't think this needs TSC attention until we have said data but feel free to disagree or bring other opinions. |
@benjamingr is this something you're interested in doing at a community event? |
@devsnek still interested in this but have been a little overwhelmed. |
@hashseed thanks, that's useful. I should have probably gotten around to this by now but I ended up working on promise-use-cases in the last event. I'll do my best to get to this in the next event (June 16th). (Random note: I think domenic asked not to be pinged in the nodejs repos by the way - so we should probably not ping him without checking) |
That proposal isn’t solely necessary, fwiw - node could load a single file at the start of the process that caches all the intrinsics and provides a module to expose them internally. The challenge - with that or with domenic’s proposal - would be enforcing the pattern’s usage via linting and code review. |
@ljharb see domenic/get-originals#14 - I'm thinking about automatic conversion here rather than changing our code. The proposal is something we'd potentially want to use after we've mapped how our internals react to these changes. |
I'm not sure how you could reliably automatically convert prototype methods; babel has the same challenge. |
@ljharb I'm not sure what you mean? If you have a concern regarding automatic conversion please do speak up in the get-originals repo. |
Should this remain open? Or is this a discussion that has run its course and can be closed? |
I’d still like to see a policy decision that trends towards robustness, if possible. |
This patches changes the `safe_globals` internal module into a script that gets run during bootstrap and saves JavaScript builtins (primordials) into an object that is available for all other builtin modules to access lexically later. PR-URL: nodejs#25816 Refs: nodejs#18795 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Gus Caplan <[email protected]>
This patches changes the `safe_globals` internal module into a script that gets run during bootstrap and saves JavaScript builtins (primordials) into an object that is available for all other builtin modules to access lexically later. PR-URL: #25816 Refs: #18795 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Gus Caplan <[email protected]>
Given that there has not been any further discussion on this in 1.5 years and given that we've started moving to the use of |
Many internal modules call methods defined on prototype objects that can be modified (or monkey-patched) in userland. For example, by overriding
Object.prototype.getOwnProperty
, it is possible to affect async hooks.One way to fix this is to keep copies of these functions at boostrap, and use these untampered copies inside internal modules.
Optionally, these copies can be exported through a new internal module so that user code can get hold of untampered copies if they intend to.
Advantages of this approach:
Advantages of current code:
I think it is desirable to have consensus over whether internal modules should be affected by monkey-patching. This does not have to be a either/or decision.
Coming from the JavaScript spec, there are JavaScript builtins that are affected by monkey-patching too, intentionally. For example, by overriding
RegExp.prototype.exec
, the behavior of other RegExp builtins can be changed. Yet another example is the species constructor. Generally though, JavaScript builtins call into internal builtins that cannot be altered.I think it is important to clearly specify and document which global state (i.e. result of monkey-patching) affect behavior of internal modules rather than having implicit APIs though monkey-patching. Personally, I think that - unless specified otherwise - monkey-patching should not affect internal modules. It needs to be a conscious design choice.
FWIW this problem would become very apparent if Node.js had a formal specification for its internal modules :)
Ref: #18773
CC: @benjamingr @devsnek
The text was updated successfully, but these errors were encountered: