Should internal modules be affected by monkey-patching? #18795

hashseed · 2018-02-15T12:41:43Z

Many internal modules call methods defined on prototype objects that can be modified (or monkey-patched) in userland. For example, by overriding Object.prototype.getOwnProperty, it is possible to affect async hooks.

One way to fix this is to keep copies of these functions at boostrap, and use these untampered copies inside internal modules.

Optionally, these copies can be exported through a new internal module so that user code can get hold of untampered copies if they intend to.

Advantages of this approach:

The way internal modules are implemented remain implementation details and can be changed without risking breaking user code that somehow rely on monkey-patching.
Internal modules behave predictably.
Usage of JavaScript builtins no longer introduce hidden APIs to override behavior to internal modules.

Advantages of current code:

Internal modules can be monkey-patched, e.g. for instrumentation.
Internal modules implementation are easier to read.
No additional overhead during bootstrap.

I think it is desirable to have consensus over whether internal modules should be affected by monkey-patching. This does not have to be a either/or decision.

Coming from the JavaScript spec, there are JavaScript builtins that are affected by monkey-patching too, intentionally. For example, by overriding RegExp.prototype.exec, the behavior of other RegExp builtins can be changed. Yet another example is the species constructor. Generally though, JavaScript builtins call into internal builtins that cannot be altered.

I think it is important to clearly specify and document which global state (i.e. result of monkey-patching) affect behavior of internal modules rather than having implicit APIs though monkey-patching. Personally, I think that - unless specified otherwise - monkey-patching should not affect internal modules. It needs to be a conscious design choice.

FWIW this problem would become very apparent if Node.js had a formal specification for its internal modules :)

Ref: #18773
CC: @benjamingr @devsnek

The text was updated successfully, but these errors were encountered:

benjamingr · 2018-02-15T13:48:56Z

I think one of the most important points that you made in #18773 is:

All APIs that V8 offer in C++ use the original builtin. Including v8::Object::GetOwnPropertyNames. Let's say, just hypothetically, Node.js core wants to move the implementation of some functions from JS to C++, it then must not use v8::Object::GetOwnPropertyNames, but instead load and call global.Object.getOwnPropertyNames. Otherwise it would observably change behavior.

That is, every time we move code from/to C++ we break any patching userland has for the said call.

Now, another good thing to consider is that there have been cases where overriding built-ins has been tremendously useful when polyfilling - for example https://github.com/drses/weak-map/blob/master/weak-map.js (the WeakMap polyfill).

devsnek · 2018-02-15T13:54:45Z

if users want to patch fs.readFile or something that's cool (although I'm personally apposed to it without hooks) but changing Object.prototype.toString shouldnt change how fs.readFile works

cjihrig · 2018-02-15T13:56:09Z

I'm in favor of Node core's internals being resistant to monkey patching. I don't think we need to expose the untampered functions to users though.

benjamingr · 2018-02-15T13:57:03Z

@devsnek

if users want to patch fs.readFile or something that's cool (although I'm personally apposed to it without hooks) but changing Object.prototype.toString shouldnt change how far.readFile works

Probably, but it's more subtle than that - going back to the the WeakMap example case - users changed GetOwnPropertyNames since a new ("hidden") key was added to an object. That key had to be "secret" and not impact other methods - so if util.inspect was "tamper proof" then it would have been impossible to polyfill WeakMap entirely.

Note: Another thing we can do is to explicitly specify that changing built ins is dangerous and doing so is not supported in Node.js .

I'm not sure that's a good idea as it violates "least surprise" and the current behavior is unspecified anyway. Note that fixing this and ensuring builtins are used should be a pretty big code change and will create overhead in maintaining the code since basically built-in instance methods won't be used.

If we do this we need to make sure it doesn't impact performance (or improves it, since technically it's provable which function it is now), and that it's not problematic maintenance wise.

mscdex · 2018-02-15T14:34:20Z

I feel like this is a deep rabbit hole. Anyone could override anything, so now any and all object/prototype methods need to be saved up front, which is crazy because there are so many and they're used throughout the codebase. For example: string.trim(), string.toLowerCase(), string.slice(), Object.keys(), Number.isNaN(), etc.

Even if it was decided it's a good idea to do this, we'd need to also maintain a giant list of methods we use anywhere in node core.

hashseed · 2018-02-15T14:37:03Z

@mscdex are you concerned about memory use or time spent upon startup? One way to do it is to simply shallow clone these prototype objects.

mscdex · 2018-02-15T14:58:58Z

@hashseed I'm not even considering performance implications at this point. Just having to touch almost every line of code in node core, getting new collaborators to understand the change, maintaining these methods/prototypes, etc.

benjamingr · 2018-02-15T15:01:27Z

I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.

bnoordhuis · 2018-02-15T15:06:22Z

Just having to touch almost every line of code in node core

I don't have a problem with that if it's for the greater good. Most of it is rote search-and-replace, perhaps even mechanical replace.

jasnell · 2018-02-15T15:09:37Z

Could make for a good Code-and-Learn exercise :-)

apapirovski · 2018-02-15T15:20:24Z

My position on this aligns with @mscdex. There's some parts that are not completely unintuitive because people have likely seen hasOwnProperty.call(obj, 'something') before, but then there's stuff like Promise.prototype.then monkey-patching and needing to have PromiseThen that is honestly going to be nightmare to explain to new contributors.

Also, it's very likely that uncurryThis carries a performance penalty so switching over the entirety of core to use it will lead to a broad performance penalty for the whole of Node.js.

if we're going down this route, my preference is to try & find a path that leads to the core being able to lock or have our own versions of these primitives, as mentioned by @benjamingr.

I wonder if there is a way from V8 land to "lock" the way Node core uses these objects - but I can't think of anything out of the box.

I know this is a more complicated route (that might not be feasible with what we have available right now) but at least it would keep the code maintainable and easy to understand for new contributors. Having to write PromiseThen(promise, fn); is — to me — not an appealing proposition.

(And if we're comparing this to the C++ layer, having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions.)

Also, to further elaborate, this would still need to go hand in hand with being more defensive on anything being passed in from the users. For example, if a user is allowed to pass in a settings object then we shouldn't assume that it has hasOwnProperty or that it even has a non-null prototype.

This aligns with the recent transition to using Reflect.apply on user provided callbacks.

hashseed · 2018-02-15T15:30:48Z

having our own versions of Object, Promise, etc. would more closely resemble the C++ side than storing references to all the namespaced & prototype functions

This does not work, if you take user-defined objects as arguments and leak internal versions via return values.

apapirovski · 2018-02-15T15:36:21Z

This does not work, if you take user-defined objects as arguments

As I mention above, we would still need to be more defensive for user-provided objects but that's a much lesser level of change than having to be defensive about the entirety of internal Object, String, etc. usage. User-provided callbacks were one of the main culprits and then we have some infrequent hasOwnProperty access, but not much more than that.

leak internal versions via return values.

Hence me saying this might not be possible currently. That said, if our internal versions could be frozen but without the currently present performance impact, then it would technically be possible, no?

The fact is that it's hard to know where to draw the line. What about Buffer? It's something that's provided by node itself but it's crucial to the inner workings of at least half of core. And if Buffer is included, what else is?

devsnek · 2018-02-15T15:52:38Z

in my mind i kind of split this into two example cases:

a) util.inspect getting property keys
b) repl using string replacement

for A i would use unsaved prototype methods to allow users to heck with it, but for B there is no benefit to the repl (in fact it can become impossible to use) if someone overrides String.prototype.replace for some reason, so we would use a saved proto method here.

like hashseed said its ok to continue to have planned monkey patching but it shouldn't just randomly happen.

benjamingr · 2018-02-15T16:00:28Z

in my mind i kind of split this into two example cases:

I think @hashseed's point is that this is generally underspecified in our whole API and that we should probably consider specifying this. (feel free to comment if I misrepresented your intention)

That is, that an effort to go over Node's API in its entirety and specify (at least for internal use) what methods may impact it would be beneficial. I agree with that.

I think we should also discuss how we should do this (split work?) and get some initial data.

b) repl using string replacement

I'm really not sure, I honestly can't think of a case where I'd want to override this but I wouldn't forsee the WeakMap for util.inspect either.

The issue is that this effects older versions of Node (new versions of Node rarely need polyfills).

jasnell · 2018-02-15T16:09:15Z

One other thing to keep in mind here is the non-trivial burden that going all in with this will creating on backporting changes to LTS lines if this mechanism is not also backported.

mscdex · 2018-02-15T16:25:15Z

I don't know how the shallow cloning of prototypes would work, but currently even if you save hasOwnProperty early, someone could still override Function.prototype.call after the fact and still completely disrupt your hasOwnProperty.call(...).

benjamingr · 2018-02-15T16:26:34Z

@mscdex call(hasOwnProperty, ...) :D Although I see your point

mscdex · 2018-02-15T16:26:40Z

One other thing to keep in mind here is the non-trivial burden that going all in with this will creating on backporting changes to LTS lines if this mechanism is not also backported.

Yes, this was kind of what I was hinting at earlier with having to touch the entire codebase.

mscdex · 2018-02-15T16:27:33Z

@mscdex call(hasOwnProperty, ...) :D Although I see your point

Right, and having to code like that would be pretty absurd IMHO, even if you wrote a wrapper for it.

devsnek · 2018-02-15T16:57:54Z

as a reference I created https://gist.github.com/devsnek/76aab1dcd169c96b952fbe8c74404475. around 350 functions are saved. L27 exposes it to users.

apapirovski · 2018-02-15T17:03:14Z

We would also need to store the constructors to avoid global.Array = function() {/* user code */} and pass them into the internal wrappers.

hashseed · 2018-02-15T17:59:58Z

Another way I can think of is to store copies of builtin methods under symbols. In internal modules you would invoke builtins via symbols.

jasnell · 2018-02-15T18:12:59Z

One possible approach that carries along a whole range of it's own complexities is to use a separate context (e.g. as in vm.runInNewContext()) for internal code vs. user code. While it would be disruptive in a number of cases, it would prevent a number of issues being discussed here.

hashseed · 2018-02-15T18:24:54Z

One possible approach that carries along a whole range of it's own complexities is to use a separate context

Unfortunately that does not work. When you call arg.getOwnProperty, you call the one passed with the arg object on the prototype, not the one on the object prototype on from the current context.

hashseed · 2018-02-16T08:46:22Z

I just checked with @bmeurer that TurboFan handles Reflect.apply. With the following micro-benchmark I observe some regression on Node 8, but no difference at all on current Node master.

const ReflectApply = Reflect.apply;

function uncurryThis(func) {
  return (thisArg, ...args) => ReflectApply(func, thisArg, args);
}
var StringStartsWith = uncurryThis(String.prototype.startsWith);

var s = "Hi Matteo";
function test1() {
  var r = 0;
  console.time("monkey-patchable");
  for (var i = 0; i < 1E7; i++) {
    if (s.startsWith(i)) r++;
  }
  console.timeEnd("monkey-patchable");
  return r;
}

function test2() {
  var r = 0;
  console.time("cached original");
  for (var i = 0; i < 1E7; i++) {
    if (StringStartsWith(s, i)) r++;
  }
  console.timeEnd("cached original");
  return r;
}

test1();
test2();

Is it really that much more unreadable to rename the function and move the receiver to the first argument? You don't have to use some .call pattern here, and there seems to be no performance penalty.

Having to write PromiseThen(promise, fn); is — to me — not an appealing proposition.

I'd argue that you should be writing async/await in most cases anyways.

devsnek · 2018-02-28T15:54:17Z

In the context of this conversation i think primordials are the big global objects that javascript (and we) create (Array, Symbol) and generics are the methodology from js1.6 where you could use Array.every (which doesn't exist anymore) on anything iterable. In this case generics are going to be much slower than the uncurryThis we are looking at currently because v8 appears has specific perf case for uncurryThis and generics have to work with many types calling down to other behaviors

jdalton · 2018-02-28T15:58:01Z

I'm referring to primordials in the context of the Node helpers written to handle handle methods from unknown origins. The term used for those helpers was primordials. I pointed to the docs for the source of the name generics, not using the actual Array.every (that doesn't exist in v8). I'm saying I implemented generics in the same way as uncurryThis. Who knew the name would throw folks off here 😋

benjamingr · 2018-02-28T16:27:45Z

I think I'll pick this up at the next open source event we're doing and get 10 or so community members messing with functions and reporting how Node behaves in different scenarios - as well as searching GitHub and NPM for initial data.

I think that should get us the initial data we need to present to the TSC in order to make a push for a more consistent API in terms of overriding builtins.

This will likely be mid-late March (and sounds like a fun activity people unfamiliar with our code base can help with), if anyone has an earlier time frame feel free to "steal" this.

benjamingr · 2018-02-28T16:28:06Z

I'm going to self assign, I don't think this needs TSC attention until we have said data but feel free to disagree or bring other opinions.

devsnek · 2018-04-22T08:03:08Z

@benjamingr is this something you're interested in doing at a community event?

benjamingr · 2018-04-22T08:13:00Z

@devsnek still interested in this but have been a little overwhelmed.

hashseed · 2018-05-31T08:23:02Z

There is a proposal by @domenic that would address this.

benjamingr · 2018-05-31T08:29:29Z

@hashseed thanks, that's useful. I should have probably gotten around to this by now but I ended up working on promise-use-cases in the last event. I'll do my best to get to this in the next event (June 16th).

(Random note: I think domenic asked not to be pinged in the nodejs repos by the way - so we should probably not ping him without checking)

ljharb · 2018-05-31T15:28:27Z

That proposal isn’t solely necessary, fwiw - node could load a single file at the start of the process that caches all the intrinsics and provides a module to expose them internally. The challenge - with that or with domenic’s proposal - would be enforcing the pattern’s usage via linting and code review.

benjamingr · 2018-06-01T07:06:16Z

@ljharb see domenic/get-originals#14 - I'm thinking about automatic conversion here rather than changing our code.

The proposal is something we'd potentially want to use after we've mapped how our internals react to these changes.

ljharb · 2018-06-01T07:10:05Z

I'm not sure how you could reliably automatically convert prototype methods; babel has the same challenge.

benjamingr · 2018-06-01T07:13:53Z

@ljharb I'm not sure what you mean? If you have a concern regarding automatic conversion please do speak up in the get-originals repo.

Trott · 2018-10-26T04:18:23Z

Should this remain open? Or is this a discussion that has run its course and can be closed?

ljharb · 2018-10-26T04:52:02Z

I’d still like to see a policy decision that trends towards robustness, if possible.

This patches changes the `safe_globals` internal module into a script that gets run during bootstrap and saves JavaScript builtins (primordials) into an object that is available for all other builtin modules to access lexically later. PR-URL: #25816 Refs: #18795 Reviewed-By: Bradley Farias <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Gus Caplan <[email protected]>

jasnell · 2020-06-26T00:26:30Z

Given that there has not been any further discussion on this in 1.5 years and given that we've started moving to the use of primordials and protecting key paths from monkeypatching, I believe this issue can be closed.

hashseed changed the title ~~Should internal modules be affected by monkey-patching.~~ Should internal modules be affected by monkey-patching? Feb 15, 2018

hashseed added discuss Issues opened for discussions and feedbacks. lib / src Issues and PRs related to general changes in the lib or src directory. labels Feb 15, 2018

devsnek mentioned this issue Feb 17, 2018

Consider freezing Object.prototype for security reasons #18839

Closed

benjamingr self-assigned this Feb 28, 2018

devsnek mentioned this issue Mar 4, 2018

Don't use userobject.call/userobject.apply #12956

Closed

fhinkel mentioned this issue Mar 17, 2018

lib: refactor prototype and Function.call.bind usage #18773

Closed

2 tasks

devsnek mentioned this issue Apr 22, 2018

prototypes and libraries are unsafe inside node core libs #17434

Closed

This was referenced Apr 29, 2018

esm: provide named exports for all builtin libraries #18131

Closed

repl: no RegExp side effects for static properties #18937

Closed

benjamingr mentioned this issue May 31, 2018

Automatic conversion of ""platform"" code domenic/get-originals#14

Open

refack mentioned this issue Sep 25, 2018

util: extract uncurryThis function for reuse #23081

Closed

2 tasks

joyeecheung mentioned this issue Jan 30, 2019

lib: save primordials during bootstrap and use it in builtins #25816

Closed

2 tasks

szmarczak mentioned this issue Jan 6, 2020

Make .globalAgent use cacheable-lookup szmarczak/cacheable-lookup#4

Closed

jasnell closed this as completed Jun 26, 2020

heri16 mentioned this issue Aug 2, 2020

Node.js Integration zerotier/libzt#10

Open

joyeecheung mentioned this issue Sep 12, 2023

Removing primordials from Node.js project nodejs/TSC#1438

Closed

benjamingr mentioned this issue Sep 13, 2023

New Strategic Initiative on Primordials nodejs/TSC#1439

Open

Should internal modules be affected by monkey-patching? #18795

Should internal modules be affected by monkey-patching? #18795

Comments

hashseed commented Feb 15, 2018

benjamingr commented Feb 15, 2018 • edited Loading

devsnek commented Feb 15, 2018 • edited Loading

cjihrig commented Feb 15, 2018

benjamingr commented Feb 15, 2018

mscdex commented Feb 15, 2018 • edited Loading

hashseed commented Feb 15, 2018

mscdex commented Feb 15, 2018 • edited Loading

benjamingr commented Feb 15, 2018

bnoordhuis commented Feb 15, 2018

jasnell commented Feb 15, 2018

apapirovski commented Feb 15, 2018 • edited Loading

hashseed commented Feb 15, 2018

apapirovski commented Feb 15, 2018

devsnek commented Feb 15, 2018 • edited Loading

benjamingr commented Feb 15, 2018 • edited Loading

jasnell commented Feb 15, 2018

mscdex commented Feb 15, 2018 • edited Loading

benjamingr commented Feb 15, 2018

mscdex commented Feb 15, 2018

mscdex commented Feb 15, 2018 • edited Loading

devsnek commented Feb 15, 2018 • edited Loading

apapirovski commented Feb 15, 2018

hashseed commented Feb 15, 2018

jasnell commented Feb 15, 2018

hashseed commented Feb 15, 2018

hashseed commented Feb 16, 2018

devsnek commented Feb 28, 2018 • edited Loading

jdalton commented Feb 28, 2018 • edited Loading

benjamingr commented Feb 28, 2018

benjamingr commented Feb 28, 2018 • edited Loading

devsnek commented Apr 22, 2018

benjamingr commented Apr 22, 2018

hashseed commented May 31, 2018 • edited Loading

benjamingr commented May 31, 2018

ljharb commented May 31, 2018

benjamingr commented Jun 1, 2018

ljharb commented Jun 1, 2018

benjamingr commented Jun 1, 2018

Trott commented Oct 26, 2018

ljharb commented Oct 26, 2018

jasnell commented Jun 26, 2020

benjamingr commented Feb 15, 2018 •

edited

Loading

devsnek commented Feb 15, 2018 •

edited

Loading

mscdex commented Feb 15, 2018 •

edited

Loading

mscdex commented Feb 15, 2018 •

edited

Loading

apapirovski commented Feb 15, 2018 •

edited

Loading

devsnek commented Feb 15, 2018 •

edited

Loading

benjamingr commented Feb 15, 2018 •

edited

Loading

mscdex commented Feb 15, 2018 •

edited

Loading

mscdex commented Feb 15, 2018 •

edited

Loading

devsnek commented Feb 15, 2018 •

edited

Loading

devsnek commented Feb 28, 2018 •

edited

Loading

jdalton commented Feb 28, 2018 •

edited

Loading

benjamingr commented Feb 28, 2018 •

edited

Loading

hashseed commented May 31, 2018 •

edited

Loading