Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Userland access to internalBinding (at one's own risk) #27061

Closed
bengl opened this issue Apr 2, 2019 · 34 comments
Closed

Userland access to internalBinding (at one's own risk) #27061

bengl opened this issue Apr 2, 2019 · 34 comments
Labels
feature request Issues that request new features to be added to Node.js.

Comments

@bengl
Copy link
Member

bengl commented Apr 2, 2019

At Intrinsic, we make use of process.binding('natives') in order to re-evaluate Node.js core modules inside our isolation environment. In order for these modules to work, we need to have the binding layer accessible. This is similar to how the natives module works.

The introduction of internalBinding was of no issue on its own, since we could just use process.binding to replicate its behaviour. However, there are now modules such as string_decoder that now having binding parts that are inacessible to userland. We can't simply require that users run with --expose-internals as our customers aren't always in control of CLI arguments. Also, our product is delivered as a Node.js module, rather than a separate binary, and we'd like to keep it that way.

Note: The natives module is subject to the same issue. Here is a quick example using RunKit.

We do have some workarounds involving re-implementing the exposed APIs, but we find this to be prone to errors and subject to extra maintenance for new versions of Node.js.

What we'd like to do is introduce a way of accessing internalBinding-provided code from userland. I know this seems counterintuitive, given the purpose of internalBinding, so it would make the most sense for it to be provided in C++ only, requiring some native code to actually get access to it. The difficulty in accessing it is an acknowledgement that we're not expecting the same level of support that the normal user-facing API has. (We would expect basically no support for the actual internal bindings, but we wouldn't expect this access to disappear.) We're certainly open to other suggestions, but we'd like a solution that's maintainable and sustainable.

What do folks think?

@devsnek devsnek added the feature request Issues that request new features to be added to Node.js. label Apr 2, 2019
@devsnek
Copy link
Member

devsnek commented Apr 2, 2019

i'm very -1 on exposing these bindings, the entire point of adding internalBinding was to prevent userland from accessing them. perhaps we can find another solution. what big picture functionality was process.binding giving you?

cc @addaleax @joyeecheung

@mscdex
Copy link
Contributor

mscdex commented Apr 2, 2019

-1 I agree it would be better to focus on actual needs rather than just blanket access to internals.

@bengl
Copy link
Member Author

bengl commented Apr 2, 2019

@devsnek @mscdex Our isolation layer sits at low level in order to better interpose on APIs. We evaluate code from process.binding('natives') (that is, stuff from the lib folder) inside our isolation environment. We provide shimmed versions of low-level functionality (such as process.binding itself) in order to enable these modules, as well as apply our fine-grained policies.

We're already shipping a product that uses process.binding for this purpose, so we're looking for a way to make this continue to work with the move toward internalBinding-only modules.

To reiterate, I'm not suggesting that anything added here would be any more supported than process.binding is today. That is, we'd be totally fine with any breakages of this API between releases (even patch releases).

@devsnek
Copy link
Member

devsnek commented Apr 2, 2019

why doesn't just replacing process.binding and the public interfaces work? the only way internal bindings are called is via the public interfaces anyway.

@bengl
Copy link
Member Author

bengl commented Apr 2, 2019

@devsnek We interpose at the low level in order to implement our fine-grained policies with less maintenance and performance overhead. Many of our policies are considerably easier to implement at that level due to normalization and the way some parts of the public API interact with others internally. Either performing those normalizations ourselves or re-implementing the interactions that already happen within the lib folder would result in unnecessary performance hits.

@devsnek
Copy link
Member

devsnek commented Apr 2, 2019

how exactly do you intercept these calls? if you're using stack traces/context tracking then some hooks for things like "on fs read" and "on socket open" might work.

@refack
Copy link
Contributor

refack commented Apr 2, 2019

As far as I remember the main issue we wanted to prevent was un-audited access to internals by 3rd party modules. That's why the only way to enable access was with a CLI flag. If you could think of a different way to give the process owner exclusive control, I think that is the main blocker.

@devsnek
Copy link
Member

devsnek commented Apr 2, 2019

There was also the issue of whenever we changed parts of the undocumented api, huge modules with millions of dependents would break and topple the ecosystem.

@bengl
Copy link
Member Author

bengl commented Apr 2, 2019

@devsnek We intercept these calls by construction: we re-evaluate lib folder code inside of our isolation contexts and intercept any function calls coming out of them. These isolation contexts are essential in order to maintain separation between differing sets of policies.

@devsnek
Copy link
Member

devsnek commented Apr 2, 2019

@bengl without more information on "isolation contexts" and "intercepting any function calls" and all of that i'm not sure how much help i personally can provide. is it possible to provide some more technical detail? (or even better, code)

@refack
Copy link
Contributor

refack commented Apr 2, 2019

huge modules with millions of dependents would break

Yes, that's what I meant with un-audited. The users of those modules (i.e. those how own the process) were not aware of the risk the module authors were taking on their behalf.

As I see it that's the problem with "at one's own risk", who is one ¯\(ツ)

@refack
Copy link
Contributor

refack commented Apr 2, 2019

Another idea that comes up once in a while is a drop-in modular stdlib. Write your own stdlib files, place them in a special place and the runtime will pick them up instead of the bundled ones. Again the key here is that only the owner can do that, and is aware of the risks and trade-offs.

@bengl
Copy link
Member Author

bengl commented Apr 3, 2019

Unfortunately I can't show code, since it's not an open source product. That being said, we use isolated JavaScript environments that do not include require or other Node.js things, and then we insert wrapped APIs deliberately, with our policy governing their actions. That's the environment that we evaluate the lib folder code in.

At this point, our (Intrinsic's) options are to either somehow have access to internalBinding or effectively the same functionality, or re-implement the bindings entirely ourselves. Re-implementing it ourselves at best introduces compatibility problems we're not comfortable with and at worst introduces performance hits our customers can't afford.

@refack Unfortunately owner-only solutions aren't really enough for us because we support environments where the people deploying Intrinsic aren't the process owners (e.g. serverless).

I certainly understand the reasoning behind introducing internalBinding, but since we need access to it in userland in order for our product to be performant, maintainable, and have the same feature set as it did before, I'm suggesting the introduction of an API that's more difficult to use than process.binding (C++ is a barrier for a lot of people), and even adding whatever warning messaging is desired in order to discourage people from using it en-masse.

@devsnek
Copy link
Member

devsnek commented Apr 3, 2019

@bengl is an environment a v8 isolate? a v8 context? something else? the more information we have the better we can help.

if you're using isolates, perhaps a new api on worker threads would work?

new Worker(path, {
  hooks: {
    // names and granularity tbd obviously
    fsOpen: (path) => { return true or false to allow operation },
    fsRead: (path) => {},
    ...
  },
});

we've also got policies in development which might internally need hooks that intrinsic could take advantage of (cc @bmeck on that)

@refack
Copy link
Contributor

refack commented Apr 3, 2019

@refack Unfortunately owner-only solutions aren't really enough for us because we support environments where the people deploying Intrinsic aren't the process owners (e.g. serverless).

Just thinking, do you have access to the top level package.json? Or to environment variables? (in short anything that is 1:1 with a process environment)

@bengl
Copy link
Member Author

bengl commented Apr 3, 2019

@devsnek We don't use Worker threads. You can think of our isolation contexts as being similar to v8 Contexts.

A global hook system (that's implemented at suitable places for our needs) seems helpful and may actually present a viable path, but there are some messy difficulties that would need to be dealt with like:

  • Performance impact when such hooks are disabled.
  • Having hook locations and signatures that allow for dealing with interactions between libraries (e.g. http uses net, https uses tls, etc.).

And I'm not sure these would end up being finished in a Node 12 timeframe.

@refack Read access for sure. We can instruct our customers to modify package.json but if possible we'd like to avoid that for our own UX. For environment variables, I know some environments do some filtering that might block it. Do you suppose there's something that can be signaled from a 3rd-party package, or is that the very thing you're trying to avoid here?

@joyeecheung
Copy link
Member

joyeecheung commented Apr 3, 2019

From what I can tell from the use cases, the requirements are:

  1. Access to a copy of native modules along with the bindings so that you can apply additional policy restrictions
  2. No additional CLI arguments should be required, the replacement should be done as a user land npm module

I don't think exposing internalBinding to general users is a good solution here, the primary purpose of this encapsulation is to prevent users from patching the original native modules (and, to a smaller extent, retrieving states of these) in core, but what you want is a copy of those, which is fine and not in conflict with the purpose of internalBinding.

I am wondering, is it possible to achieve what you need by using something similar to the node::LoadEnvironment methods we provide in node.h? It currently selects main scripts to execute based on parsed CLI arguments, but it's also possible to accept a script directly from user to execute here.

node/src/node.cc

Lines 449 to 455 in d5a5b99

void LoadEnvironment(Environment* env) {
CHECK(env->is_main_thread());
// TODO(joyeecheung): Not all of the execution modes in
// StartMainThreadExecution() make sense for embedders. Pick the
// useful ones out, and allow embedders to customize the entry
// point more directly without using _third_party_main.js
USE(StartMainThreadExecution(env));

The scripts are environment entry points that has access to internalBinding, and you can bootstrap from there to create your isolated environment. For reference, in the worker thread main script:

const {
threadId,
getEnvMessagePort
} = internalBinding('worker');

And this is what currently available to these special scripts (note that the require here can load internal scripts as well):

node/src/node.cc

Lines 374 to 394 in d5a5b99

MaybeLocal<Value> StartExecution(Environment* env, const char* main_script_id) {
EscapableHandleScope scope(env->isolate());
CHECK_NOT_NULL(main_script_id);
std::vector<Local<String>> parameters = {
env->process_string(),
env->require_string(),
env->internal_binding_string(),
FIXED_ONE_BYTE_STRING(env->isolate(), "markBootstrapComplete")};
std::vector<Local<Value>> arguments = {
env->process_object(),
env->native_module_require(),
env->internal_binding_loader(),
env->NewFunctionTemplate(MarkBootstrapComplete)
->GetFunction(env->context())
.ToLocalChecked()};
MaybeLocal<Value> result =
ExecuteBootstrapper(env, main_script_id, &parameters, &arguments);
return scope.EscapeMaybe(result);

An idea I have is to accept a function or a string containing script sources in node::LoadEnvironment (as shown in the TODO above) so the users can customize the entry point of Node.js with a bootstrapped environment. Although as of #26788, the environment created from node::CreateEnvironment also runs pre-execution for the main thread (including setups for policies, ESM loader hooks, preloaded CJS modules) for our cctest, but it's also possible to skip that and let the users decide what they want to call in the entry point.

@jasnell
Copy link
Member

jasnell commented Apr 3, 2019

I'm happy to see discussion happening rather than stopping with the initial -1's. The ability to intercept at a low level is important and necessary. Yes, it's also necessary that we started hiding things behind internalBinding to help stem off userland encroaching further into Node.js core internals. I like @bengl's original idea of a native API for installing an internal binding hook.

@devsnek
Copy link
Member

devsnek commented Apr 3, 2019

as long as this hook doesn't enable another library that exposes node internals, it seems fine. I'm just worried about a natives 2.0 module with like module.exports = require('./hook_internals.node')

@sam-github
Copy link
Contributor

it would make the most sense for it to be provided in C++ only, requiring some native code to actually get access to it

I think this specific information was missed in some of the conversation above. @bengl isn't requesting that js have access to internal bindings, but that there be a C++ API to do so. Also, he's OK if the API is unstable, ie, if it changes unexpectedly, as long as its usable and they can rework their own C++ to adapt to the various node.js versions.

I don't have comment on the particular form of that API, but I support exposing node in ways that allow innovative modules to do unexpected (by us) things, without forcing people to rebuild and redistribute their own node variant.

@bengl
Copy link
Member Author

bengl commented Apr 3, 2019

@joyeecheung This seems quite promising! It seems from what you're suggesting (please correct me if I'm wrong) that this would require that we create a new isolate/thread, initialize a node::Environment for it, and create our isolation contexts there. This might work, but we do provide access to things like process.[chdir|getuid|setuid|etc.]() from within our isolation contexts (governed by our policy of course!), and so we'd need access to those. As I understand, this is a (intentional) limitation of Worker threads, but not necessarily a problem if we do it all ourselves, right?

@bengl
Copy link
Member Author

bengl commented Apr 3, 2019

I forgot to mention: The way we compile our addon code, we can use headers in the src directory that aren't included in the Node.js release packages. If something is available there, we can use it in Intrinsic, but your average module on npm can't without precompiling using the whole src directory. @devsnek would this introduce enough of a barrier for natives-esque modules, while still enabling our use-case?

@joyeecheung
Copy link
Member

joyeecheung commented Apr 3, 2019

@bengl You can gain access to those just fine if the Environment is created with the kOwnsProcessState flag - this is not exposed to node.h at the moment, but we could take options to enable that in the implementation. You'll have to make sure that users cannot use the process in a way where two Environments that both own the process state run into races, of course. From the currently available APIs, only the main thread has this flag on while the Environments we create for workers have this off. This essentially gives you power to create a Worker-like environment from the C++ API but with less constraints.

@joyeecheung
Copy link
Member

joyeecheung commented Apr 3, 2019

@bengl If you have access to the internal headers (I believe it's also possible for the npm packages to do that if they download the correct headers from this repo and turn on NODE_WANT_INTERNALS though), I believe you can already try experimenting with node::Environment from env.h - you can create an Environment either with node::CreateEnvironment or copy code from there and delete the main thread execution bits if you don't need those, and then compile your entry point scripts similar to how it's done in node::StartExecution (but instead of reading source from a map in NativeModuleLoader, you provide your own strings to v8's script compiler), you can pass the require function and the internal binding loader to your script through env->native_module_require() and env->internal_binding_loader().

@bengl
Copy link
Member Author

bengl commented Apr 3, 2019

@joyeecheung If I'm understanding correctly, this still involves a separate isolate, right? Our customers in serverless environments might be impacted if we need to serialize in and out of the separate isolate due to the platform-provided entrypoint for inbound requests. Ccold-start times would also likely be impacted. Also, is any of this possible in Node 10? Right now we are working around it in Node 10 by re-implementing things.

How are people feeling about the idea of having some function exposing the internal bindings that isn't available in JS or in node.h but is provided by some header not included in the Node.js release packages?

@joyeecheung
Copy link
Member

joyeecheung commented Apr 4, 2019

this still involves a separate isolate, right?

Technically, you can create a new Environment from an existing isolate that's associated with another Environment, but I don't think we have ever tested this (cc @addaleax )

I think it is possible to run all your user code in a new Environment on another thread, while keeping the primary one for yourself and not running any user code there, then you can avoid context switching because the first isolate only exists to launch the other one, the feasibility of this depends on how your package is used though:

  • If it's a CLI, then sure, you are in full control of what is actually launched to execute user code.
  • if it's a preloaded package, then maybe (I am not sure if it's feasible to intercept all user input and redirect module loading there entirely).
  • If it's something that gets required by users explicitly, then no.

Also, is any of this possible in Node 10? Right now we are working around it in Node 10 by re-implementing things.

I do not think so, Node 10 is fairly old by master's standards, and LoadEnvironment/CreateEnvironment are more broken there (I am not sure who's actually using LoadEnvironment).

How are people feeling about the idea of having some function exposing the internal bindings that isn't available in JS or in node.h but is provided by some header not included in the Node.js release packages?

I think it is possible to store the internalBinding loader as a property behind a v8::Private in the global proxy (like what we currently do for things exported by per-context scripts), and add a function in a src/ header that allows you to retrieve this property from a given Node.js context, I am not sure if we should add cctest for that though, since that's not really something we are supposed to provide stability guarantee for, but it can be easily broken if we don't use it ourselves.

@bengl
Copy link
Member Author

bengl commented Apr 4, 2019

@joyeecheung First of all, thanks for your help here (same goes to others who have weighed in)!

I think it is possible to run all your user code in a new Environment on another thread, while keeping the primary one for yourself and not running any user code there, then you can avoid context switching because the first isolate only exists to launch the other one, the feasibility of this depends on how your package is used though.

Our package is something that gets required by users explicitly, and we'd like to keep it that way if at all possible. In serverless environments (e.g. Lambda), we don't even get to run before the platform (e.g. Amazon's code) has already set up inbound request handling on the main thread, so that's why I'm worried about serializing requests over to another thread/isolate. For Lambda, this may change with the Lambda Runtime API, but we also have other serverless platforms to support.

I do not think so, Node 10 is fairly old by master's standards ...

Whatever solution we land upon, if it doesn't make it into Node 10, that's probably okay, since we're already doing workarounds for Node 10, though they're not particularly ideal.

I think it is possible to store the internalBinding loader as a property behind a v8::Private in the global proxy (like what we currently do for things exported by per-context scripts), and add a function in a src/ header that allows you to retrieve this property from a given Node.js context.

This would be a great solution for us! Other folks in this thread: how does this sound? If no one objects to this, I can start working on a PR.

I am not sure if we should add cctest for that though, since that's not really something we are supposed to provide stability guarantee for, but it can be easily broken if we don't use it ourselves.

If it gets broken, we (Intrinsic) would find out fairly quickly and submit a PR to fix it. That being said if there are other people doing similar things to us (I'm not aware of any, but who knows!), it may be worth having a test for it.

@Trott
Copy link
Member

Trott commented Apr 5, 2019

I'm in favor of having some way to do this as long as it has a higher barrier to entry than process.binding().

@bmeck
Copy link
Member

bmeck commented Apr 5, 2019 via email

@joyeecheung
Copy link
Member

@bmeck To be fair..if they have the ability to access src headers with NODE_WANT_INTERNALS, then they already have access to env->internal_binding_loader() at least on master (and tons of other internals)

@bmeck
Copy link
Member

bmeck commented Apr 5, 2019 via email

@joyeecheung
Copy link
Member

joyeecheung commented Apr 5, 2019

@bmeck this is asking for v10 where internalBinding is not stored as a per-Environment strong persistent property (although it's possible to add that to v10 as well I believe, it's more like a matter of git conflicts because the per-Environment persistent list is quite different on v10).

@joyeecheung
Copy link
Member

joyeecheung commented Apr 5, 2019

Although, this reminds me...at least on master, one can already instantiate a new function with node::binding::GetInternalBinding, env->internal_binding_loader() is merely an instance wrapped with cache.

@Trott
Copy link
Member

Trott commented Oct 3, 2019

It seems like perhaps this should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js.
Projects
None yet
Development

No branches or pull requests

9 participants