Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bootstrap: include bootstrapped Environment in builtin snapshot #32984

Closed
wants to merge 13 commits into from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Apr 21, 2020

This was the WIP I had been working on to give my updates in #17058 and promised to post in #26382 (comment) back in last December - I was not planning to open this until I get the issues mentioned in #17058 (comment) fixed in V8, but then #32761 came along with a significantly different design. So I rebased this against master and expanded the snapshot to include bootstrap/node.js for comparison, and post it here to demonstrate the alternative design decisions I suggested in #32761.

Similar to the MVP approach taken by #27321, this now contains only the essential code for getting builtin snapshot shipped in Node.js core - I tried my best to split the snapshot integration into smaller commits that do this step by step. User land snapshot integration would need more yaks shaved to move forward due to the natural constraints of the snapshots, so I left a few comments in the places where code need to be expanded when we do implement user land snapshot integration. We'll need to give more thoughts take care of the uncertainty of the object graph brought by user land snapshotting - in particular, how to properly reset the connection between JS and C++ states, as well as the C++ states and the external system states.

This also includes some new ideas I got after the discussions in #32761.

I've recorded the background and analysis of different options for these design decisions in https://docs.google.com/document/d/15bu038I36oILq5t4Qju1sS2nKudVB6NSGWz00oD48Q8/edit?usp=sharing some of them are implemented here, some in #32761. I think it might be better to keep design discussions in the doc instead of in PRs - so that we focus more on the pros and cons of these options from a higher level. A lot of the decisions are influenced by the previous discussions and prototypes in #17058

cc @addaleax (thanks for the patience for going over the design options with me and the inspiration for some of the ideas here!)

src: split the main context initialization from Environemnt ctor

So that it's possible to create an Environment not yet attached
to any V8 context. We'll use this to deserialize the V8 context
before attaching an Environment to it.

src: add an ExternalReferenceRegistry class

Add an ExternalReferenceRegistry class for registering static
external references.

To register the external JS to C++ references created in a binding
(e.g. when a FunctionTemplate is created):

  • Add the binding name (same as the id used for internalBinding()
    and NODE_MODULE_CONTEXT_AWARE_INTERNAL) to
    EXTERNAL_REFERENCE_BINDING_LIST in src/node_external_reference.h.

  • In the file where the binding is implemented, create a registration
    function to register the static C++ references (e.g. the C++
    functions in v8::FunctionCallback associated with the function
    templates), like this:

    void RegisterExternalReferences(
            ExternalReferenceRegistry* registry) {
      registry->Register(cpp_func_1);
    }
  • At the end of the file where NODE_MODULE_CONTEXT_AWARE_INTERNAL is
    also usually called, register the registration function with

    NODE_MODULE_EXTERNAL_REFERENCE(binding_name,
                                   RegisterExternalReferences);
    

tools: enable Node.js command line flags in node_mksnapshot

Pass the flags down to node_mksnapshot so that we can use them
when generating the snapshot (e.g. to debug or enable V8 flags)

src: snapshot Environment upon instantiation

This includes the initial Environment (without running bootstrap
scripts) into the builtin snapshot

src: make code cache test work with snapshots

Keep track of snapshotted modules in JS land, and move
bootstrap switches into StartExecution() so that
they are not included into part of the environment-independent
bootstrap process.

src: snapshot loaders

This runs lib/internal/bootstrap/loaders.js before creating
the builtin snapshot and deserialize the loaders from the
snapshot in deserialization mode.

src: reset zero fill toggle at pre-execution

The connection between the JS land zero fill toggle and the
C++ one in the NodeArrayBufferAllocator gets lost if the toggle
is deserialized from the snapshot, because V8 owns the underlying
memory of this toggle. This resets the connection at pre-execution.

bootstrap: build fast APIs in pre-execution

Fast APIs need to work with ArrayBuffers which we need
to rebuild connections to after deserializing them
from the snapshot. For now, postpone their creation
until pre-execution to simplify the process.

lib: initialize instance members in class constructors

Since V8 snapshot does not currently support instance member
initialization, initialize them in ordianry class constructors
for now so that these classes can be included in the snapshot.
This may be reverted once
https://bugs.chromium.org/p/v8/issues/detail?id=10704
is fixed and backported.

src: snapshot node

This runs lib/internal/bootstrap/node.js before creating
the builtin snapshot and deserialize the loaders from the
snapshot in deserialization mode.

  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@joyeecheung joyeecheung added the blocked PRs that are blocked by other issues or PRs. label Apr 21, 2020
@nodejs-github-bot nodejs-github-bot added the lib / src Issues and PRs related to general changes in the lib or src directory. label Apr 21, 2020
@joyeecheung

This comment has been minimized.

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 21, 2020

I removed https://chromium-review.googlesource.com/c/v8/v8/+/2143983 in #32761 and this time the performance difference between the two prototype disappeared - I guess the performance difference should come with rehashing, then (which is the price we will have to pay either way, for security reasons) - because once it's rehashable, everything (not just the additional maps/sets) gets rehashed.. So the performance of the two approaches should be comparable.

                                                                                   confidence improvement accuracy (*)   (**)  (***)
 misc/startup.js mode='process' script='benchmark/fixtures/require-cachable' dur=1                -0.49 %       ±1.26% ±1.68% ±2.18%
 misc/startup.js mode='process' script='test/fixtures/semicolon' dur=1                            -0.67 %       ±1.87% ±2.50% ±3.25%

src/aliased_buffer.h Outdated Show resolved Hide resolved
src/aliased_buffer.h Outdated Show resolved Hide resolved
src/node.cc Outdated Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@joyeecheung joyeecheung changed the title WIP: include Environment in snapshot bootstrap: include Environment in snapshot May 8, 2020
@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member Author

https://chromium-review.googlesource.com/c/v8/v8/+/2143983 landed in the upstream, so this should work now. I'll open a separate PR for the V8 cherry picks.

@addaleax I removed a bunch of stuff not essential for the builtin snapshot (in particular, there's no need to take care of the BaseObject or any other embedder fields other than the Env pointer in Context for now), and split the snapshotting to smaller steps so that they are easy to revert / change. I think this is the minimal code we need to get the builtin snapshot shipped.

With NODE_DEBUG_NATIVE=mksnapshot this prints the following initializer list (also used for compilation) at startup which I think is readable enough.

{
// -- native_modules begins --
{
  "buffer",
  "events",
  "internal/assert",
  "internal/async_hooks",
  "internal/bootstrap/loaders",
  "internal/bootstrap/node",
  "internal/bootstrap/switches/does_own_process_state",
  "internal/bootstrap/switches/is_main_thread",
  "internal/buffer",
  "internal/console/constructor",
  "internal/console/global",
  "internal/constants",
  "internal/encoding",
  "internal/errors",
  "internal/fixed_queue",
  "internal/linkedlist",
  "internal/priority_queue",
  "internal/process/execution",
  "internal/process/per_thread",
  "internal/process/promises",
  "internal/process/signal",
  "internal/process/task_queues",
  "internal/process/warning",
  "internal/querystring",
  "internal/timers",
  "internal/url",
  "internal/util",
  "internal/util/debuglog",
  "internal/util/inspect",
  "internal/util/inspector",
  "internal/util/types",
  "internal/validators",
  "path",
  "timers",
},
// -- native_modules ends --
// -- async_hooks begins --
{
  0,  // async_ids_stack
  1,  // fields
  2,  // async_id_fields
  3,  // execution_async_resources
},
// -- async_hooks ends --
{ 5 },  // tick_info
{ 4 },  // immediate_info
// -- performance_state begins --
{
  6,  // root
  7,  // milestones
  8,  // observers
},
// -- performance_state ends --
9,  // stream_base_state
10,  // should_abort_on_uncaught_toggle
// -- persistent_templates begins --
{
  { "async_wrap_ctor_template", 0, 283 },
  { "async_wrap_object_ctor_template", 1, 284 },
  { "binding_data_ctor_template", 2, 285 },
  { "i18n_converter_template", 14, 286 },
  { "promise_wrap_template", 18, 287 },
},
// persistent_templates ends --
// -- persistent_values begins --
{
  { "async_hooks_after_function", 0, 11 },
  { "async_hooks_before_function", 1, 12 },
  { "async_hooks_binding", 2, 13 },
  { "async_hooks_destroy_function", 3, 14 },
  { "async_hooks_init_function", 4, 15 },
  { "async_hooks_promise_resolve_function", 5, 16 },
  { "buffer_prototype_object", 6, 17 },
  { "enhance_fatal_stack_after_inspector", 10, 18 },
  { "enhance_fatal_stack_before_inspector", 11, 19 },
  { "internal_binding_loader", 26, 20 },
  { "immediate_callback_function", 27, 21 },
  { "inspector_console_extension_installer", 28, 22 },
  { "native_module_require", 30, 23 },
  { "prepare_stack_trace_callback", 33, 24 },
  { "process_object", 34, 25 },
  { "primordials", 35, 26 },
  { "promise_reject_callback", 36, 27 },
  { "tick_callback_function", 39, 28 },
  { "timers_callback_function", 40, 29 },
  { "trace_category_state_function", 42, 30 },
  { "url_constructor_function", 44, 31 },
},
// -- persistent_values ends --
32,  // context
}

@joyeecheung joyeecheung changed the title bootstrap: include Environment in snapshot bootstrap: include bootstrapped Environment in builtin snapshot May 8, 2020
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me make my objections here explicit:

  1. This PR does not extend well to userland snapshots. I think I’ve made it clear that that’s the big issue here for me, and I don’t see how this PR has changed in any way to address that.
  2. The added value of the intermediate representation objects is unclear, and they seem like they add a bunch of unnecessary complexity.
  3. I think the need to explicitly keep a list of all external reference entries is a code smell that we should get rid off.

@joyeecheung
Copy link
Member Author

This PR does not extend well to userland snapshots. I think I’ve made it clear that that’s the big issue here for me, and I don’t see how this PR has changed in any way to address that.

Can you elaborate on specifically where this makes user land snapshot unfeasible? This does what's necessary for builtin snapshots, and leave the user land snapshot as a TODO - the embdder field serialization can be added later when we do have embdder fields in the snapshot, and code can be added to convert the immediate representation between binary blobs instead of code snippets, I don't think any choices done here would make the user snapshot unfeasible. I can start a separate branch for a proof-of-concept user land snapshot based on this, but at the mean time, I would like to see things land one step at at time - the snapshot effort had been very gradual, and I had been trying to focus on what's essential for the time being in my commits without adding too much machinery that may or may not get in the way instead of helping for the next step.

The added value of the intermediate representation objects is unclear, and they seem like they add a bunch of unnecessary complexity.

For me they help clarifying "the schema in which these objects are serialized" and additionally, you can compare what's in the intermediate representation and what's in the actual object to find out what's not going to be serialized. They also provide a side-effect-free deserialization phase before we could unfold them into live objects. I see the snapshot as part of the program configuration, and IMO the deserialization of the configuration into meaningful constructs should be better kept side-effect-free and state-free, and separate from the steps that you actually use them to configure the system. I think the complexities worth it in this case, and they are not too much.

I think the need to explicitly keep a list of all external reference entries is a code smell that we should get rid off.

Are you talking about the list of bindings, or the list of pointer registrations in each binding?

@joyeecheung joyeecheung marked this pull request as ready for review May 8, 2020 04:32
@addaleax
Copy link
Member

addaleax commented May 9, 2020

This PR does not extend well to userland snapshots. I think I’ve made it clear that that’s the big issue here for me, and I don’t see how this PR has changed in any way to address that.

Can you elaborate on specifically where this makes user land snapshot unfeasible?

I don’t think it makes userland snapshotting unfeasible, and I didn’t claim that that is the case. My point is that the static serialization and deserialization here will require a non-trivial amount of extra work for implementing userland snapshotting, which is a drawback that #32761 does not have.

The added value of the intermediate representation objects is unclear, and they seem like they add a bunch of unnecessary complexity.

For me they help clarifying "the schema in which these objects are serialized" and additionally, you can compare what's in the intermediate representation and what's in the actual object to find out what's not going to be serialized. They also provide a side-effect-free deserialization phase before we could unfold them into live objects. I see the snapshot as part of the program configuration, and IMO the deserialization of the configuration into meaningful constructs should be better kept side-effect-free and state-free, and separate from the steps that you actually use them to configure the system. I think the complexities worth it in this case, and they are not too much.

Okay, I guess I’m okay with simply disagreeing here. This isn’t the main objection that I’m having anyway.

I think the need to explicitly keep a list of all external reference entries is a code smell that we should get rid off.

Are you talking about the list of bindings, or the list of pointer registrations in each binding?

The former of the two. I don’t see any issues with the way the individual pointers are registered.

@joyeecheung
Copy link
Member Author

I don’t think it makes userland snapshotting unfeasible, and I didn’t claim that that is the case.

Ah, sorry for the misunderstanding

My point is that the static serialization and deserialization here will require a non-trivial amount of extra work for implementing userland snapshotting, which is a drawback that #32761 does not have.

What do you mean by "static serialization and deserialization"? Emitting C++ snippets? That's not intended to extend to user land snapshotting, it's just something that user land snapshotting can build upon (while loosing "what you see is what you get" of the snapshot data since we wouldn't know what we'd see in user snapshots), but even it does (just to provide another option of bundling snapshots inside an executable) can you elaborate on what makes you think this would make the extra work non-trivial? (e.g. is there a specific use case?) I have given some thought into what a user land snapshot prototype would look like and so far I don't see where this would be significant, as far as I can tell the possible schema of the snapshot should be pretty deterministic and limited.

The former of the two. I don’t see any issues with the way the individual pointers are registered.

I guess I will just have to disagree here, I find it less easy to follow if the external reference registration works significantly different from how NODE_MODULE_CONTEXT_AWARE_INTERNAL works, and I don't consider this a code smell since this pattern has already been widely used in the code base as well as in V8 - using inconsistent machineries to make things work would be more confusing IMO.

@joyeecheung
Copy link
Member Author

BTW this will still be blocked while I working on a reland of https://chromium-review.googlesource.com/c/v8/v8/+/2192657

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am disheartened by how everything went here, including the review process on #32761, the way in which this resulted in competition and duplicate effort rather than collaboration, and, to be honest, the fact that the TSC as a whole doesn’t even care enough to weigh in more than an absolutely minimal way.

@joyeecheung, I’m not sure if you feel the same way, but I think this has been frustrating enough for both of us, and I don’t really want to add any more frustration to that, not for you and certainly not for myself. I’ll close #32761 and clear the way for this PR so that we can hopefully move on in some way, even if this PR doesn’t make the tradeoffs that make more sense to me.

@jasnell jasnell added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label May 15, 2020
@jasnell
Copy link
Member

jasnell commented May 15, 2020

Before this moves forward I'd like to have a tsc discussion about the two prs. I've put it on the next @nodejs/tsc agenda

@ruyadorno
Copy link
Member

This don't land cleanly on v14.x should it be backported?

cjihrig pushed a commit that referenced this pull request Jul 23, 2020
So that it's possible to create an Environment not yet attached
to any V8 context. We'll use this to deserialize the V8 context
before attaching an Environment to it.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
Add an ExternalReferenceRegistry class for registering static
external references.

To register the external JS to C++ references created in a binding
(e.g. when a FunctionTemplate is created):

- Add the binding name (same as the id used for `internalBinding()`
  and `NODE_MODULE_CONTEXT_AWARE_INTERNAL`) to
  `EXTERNAL_REFERENCE_BINDING_LIST` in `src/node_external_reference.h`.
- In the file where the binding is implemented, create a registration
  function to register the static C++ references (e.g. the C++
  functions in `v8::FunctionCallback` associated with the function
  templates), like this:

  ```c++
  void RegisterExternalReferences(
          ExternalReferenceRegistry* registry) {
    registry->Register(cpp_func_1);
  }
  ```
- At the end of the file where `NODE_MODULE_CONTEXT_AWARE_INTERNAL` is
  also usually called, register the registration function with

  ```
  NODE_MODULE_EXTERNAL_REFERENCE(binding_name,
                                 RegisterExternalReferences);
  ```

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
Pass the flags down to node_mksnapshot so that we can use them
when generating the snapshot (e.g. to debug or enable V8 flags)

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
This includes the initial Environment (without running bootstrap
scripts) into the builtin snapshot

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
Keep track of snapshotted modules in JS land, and move
bootstrap switches into StartExecution() so that
they are not included into part of the environment-independent
bootstrap process.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
This runs `lib/internal/bootstrap/loaders.js` before creating
the builtin snapshot and deserialize the loaders from the
snapshot in deserialization mode.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
The connection between the JS land zero fill toggle and the
C++ one in the NodeArrayBufferAllocator gets lost if the toggle
is deserialized from the snapshot, because V8 owns the underlying
memory of this toggle. This resets the connection at pre-execution.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
Fast APIs need to work with ArrayBuffers which we need
to rebuild connections to after deserializing them
from the snapshot. For now, postpone their creation
until pre-execution to simplify the process.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
Since V8 snapshot does not currently support instance member
initialization, initialize them in ordianry class constructors
for now so that these classes can be included in the snapshot.
This may be reverted once
https://bugs.chromium.org/p/v8/issues/detail?id=10704
is fixed and backported.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
This runs `lib/internal/bootstrap/node.js` before creating
the builtin snapshot and deserialize the loaders from the
snapshot in deserialization mode.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
addaleax pushed a commit that referenced this pull request Jul 28, 2020
Since V8 snapshot does not currently support instance member
initialization, initialize them in ordianry class constructors
for now so that these classes can be included in the snapshot.
This may be reverted once
https://bugs.chromium.org/p/v8/issues/detail?id=10704
is fixed and backported.

PR-URL: #32984
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
@ruyadorno ruyadorno mentioned this pull request Jul 28, 2020
@MylesBorins
Copy link
Contributor

MylesBorins commented Aug 17, 2020

Quick ping about backporting this. It looks like one of the commits made it onto 14.x, 3024927, which we may want to revert if we don't backport the entier thing

@addaleax
Copy link
Member

which we want to revert if we don't backport the entier thing

Shouldn’t be necessary as there is no conflict there, only missing files

@codebytere
Copy link
Member

@joyeecheung should this go to v12.x?

@ExE-Boss
Copy link
Contributor

ExE-Boss commented Jan 3, 2021

The delete bufferBinding.zeroFill statement in setupBuffer in lib/internal/bootstrap/node.js is now no longer necessary:

delete bufferBinding.zeroFill;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lib / src Issues and PRs related to general changes in the lib or src directory. process Issues and PRs related to the process subsystem. tsc-agenda Issues and PRs to discuss during the meetings of the TSC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.