vm.Script could be used to hide the source by shipping only bytecode. #11842
Comments
@hashseed it sounds like we could still provide the source so that things like debuggers can show it, though? I think showing the source can be useful, but avoiding the extra parsing and compilation costs would be good.
I was just pointing out the possibility of hiding the source, if a use case requires it. If the source is available, then there is not much difference from what the existing code cache already provides.
The checksum is really important here, I think, for transparency. Say in an open-source situation you publish the bytecode alongside the original code; a collision-free checksum provides a guarantee that the bytecode is true to the source. Is there a way to do this without a dummy checksum, using a strong hash as a legitimate checksum?
The header of the code cache contains a bunch of different fields that have to match: V8 version, source length, command-line flags, etc. There is also a checksum over the payload, but that is intended to detect accidental corruption, not to provide security. It uses a Fletcher checksum, so it is fairly easy to find a collision for.
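To make the mechanics concrete, here is a minimal sketch (assuming a recent Node.js where vm.Script exposes createCachedData() and cachedDataRejected): when the header fields no longer match, V8 silently rejects the cached data and recompiles rather than trusting it.

```js
'use strict';
const vm = require('vm');

// Produce cached data (serialized code plus header) for a small script.
const source = 'function add(a, b) { return a + b; } add(2, 3);';
const cachedData = new vm.Script(source).createCachedData(); // Node >= 10.6

// Same source, same Node/V8 build, same flags: the header matches.
const hit = new vm.Script(source, { cachedData });
console.log(hit.cachedDataRejected);  // expected: false

// Edit the source (its length changes): the header no longer matches, and V8
// quietly falls back to a fresh compile instead of trusting the cache.
const miss = new vm.Script(source + ' // edited', { cachedData });
console.log(miss.cachedDataRejected); // expected: true
```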
What about switching to a Blake2b hash? That’s very fast (faster than MD5) and as hard to find collisions in as SHA2 (i.e., impossible).
I'll just put this here and walk away...
Might be worth experimenting with. But you'd still need a safe way to store/transmit the checksum. As mentioned, the current checksum is to detect accidental data corruption only.
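For what it's worth, Node's crypto module already exposes BLAKE2b when the underlying OpenSSL build supports it (the 'blake2b512' digest), so a stronger checksum shipped out of band is easy to sketch. The packaging/verification flow below is purely illustrative:

```js
'use strict';
const crypto = require('crypto');
const vm = require('vm');

// Hypothetical packaging step: hash the cached data with BLAKE2b and ship the
// digest out of band (for example in signed package metadata).
const source = 'module.exports = 6 * 7;';
const cachedData = new vm.Script(source).createCachedData();

// 'blake2b512' is available when Node is built against OpenSSL >= 1.1.1.
const digest = crypto.createHash('blake2b512').update(cachedData).digest('hex');

// Hypothetical load step: recompute and compare before trusting the cache.
function verifyCachedData(buffer, expectedHex) {
  const actual = crypto.createHash('blake2b512').update(buffer).digest();
  return crypto.timingSafeEqual(actual, Buffer.from(expectedHex, 'hex'));
}

console.log(verifyCachedData(cachedData, digest)); // true
```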
What I'm pointing out here is precisely how someone could implement something similar to .pyc for Node.
CC @indutny because he tried something similar a long time ago. My understanding was that back then the speed gains were lost to fs checks. @yang it sounds like things have changed, so the bytecode cache now carries more than the old cached data?
Part of the problem was that .js files tend to be rather small.
JS code is usually the smallest representation. Bytecode takes less space than native code, but is still larger than JS source on average. Code caching for individual files was implemented in V8 about two years ago. Before bytecode, however, the source still needed to be available for parsing when code was recompiled for optimization. TurboFan can construct its graph from bytecode though, so the source is no longer necessary. I may be wrong, but I think @indutny's experiments were from well before the code cache and were about putting code into V8's startup snapshot. However, the startup serializer/deserializer had many limitations back then, which were fine for V8's default startup snapshot but did not work for arbitrary code.
Should this remain open?
@Trott No bandwidth currently to move it forward, but it's still relevant and comes up on social media somewhat often.
Another application of this would be the ability to send pre-compiled code between processes via IPC. Of course the "cached code object" would need to be serialized, but even so, it would likely be faster than passing the original source code to the target process and recompiling it there (unfortunately, there's no practical way that I can think of to test this, given the way vm.Script currently works).
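A rough sketch of what that could look like today, assuming a Node version with worker_threads and vm.Script's createCachedData(); note that the source still has to travel along with the cache, which is exactly the limitation mentioned above:

```js
'use strict';
// Sketch: compile once in the main thread and hand the cached data to a worker.
// Not a benchmark; it only shows the cache buffer crossing the thread boundary
// and being accepted on the other side.
const vm = require('vm');
const { Worker, isMainThread, workerData } = require('worker_threads');

const source = 'Math.sqrt(49)'; // stand-in for a real module's source

if (isMainThread) {
  const script = new vm.Script(source);
  const cachedData = script.createCachedData(); // Buffer holding V8's serialized code
  new Worker(__filename, { workerData: { source, cachedData } });
} else {
  // vm.Script still requires the source; the cached data only lets V8 skip
  // (re)compiling it when the buffer is accepted.
  const cached = Buffer.from(workerData.cachedData);
  const script = new vm.Script(workerData.source, { cachedData: cached });
  console.log('cache rejected?', script.cachedDataRejected);
  console.log('result:', script.runInThisContext()); // 7
}
```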
The discussion seems to have quieted down a bit. Closing. We can reopen this some time later.
I was playing with this idea this morning, and @bmeck asked me to put this into words.
vm.Script can be used to produce a cache into a buffer, and also to load from an existing cache produced earlier. With Ignition (the bytecode interpreter) launched in V8, we could "abuse" this to ship only bytecode and hide the source.
The thing with Ignition is: once a function has been compiled, we don't need the source code anymore. The optimizing compiler can construct its graph from bytecode alone. So a script can be fully shipped as bytecode.
There are a couple of things missing, though. For a proof of concept these issues can be hacked around before actually thinking about changing the V8 API to accommodate:
- The code cache only contains functions that have actually been compiled, so lazily compiled functions are missing from it. There is a flag --serialize_eager that you could turn on to force eager compilation if a code cache is being created.
- vm.Script always expects the script source to be provided. With Ignition we don't actually need it, but there is a checksum when deserializing to check that the source matches expectations. The checksum is simply the script length at this point, so an empty string of the same length would do.
- Function.prototype.toString() would just show a window into whatever dummy source was provided. Duh.
Once these issues are solved, you could ship bytecode and hide the source without worrying about crashing the optimizing compiler (a rough sketch follows below).
Oh, and this would only work on versions where V8 uses Ignition. For example at this shameless plug.
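A minimal proof-of-concept of the dummy-source trick, assuming the length-only checksum described above still applies, and keeping in mind the caveats about lazy compilation and toString():

```js
'use strict';
const vm = require('vm');

// "Build" step: compile the real source and keep only the serialized cache.
const realSource = 'const answer = 6 * 7; answer;';
const cachedData = new vm.Script(realSource).createCachedData();

// "Ship" step: the consumer receives the cache and the source length, but not
// the source itself. A padded dummy source of matching length stands in for it.
const dummySource = ' '.repeat(realSource.length);
const script = new vm.Script(dummySource, { cachedData });

console.log('cache accepted?', !script.cachedDataRejected);
// Prints 42 if V8 used the cached bytecode; undefined if it rejected the cache
// and compiled the (blank) dummy source instead.
console.log(script.runInThisContext());
```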