vm.Script could be used to hide the source by shipping only bytecode. #11842
Comments
@hashseed it sounds like we could still provide the source so that things like debuggers can show it, though? I think showing the source can be useful, but avoiding the extra parsing and compilation costs would be good.
I was just pointing out the possibility of hiding the source, if a use case requires it. If the source is available, then there is not much difference from what the existing code cache already provides.
The checksum is really important here, I think, for transparency. Say in an open-source situation you publish the bytecode alongside the original code; a collision-free checksum provides a guarantee that the bytecode is true to the source. Is there a way to do this without a dummy checksum, using a strong hash as a legitimate checksum?
The header of the code cache contains a bunch of different fields that have to match: V8 version, source length, command-line flags, etc. There is also a checksum over the payload, but that is intended to detect accidental corruption, not to provide security. It uses a Fletcher checksum, so it is fairly easy to find a collision for.
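To make the mechanics concrete, here is a minimal sketch (assuming a recent Node.js where vm.Script exposes createCachedData() and cachedDataRejected): when the header fields no longer match, V8 silently rejects the cached data and recompiles rather than trusting it.

```js
'use strict';
const vm = require('vm');

// Produce cached data (serialized code plus header) for a small script.
const source = 'function add(a, b) { return a + b; } add(2, 3);';
const cachedData = new vm.Script(source).createCachedData(); // Node >= 10.6

// Same source, same Node/V8 build, same flags: the header matches.
const hit = new vm.Script(source, { cachedData });
console.log(hit.cachedDataRejected);  // expected: false

// Edit the source (its length changes): the header no longer matches, and V8
// quietly falls back to a fresh compile instead of trusting the cache.
const miss = new vm.Script(source + ' // edited', { cachedData });
console.log(miss.cachedDataRejected); // expected: true
```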
What about switching to a Blake2b hash? That’s very fast (faster than MD5) and as hard to find collisions in as SHA2 (i.e., impossible).
I'll just put this here and walk away...
Might be worth experimenting with. But you'd still need a safe way to store/transmit the checksum. As mentioned, the current checksum is to detect accidental data corruption only.
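For what it's worth, Node's crypto module already exposes BLAKE2b when the underlying OpenSSL build supports it (the 'blake2b512' digest), so a stronger checksum shipped out of band is easy to sketch. The packaging/verification flow below is purely illustrative:

```js
'use strict';
const crypto = require('crypto');
const vm = require('vm');

// Hypothetical packaging step: hash the cached data with BLAKE2b and ship the
// digest out of band (for example in signed package metadata).
const source = 'module.exports = 6 * 7;';
const cachedData = new vm.Script(source).createCachedData();

// 'blake2b512' is available when Node is built against OpenSSL >= 1.1.1.
const digest = crypto.createHash('blake2b512').update(cachedData).digest('hex');

// Hypothetical load step: recompute and compare before trusting the cache.
function verifyCachedData(buffer, expectedHex) {
  const actual = crypto.createHash('blake2b512').update(buffer).digest();
  return crypto.timingSafeEqual(actual, Buffer.from(expectedHex, 'hex'));
}

console.log(verifyCachedData(cachedData, digest)); // true
```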
What I'm pointing out here is precisely how someone could implement something similar to .pyc for Node.
CC @indutny because he tried something similar a long time ago. My understanding was that back then the speed gains were lost to fs checks. @yang it sounds like things have changed, so the bytecode cache now carries more than the old cached data?
Part of the problem was that .js files tend to be rather small.
JS code is usually the smallest representation. Bytecode takes less space than native code, but is still larger than JS source on average. Code caching for individual files was implemented in V8 about two years ago. Before bytecode, however, the source still needed to be available for parsing when code was recompiled for optimization. TurboFan can construct its graph from bytecode though, so the source is no longer necessary. I may be wrong, but I think @indutny's experiments were from well before the code cache and were about putting code into V8's startup snapshot. However, the startup serializer/deserializer had many limitations back then, which were fine for V8's default startup snapshot but did not work for arbitrary code.
Should this remain open?
@Trott No bandwidth currently to move it forward, but it's still relevant and comes up on social media somewhat often.
Another application of this would be the ability to send pre-compiled code between processes via IPC. Of course the "cached code object" would need to be serialized, but even so, it would likely be faster than passing the original source code to the target process and recompiling it there (unfortunately, there's no practical way that I can think of to test this, given the way vm.Script currently works).
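A rough sketch of what that could look like today, assuming a Node version with worker_threads and vm.Script's createCachedData(); note that the source still has to travel along with the cache, which is exactly the limitation mentioned above:

```js
'use strict';
// Sketch: compile once in the main thread and hand the cached data to a worker.
// Not a benchmark; it only shows the cache buffer crossing the thread boundary
// and being accepted on the other side.
const vm = require('vm');
const { Worker, isMainThread, workerData } = require('worker_threads');

const source = 'Math.sqrt(49)'; // stand-in for a real module's source

if (isMainThread) {
  const script = new vm.Script(source);
  const cachedData = script.createCachedData(); // Buffer holding V8's serialized code
  new Worker(__filename, { workerData: { source, cachedData } });
} else {
  // vm.Script still requires the source; the cached data only lets V8 skip
  // (re)compiling it when the buffer is accepted.
  const cached = Buffer.from(workerData.cachedData);
  const script = new vm.Script(workerData.source, { cachedData: cached });
  console.log('cache rejected?', script.cachedDataRejected);
  console.log('result:', script.runInThisContext()); // 7
}
```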
The discussion seems to have quieted down a bit. Closing. We can reopen this some time later.
I was playing with this idea this morning, and @bmeck asked me to put this into words.
vm.Script can be used to produce a cache into a buffer, and also to load from an existing cache produced earlier. With Ignition (the bytecode interpreter) launched in V8, we could "abuse" this to ship only bytecode and hide the source.
The thing with Ignition is: once a function has been compiled, we don't need the source code anymore. The optimizing compiler can construct its graph from bytecode alone. So a script can be fully shipped as bytecode.
There are a couple of things missing, though. For a proof of concept these issues can be hacked around before actually thinking about changing the V8 API to accommodate:
- The code cache only contains functions that have actually been compiled, so lazily compiled functions are missing from it. There is a flag --serialize_eager that you could turn on to force eager compilation if a code cache is being created.
- vm.Script always expects the script source to be provided. With Ignition we don't actually need it, but there is a checksum when deserializing to check that the source matches expectations. The checksum is simply the script length at this point, so an empty string of the same length would do.
- Function.prototype.toString() would just show a window into whatever dummy source was provided. Duh.
Once these issues are solved, you could ship bytecode and hide the source without worrying about crashing the optimizing compiler (a rough sketch follows below).
Oh, and this would only work on versions where V8 uses Ignition. For example at this shameless plug.
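A minimal proof-of-concept of the dummy-source trick, assuming the length-only checksum described above still applies, and keeping in mind the caveats about lazy compilation and toString():

```js
'use strict';
const vm = require('vm');

// "Build" step: compile the real source and keep only the serialized cache.
const realSource = 'const answer = 6 * 7; answer;';
const cachedData = new vm.Script(realSource).createCachedData();

// "Ship" step: the consumer receives the cache and the source length, but not
// the source itself. A padded dummy source of matching length stands in for it.
const dummySource = ' '.repeat(realSource.length);
const script = new vm.Script(dummySource, { cachedData });

console.log('cache accepted?', !script.cachedDataRejected);
// Prints 42 if V8 used the cached bytecode; undefined if it rejected the cache
// and compiled the (blank) dummy source instead.
console.log(script.runInThisContext());
```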