NodeJS multi-thread support #64
As I said yesterday: "There is a remaining issue with Workers, and it's being worked on." In other words, the problem is recognized and even understood, so you don't have to spend time on a PoC; just be a little bit patient :-) Meanwhile...
Thanks! Would you mind sharing a bit about your understanding of the problem?
A dirty workaround that seems to work consistently is to duplicate the .node file so that each copy is required by only one thread.
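For illustration, a minimal sketch of that workaround, assuming a CommonJS setup and that the compiled addon sits next to the script as `blst.node` (both the path and the filename are assumptions, not the repository's actual layout):

```ts
import { copyFileSync } from "fs";
import { threadId } from "worker_threads";
import * as path from "path";

// Copy the compiled addon to a per-thread filename so each worker thread
// require()s its own module instance and never shares SWIG-initialized
// state with another thread.
const original = path.resolve(__dirname, "blst.node"); // assumed addon location
const perThread = path.resolve(__dirname, `blst.${threadId}.node`);
copyFileSync(original, perThread);

// eslint-disable-next-line @typescript-eslint/no-var-requires
const blst = require(perThread); // each thread loads a distinct copy
```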
The problem is two-fold: a) initialization of SWIG private structures is not MT-safe; b) it's not sufficient to make the init context-aware, it has to be isolate-aware as well. The first problem manifests as crashes at...
@dapplion, could you test swig/swig#1973?
Haven't had a chance to test yet. Will update then; let me know if you have any updates too. Thanks!
Early tests seem to work fine! Thank you so much for pushing this 🎉 Will work on POC inside Lodestar to confirm it's all good |
@dot-asm Now that we have unlocked the gates of parallelization, do you think it would be possible to send a pairing context between Javascript workers? The current SWIG-produced...
This is what I customarily call a "wrong-attitude" question :-) When it comes to software, the question shouldn't be whether something is possible, but how to achieve the goal. So let's start with the goal. Or rather, take the next step from the stated goal, to send a pairing context between workers. What is possible to send? I mean, I don't actually know. Is it possible to send...

```diff
--- a/bindings/node.js/blst.hpp.ts
+++ b/bindings/node.js/blst.hpp.ts
@@ -190,7 +190,8 @@ export interface Pairing {
     aug?: bytes
   ): BLST_ERROR;
   commit(): void;
-  merge(ctx: Pairing): BLST_ERROR;
+  asArrayBuffer(): ArrayBuffer;
+  merge(ctx: Pairing | ArrayBuffer): BLST_ERROR;
   finalverify(sig?: PT): boolean;
 }
```
One can probably even say that...
Primitive types, objects and arrays of primitive types, and TypedArray instances (Buffer, ArrayBuffer, Uint8Array, etc.). A stricter definition: arguments consumable by this function, https://nodejs.org/api/v8.html#v8_v8_serialize_value
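As a side illustration (not from the thread), the same rule can be checked directly with Node's `v8` module: anything that `v8.serialize()` accepts is what can cross a worker boundary.

```ts
import * as v8 from "v8";

// Anything v8.serialize() accepts can be posted to a worker.
const payload = { ok: true, sig: new Uint8Array([0xde, 0xad, 0xbe, 0xef]) };
const wire = v8.serialize(payload);  // Buffer holding the serialized value
const clone = v8.deserialize(wire);  // structurally equal copy

console.log(clone.sig instanceof Uint8Array); // true
```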
Yes!
This API sketch would work well, yes. Some questions to study the viability:
To preface my question: I'm wondering what the optimal strategy is to validate a large enough amount of signatures in Javascript / Typescript. A naive approach is to just slice the signature set, divide it among workers, and then combine the resulting booleans. However, seeing that Go, Rust and Nim use the same strategy, it would be worth studying its viability: thus my request to get the Pairing context as raw bytes.

It's important to note that in Javascript the communication between threads has significant lag, between 0.05 and 0.2 ms one way. If the message is big enough, serialization is slow and becomes a bottleneck too. Maybe a better strategy would be to do the multi-threading in C land, inside the binding, where inter-thread communication is orders of magnitude faster and sharing memory is okay. So, do you know if it's possible to generate native bindings for Node with SWIG that are capable of doing the multi-threading themselves?
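For context, a rough sketch of the slice-and-merge strategy under discussion, written against the `asArrayBuffer()` / `merge(ArrayBuffer)` interface proposed in the diff above; the worker file, the `rootPairing` construction and the message protocol are hypothetical placeholders, not the actual blst-ts API:

```ts
import { Worker } from "worker_threads";

// Hypothetical: the root Pairing would come from the bindings; only the
// methods from the proposed interface are assumed here.
declare const rootPairing: {
  merge(ctx: ArrayBuffer): number; // BLST_ERROR
  finalverify(): boolean;
};

// Each worker aggregates its slice of signature sets into its own Pairing,
// commits it, and posts pairing.asArrayBuffer() back (transferred, not
// copied). The main thread merges the partial contexts and verifies once.
async function verifyInWorkers(sets: unknown[], nWorkers: number): Promise<boolean> {
  const chunk = Math.ceil(sets.length / nWorkers);
  const partials = await Promise.all(
    Array.from({ length: nWorkers }, (_, i) => {
      const slice = sets.slice(i * chunk, (i + 1) * chunk);
      return new Promise<ArrayBuffer>((resolve, reject) => {
        const w = new Worker("./pairing-worker.js", { workerData: slice }); // hypothetical worker
        w.once("message", resolve); // worker replies with pairing.asArrayBuffer()
        w.once("error", reject);
      });
    })
  );
  for (const ctx of partials) rootPairing.merge(ctx);
  return rootPairing.finalverify();
}
```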
Then add this at the end of the SWIGJAVASCRIPT_V8 section in blst.swg.
If you confirm that it's working as expected, I'll push to the repository...
I don't quite follow. If you want to be in the business of counting CPU cycles, how come you ended up on the Javascript side? :-) :-) :-) On a serious note, I can't really afford delving into this to the depth you might expect. As you can see [above], the overhead cost is allocating some memory from the V8 heap plus a memcpy, and checking the type and casting on the receiving side. And that's about as far as I'd go.
It's always ~3K. Formally speaking, one can truncate...
🙃🙃 I'll just say that I'm learning Rust... On a serious note, our node spends between 50-80% of total CPU time doing BLS sig verification, so if we have to spend time optimizing something, this is a great target. Throwing all that work onto a thread will be a massive improvement too.
That's good enough, thanks!
Thanks, I don't think it would be a problem at all then; no need to truncate. I'll report back when I can do a test with your SWIG snippet. @dot-asm Is your branch swig/swig#1973 safe to use in production?
I personally would say so, yes.
What's the standing with...? On a related note, my impression is that passing an ArrayBuffer between threads doesn't actually involve any data duplication or serialization. Or rather, I see no reason why it would have to. ArrayBuffer data storage appears to be allocated off the V8 pool, and it can be detached and reattached, so the transfer of an ArrayBuffer can be achieved by passing a pointer to the off-pool storage...
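A small sketch of that zero-copy path with plain worker_threads (single-file; the buffer size is arbitrary): listing the ArrayBuffer in the transfer list moves ownership of its backing store to the receiver instead of copying the bytes, detaching it on the sending side.

```ts
import { Worker, isMainThread, parentPort } from "worker_threads";

if (isMainThread) {
  const worker = new Worker(__filename);
  const buf = new ArrayBuffer(3072); // e.g. a ~3K serialized pairing context

  // The second argument is the transfer list: ownership of the backing
  // store moves to the worker, and `buf` is detached here (byteLength 0).
  worker.postMessage(buf, [buf]);
  console.log("sender after transfer:", buf.byteLength); // 0
} else {
  parentPort!.once("message", (received: ArrayBuffer) => {
    console.log("worker received:", received.byteLength); // 3072
    process.exit(0);
  });
}
```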
Yeah, that sounds right. I haven't had time to implement the transfer-Pairing strategy, so I can't comment on that front yet. I did some benchmarks, and even though transferring an ArrayBuffer to a thread is done with pointers, there's still a time cost in message passing, which is around 1 ms per one-way transfer on the few machines I've tested. Because of that, the current strategy is to bundle as many signature sets as possible, to reduce the communication between workers and the main thread without under-utilizing the workers.
"1ms" as in "millisecond"? If so, then it doesn't sound right. LAN is faster than that. If real, then node should be notified... |
😆 Yeah, it's ridiculous, but that's the state of the art. Can you run this locally to confirm I'm not crazy? https://github.com/ChainSafe/blst-ts/blob/master/benchmark/workerMessagePassing.js
I can run it, but I can't confirm you're not crazy:-):-):-) Not qualified, you know...:-):-):-)
However, I reckon that these are not actually representative numbers. First, it's appropriate to disregard the first results, because you have to give the JIT compiler a chance to kick in. Secondly, since you use setTimeout with an increasing timeout value, you allow the processor to ramp down its operating frequency, which manifests itself as operations appearing slower. This is apparently why main-to-worker averages slower than worker-to-main: setTimeout falls asleep prior to the main-to-worker leg, the frequency drops, but then, by the worker-to-main time, it's increasing again... Either way, if I fix the frequency and discount the results from the first iteration of the outer loop, I get the following:
In other words, messages are passed in ~60 μs ("μs" as in microseconds) at the nominal processor frequency. It's still arguably higher than expected, but I wouldn't be surprised if it's an adequate result for the snippet in question. I mean, I would imagine that the process.hrtime() output gets serialized to JSON, which would take its toll...
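For reference, a minimal latency probe along those lines (not the benchmark script from the thread): it timestamps with process.hrtime.bigint() and posts only a single bigint, so the measurement itself involves as little serialization as possible.

```ts
import { Worker, isMainThread, parentPort } from "worker_threads";

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on("message", (sentAt: bigint) => {
    // One-way worker-to-main latency, in microseconds.
    const us = Number(process.hrtime.bigint() - sentAt) / 1000;
    console.log(`worker -> main: ${us.toFixed(1)} µs`);
    void worker.terminate();
  });
  worker.postMessage("go");
} else {
  parentPort!.once("message", () => {
    // A bare bigint timestamp; structured clone handles BigInt natively.
    parentPort!.postMessage(process.hrtime.bigint());
  });
}
```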
Actually, you are super qualified to disregard my initial results! Thanks for taking the time to analyze it; things definitely make sense now. Happy to know the latency is low in practice.
Just in case, for reference: I used node v14.16. No, that doesn't mean I know whether there were some relevant improvements, or whether prior versions were "bad." It was just a convenient choice.
The current NodeJS bindings are not able to run reliably in a multi-threaded setup with worker_threads. Most of the time the parent process crashes with a segmentation fault. I'm working on creating a minimal reproducible POC that always crashes, but it's proving very difficult.
First of all, could you say whether the current SWIG NodeJS bindings are safe to multi-thread? There are some directions on how to load the same bindings in multiple threads in the NodeJS docs; I wonder if the current SWIG setup follows any of them: https://nodejs.org/api/addons.html#addons_worker_support
In our particular case I would be happy with a threaded setup where threads don't share any resources and only communicate with the main thread through messages.
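To make that setup concrete, a hedged sketch of the share-nothing arrangement (the worker entry point and message shape are assumptions): each worker loads the bindings on its own and only ever posts plain results back to the main thread.

```ts
import { Worker } from "worker_threads";
import * as os from "os";

// Share-nothing pool: every worker require()s the bindings itself, keeps all
// native state private, and talks to the main thread only through messages
// carrying plain data (here, a boolean verification result).
const workers = Array.from(
  { length: os.cpus().length },
  () => new Worker("./verify-worker.js") // hypothetical worker entry point
);

export function verifyOnWorker(job: Uint8Array[], i: number): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const w = workers[i % workers.length];
    w.once("message", (ok: boolean) => resolve(ok));
    w.once("error", reject);
    w.postMessage(job); // Uint8Arrays are cloned; no native state is shared
  });
}
```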