[Python][Flight] pyarrow.flight.FlightServerBase use threading instead of multiprocessing? #236
Comments
I've searched for similar questions and learned a bit from them, but more suggestions are welcome.
Yes, if you have heavy internal processing, your only real choice is to use multiprocessing. Depending on what you want to do, you could also set the [...] There's no option to set the thread pool because 1) it's provided by gRPC and 2) it wouldn't help anyway, since you're still limited by the GIL.
@lidavidm Thank you for your reply. My program still has some I/O. Considering the copy cost of inter-process communication, it may be better to start more FlightServer instances.
Yeah, it does make it hard to build a CPU-intensive service in Python. Maybe the SO_REUSEPORT option can be an Arrow Cookbook recipe. Is that a good solution here?
I probably didn't understand how to use SO_REUSEPORT. Sometimes it is possible to bind to the same port, but the clients I create always connect to the same server.
The option has to be passed through gRPC (the code above effectively does nothing). See [...] I'll see about adding a code snippet to the cookbook.
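For reference, here is a minimal sketch of what SO_REUSEPORT does at the socket level, using only the Python standard library. It is not a way to configure Flight: as noted above, the option has to reach gRPC's own listening socket. The helper names `bind_reuseport` and `open_shared_listeners` are made up for this illustration, and the sketch assumes a platform that exposes `socket.SO_REUSEPORT` (for example Linux).

```python
import socket


def bind_reuseport(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Return a listening TCP socket bound with SO_REUSEPORT set."""
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option must be set before bind(), on every socket sharing the port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def open_shared_listeners(count: int = 2) -> list[socket.socket]:
    """Open `count` listening sockets that all share one OS-chosen port."""
    first = bind_reuseport(0)  # port 0 lets the OS pick a free port
    port = first.getsockname()[1]
    return [first] + [bind_reuseport(port) for _ in range(count - 1)]
```

On Linux the kernel spreads new connections across the listeners, so a single long-lived connection stays with whichever listener accepted it.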
@lidavidm Can you please share or document the code snippet for this?
Ah, hmm, this isn't overridable from Python. Do you want to file an issue on the main repo?
@lidavidm Thank you for the quick response. Yes, I will file an issue.
Workaround? What is the issue?
I mean, is there any other way to use the pyarrow server with multiprocessing until the above issue gets resolved?
Sorry, I don't understand. You can just use the multiprocessing module. Is there a problem when using it?
I tried to use Arrow Flight, but when testing the performance (do_exchange), I found that CPU usage only reaches about 100%, which may be the GIL problem of Python multi-threading. I didn't find anything about setting the thread pool for FlightServerBase, so it seems that the only way to improve performance is to use multiprocessing.Pool internally to handle the internal working logic. Do you have any other suggestions? Thanks!