[Question] - Eventloop busy with CPU bound operations #3454
GitMate.io thinks the contributor most likely able to help you is @asvetlov. Possibly related issues are #2992 (question), #172 (Question about ServerHttpProtocol.closing()), #3066 (Question: What's timeout?), #1325 ([QUESTION] WebSocketResponse timeout parameter), and #2482 ([Question] primarydomain for aiohttp project).
I guess a standard thread pool executor should be fine for pandas. Check it out. You don't need two loops even for a process executor; just create an instance and pass it into `run_in_executor`.
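A minimal sketch of that suggestion (function names here are illustrative, not from the original code): create one executor instance and pass it to `run_in_executor` instead of spinning up a second loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def heavy_transform(n):
    # Stand-in for a CPU-bound pandas step.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # The same pool instance can be reused for every blocking call;
        # a ProcessPoolExecutor could be passed here in exactly the same way.
        return await loop.run_in_executor(pool, heavy_transform, 1_000)

result = asyncio.run(main())
print(result)  # 332833500
```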
Thanks @asvetlov
The second issue is that `await` inside `run_in_executor` is not allowed. I can create a sync function as the entry point for `ThreadPoolExecutor`, but I'm not sure how to call an async function from a sync function.
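For the "call async from sync" part, one pattern (my assumption, not the only option) is to let the worker thread run its own short-lived event loop via `asyncio.run()`, since no loop is running in an executor thread:

```python
import asyncio

async def fetch_data():
    # Hypothetical coroutine standing in for an async data fetch.
    await asyncio.sleep(0)
    return {"price": 42}

def sync_entry_point():
    # Runs in a ThreadPoolExecutor worker thread, which has no running
    # loop, so asyncio.run() can create and close one safely.
    return asyncio.run(fetch_data())

async def main():
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, sync_entry_point)

out = asyncio.run(main())
print(out)  # {'price': 42}
```

That said, the cleaner fix is usually to keep the awaits on the main loop and push only the synchronous CPU-bound part into the executor.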
This would require a lot of refactoring of the code. I wonder if there is any better way to do it at a high level. I tried something like this:
This obviously is not working, but if there is a way to do it at a high level, it would make my life much easier.
You have to run any blocking code in an executor.
Agreed, the issue is not aiohttp-specific. Thanks @asvetlov for the help!
Welcome!
The simplest solution is to call `run_in_executor`:

```python
async def transformStock():
    stocks = await getStockData()
    securities = await getSecurityData()
    stockDF = await loop.run_in_executor(None, create_dataframe_etc)
    return stockDF
```

As it happens, I was implementing almost this exact thing this morning, see here. (Note: while this is tested and working, I haven't probed its performance at all.)
Yup, looks like I will have to refactor the code to add `run_in_executor` for CPU-intensive code blocks. Thanks @samuelcolvin for the example. The problem for me is that it's spread all over, and we have hundreds of these transformation functions which perform both fetching the data and running CPU-intensive work.
Scenario: I have an aiohttp service with two endpoints.

1. `/health` — called on a regular 3-second interval (with a timeout of 3 seconds) by an external container orchestration service to check that the service is responding. If there are 3 consecutive health-check failures, it assumes the service is in a bad state and restarts it.
2. `/transform` — an API that calls many external endpoints to fetch data and then runs some pandas operations.

For `/transform`, the calls to external endpoints are fine, but the actual pandas work can be CPU bound and puts the event loop in a busy state. When that happens, the health endpoint starts to fail and the orchestration service restarts the container.
Sample code:
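A minimal sketch of the setup described above; handler and helper names are illustrative, not taken from the original snippet:

```python
import asyncio
from aiohttp import web

async def get_stock_data():
    # Stands in for the calls to external endpoints (async I/O is fine).
    await asyncio.sleep(0)
    return [1, 2, 3]

def run_pandas_pipeline(rows):
    # Stands in for the CPU-bound pandas work.
    return {"total": sum(rows)}

async def health(request):
    # Must answer within 3 seconds even while /transform is busy.
    return web.json_response({"status": "ok"})

async def transform(request):
    rows = await get_stock_data()        # async I/O: does not block the loop
    result = run_pandas_pipeline(rows)   # CPU-bound sync call: blocks the loop
    return web.json_response(result)

app = web.Application()
app.router.add_get("/health", health)
app.router.add_get("/transform", transform)
```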
I would like the main event loop that aiohttp is using not to be hijacked by the long-running CPU operations. From the asyncio docs, it seems the strategy for CPU-bound operations is to run them inside a ProcessPoolExecutor, as shown in the link below, but the problem is that all these functions have async syntax.
https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools
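Following the docs pattern above, one way to keep the async signatures (a sketch; `fetch` and `cpu_part` are made-up names) is to split each function: the awaits stay on the loop, and only the synchronous CPU-bound tail goes into the pool.

```python
import asyncio
from functools import partial

def cpu_part(data, factor):
    # Plain sync function -- only this part runs in the executor.
    return [x * factor for x in data]

async def fetch():
    # Hypothetical stand-in for the async data-fetching half.
    await asyncio.sleep(0)
    return [1, 2, 3]

async def transform():
    data = await fetch()                  # async part stays as-is
    loop = asyncio.get_running_loop()
    # partial() lets keyword arguments pass through run_in_executor;
    # None means the loop's default ThreadPoolExecutor, but a
    # ProcessPoolExecutor instance could be passed instead.
    return await loop.run_in_executor(None, partial(cpu_part, data, factor=2))

print(asyncio.run(transform()))  # [2, 4, 6]
```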
Is it possible to use two event loops: one for the main aiohttp app and another for running these transformation functions? Or is there any other way to achieve the same? A working example would be great!
Thanks!