Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert some views to native async. #2720

Merged
merged 1 commit into from
Jul 20, 2023
Merged

Conversation

ttys0dev
Copy link
Contributor

@ttys0dev ttys0dev commented May 3, 2023

We're a bit limited in which handlers can actually use native asyncio until #2703 is merged since django 3.2 does not support async ORM operations. This is an alternative approach to #2701.

@ttys0dev ttys0dev force-pushed the asgi branch 6 times, most recently from 3fe0584 to 3bd4184 Compare May 8, 2023 22:26
@ttys0dev ttys0dev force-pushed the asgi branch 4 times, most recently from 2ca979a to a816fc5 Compare May 10, 2023 09:07
@ttys0dev
Copy link
Contributor Author

This is probably ready for an initial review, I migrated most of the celery codepaths in the process_recap_upload view to asyncio. So far I haven't re-added any retry logic as it's a bit unclear what exactly commonly fails and actually needs retries(I also suspect the celery retries are not particularly reliable).

@mlissner
Copy link
Member

Thanks. I'll put it on my backlog.

@mlissner
Copy link
Member

Seems like switching to gevent is a smaller move. If we do that in #2701, can we close this?

@ttys0dev
Copy link
Contributor Author

Seems like switching to gevent is a smaller move.

I can spin a minimal version of the ASGI migration without migrating the views(migrating the views can be done later as needed).

If we do that in #2701, can we close this?

l'm thinking this should have better performance/reliability as it avoids monkey patching and instead natively implements asynchronous IO handling(and uses a thread pool as a fallback for non-native asyncio functions).

@mlissner
Copy link
Member

l'm thinking this should have better performance/reliability

Great, so #2701 can be closed? :)

@ttys0dev
Copy link
Contributor Author

Great, so #2701 can be closed? :)

Yeah, closed that out, there's a number of bugs(such as compatibility issues related to monkeypatching and with multiprocessing) I came across with gevent that don't seem to have good fixes.

The minimal version of this PR which is basically just the WSGI to ASGI server migration part without optimizing the views to be native asyncio is what's in #2783. Hopefully that's a lot easier to review than this which combined the WSGI to ASGU server migration in addition to optimizing some of the views.

Once #2783 is merged the views can then be incrementally migrated to support native asyncio as needed. The nice thing is that ASGI should be fully backwards compatible with the existing sync views so that we can do this on an as needed basis over time.

@ttys0dev ttys0dev force-pushed the asgi branch 2 times, most recently from 88bf61d to 1a06a17 Compare June 4, 2023 07:54
@ttys0dev ttys0dev changed the title Migrate from WSGI to ASGI Convert some views to native async. Jun 4, 2023
@ttys0dev ttys0dev force-pushed the asgi branch 2 times, most recently from 7e6e229 to 375857a Compare June 4, 2023 07:58
@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jun 4, 2023

rebased

@mlissner
Copy link
Member

mlissner commented Jun 7, 2023

Thanks for keeping this ready to go. I took a quick look through it, but I don't have time to think about it carefully (I'm on vacation for two weeks starting Saturday). I did some reading about async, but I think it'd be good to have @albertisfu look through it as well.

A couple questions:

  • How did you pick the views/code you did? It seems sort of random.
  • What's up with all the type ignoring in the urls.py file?
  • Why not make all the microservices async instead of giving them the sync_to_async treatment (they seem like a priority)?

@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jun 7, 2023

  • How did you pick the views/code you did? It seems sort of random.

It's basically pdf file upload codepaths plus other ones that didn't seem to have interdependencies that would require further conversions.

  • What's up with all the type ignoring in the urls.py file?

It seems that django-stubs doesn't properly handle typing for async views.

  • Why not make all the microservices async instead of giving them the sync_to_async treatment (they seem like a priority)?

What microservices are you referring to exactly?

@albertisfu albertisfu self-requested a review June 7, 2023 17:52
Copy link
Contributor

@albertisfu albertisfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed this and learned how to utilize asynchronous support to enhance performance on I/O-bound tasks.

Overall this looks good to me. Troller is working well after changes done here.

I only wrote a question above related to celery tasks that were converted to async functions.

And about the microservices that Mike is referring to are these ones:
https://github.com/freelawproject/courtlistener/blob/main/cl/lib/microservice_utils.py

cl/recap/tasks.py Show resolved Hide resolved
@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jul 3, 2023

rebased

@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jul 3, 2023

  • Processing and parsing HTML. This is usually fast for most types of objects, but notably can be slow for longer docket and docket history reports.

Any idea how slow we are talking here? I'd say unless we're sure this is going to cause issues then we can defer it and optimize down the line as there are multiple potential strategies we can use to deal with it.

@mlissner
Copy link
Member

mlissner commented Jul 7, 2023

Just catching up on the state of play here. I want to re-read the PR because parts of it make me a bit nervous, but I can reply to this now:

Any idea how slow we are talking here? I'd say unless we're sure this is going to cause issues then we can defer it and optimize down the line as there are multiple potential strategies we can use to deal with it.

Probably maxes out at around 10s, I'd guess? Usually would take about half a second?

Separately, for your retries could you do it with our retry decorator? It should be tidier than the loop you've got: cl.lib.decorators.retry?

@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jul 7, 2023

Probably maxes out at around 10s, I'd guess? Usually would take about half a second?

Sounds like it should probably be ok without a background task.

Separately, for your retries could you do it with our retry decorator? It should be tidier than the loop you've got: cl.lib.decorators.retry?

It's not really asyncio compatible, this would block the event loop.

@ttys0dev
Copy link
Contributor Author

ttys0dev commented Jul 8, 2023

Separately, for your retries could you do it with our retry decorator? It should be tidier than the loop you've got: cl.lib.decorators.retry?

By the way I did look at potentially adding an async variant of this decorator, however I think it's probably best to defer refactoring this at the moment. The retry mechanism with celery worked AFAIU by retrying the entire function, which is similar to cl.lib.decorators.retry which also retries the entire function.

I don't think a retry the entire function decorator is good alternative to the retry loop approach I used as we do want to do a more granular retry operation on a specific codeblock within a function(ie for performance to avoid unnecessarily repeating queries that come before the codeblock we want to retry).

Using a simple retry loop like I did seems to be the standard approach for implementing granular retries with asyncio within a codeblock from what I can tell.

@mlissner
Copy link
Member

I'm sorry, @ttys0dev can you fix the conflicts on this one and I'll take one more look at it?

@ttys0dev
Copy link
Contributor Author

rebased

Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I re-did my whole review to see if I caught anything else, and I found a few more things.

The big picture is that I'm still pretty worried about moving these celery tasks to async, but I'm prepared to be convinced that it'll be OK, and I appreciate you pushing me/this forward despite my and others misgivings.

It's particularly weird to be moving celery processing tasks to the web tier, for example, instead of our celery tier, but I guess that's OK. There are things I'll have to learn though. Like right now if I want to re-process 1,000 PDFs, I just do:

for pdf in pdfs:
    process_recap_pdf.delay(pdf)

And that creates 1,000 tasks in Celery to get it all done. I guess there's a good way to do this with async too.

cl/lib/thumbnails.py Outdated Show resolved Hide resolved
interval_step=10 * 60,
)
def process_recap_pdf(self, pk):
async def process_recap_pdf(pk):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm going to try this one more time, just so you can tell me that, "Yes, Mike, this is fine."

I have two concerns here:

  1. process_recap_pdf can do OCR and OCR can take a really long time. The longest is something like a day, but generally it's done within a minute or two. Sometimes it can take a lot longer though.

    Given that that's the case, can we count on a client being around to receive a response? And if the client wanders away, could that screw up the processing?

    In other words, if this method takes an hour and the user throws their laptop into a lake after a few seconds, is that OK?

  2. We have two types of users for the RECAP upload API. The extension (via browsers), and API users. The API users are used to getting nearly instant responses, and this will make their responses much slower. Won't that screw up the speed that clients can do uploads via servers, if they don't currently have a multi-threaded way of uploading? That's a big change in how this API works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Given that that's the case, can we count on a client being around to receive a response? And if the client wanders away, could that screw up the processing?

I think it should be ok.

In other words, if this method takes an hour and the user throws their laptop into a lake after a few seconds, is that OK?

Probably no worse than the current situation where processing can fail somewhat randomly even when there is a success response to the initial upload, moving OCR out of the request path and into background job can always be done if it's an issue.

The API users are used to getting nearly instant responses, and this will make their responses much slower.

Well right now the upload responses are mostly useless AFAIU, they don't accurately indicate if the request was successful. Having accurate response status codes is going to be something API users are likely to want.

Won't that screw up the speed that clients can do uploads via servers, if they don't currently have a multi-threaded way of uploading?

It depends on the exact implementation of the upload client, keep in mind not having useful responses is problematic in general for API users.

cl/recap/tasks.py Outdated Show resolved Hide resolved
cl/recap/tasks.py Outdated Show resolved Hide resolved
cl/recap/tasks.py Show resolved Hide resolved
@ttys0dev
Copy link
Contributor Author

It's particularly weird to be moving celery processing tasks to the web tier, for example, instead of our celery tier, but I guess that's OK.

Yeah, so celery seems to be more for background jobs that don't need say a success status response to be returned to the requester.

And that creates 1,000 tasks in Celery to get it all done. I guess there's a good way to do this with async too.

You could just create a celery task that wraps the asyncio function using async_to_sync if you aren't running that through the normal web stack.

@mlissner
Copy link
Member

OK, let's see how this goes. Merging. Thank you for sticking with it for so long. I'm worried about this will impact our extension users and our API users, but hopefully it'll be OK and a step in the right direction.

@mlissner mlissner merged commit 8bfa694 into freelawproject:main Jul 20, 2023
@ttys0dev ttys0dev deleted the asgi branch July 20, 2023 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants