-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add aiohttp client sharing #3024
Conversation
14d064c
to
f956f4e
Compare
f956f4e
to
17802db
Compare
152040a
to
2520356
Compare
3567b71
to
d198c1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the detailed instructions and tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! No blocking comments, I was able to perform all of the testing steps mentioned except for the shutdown step. For whatever reason, changes that I make to the API code locally aren't causing the API service to reload 🤔 I have DJANGO_DEBUG_ENABLED=True
and ENVIRONMENT=development
, not sure why that might be happening, but the rest looks great!
api/conf/lifecycle_handler.py
Outdated
class ASGILifecycleHandler: | ||
""" | ||
Extend default ASGIHandler to implement lifetime hooks. | ||
Handle ASGI lifecycle messages. | ||
|
||
Django's ASGIHandler does not handle these messages, | ||
so we have to implement it ourselves. Only shutdown handlers | ||
are currently supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, where did you piece together that this class was necessary, and what to include within it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read in the Django bug tracker that ASGI lifecycle wasn't supported. Somehow I didn't come across this project, though, which I wonder if we should use instead of our custom implementation: https://github.com/illagrenan/django-asgi-lifespan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: yeah, I'm going to go ahead and change this PR to use that package so that we don't have to do a refactor later to remove our custom implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I remember why I originally did not use that package: it only supports uvicorn directly, but not under gunicorn workers. That's not an issue for us anymore, so we can use it.
It doesn't have a lot of activity, but it has comprehensive unit tests and by using Django signals, gets around the need to implement custom weak ref handling (Django signal handler refs are weak by default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's done in the latest commit: e4dc958
It actually was a pretty nice way to clean things up. Rather than registering each session's close and needing to manage it, we can benefit from the _SESSIONS
gathering point and register a single signal handler that closes all the sessions.
I had to refactor the test fixture as well, and almost thought it was finally going to break completely, until I realised I'd come across a bug in the upstream library, for which I've submitted a patch: illagrenan/django-asgi-lifespan#80
It doesn't currently affect us, and shouldn't in the foreseeable future, because we don't need the startup lifecycle event (and worst-case scenario, we can still call the startup event without awaiting it and just tolerate a log at the end of the tests complaining that it was never awaited).
In any case, I think the end result is a lot cleaner and easier to follow, and easier to extend into the future.
app = "conf.asgi:static_files_application" if is_local else "conf.asgi:application" | ||
|
||
uvicorn.run( | ||
app, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simplification!
session_1 = loop.run_until_complete(get_aiohttp_session()) | ||
session_2 = loop.run_until_complete(get_aiohttp_session()) | ||
|
||
assert session_1 is session_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test!
@AetherUnbound I believe you need |
Thanks, I'll give that a shot! FWIW that appears to be the default: Line 8 in 19e6e4b
|
@krysal indeed that was it! Do you think we should change the default environment to |
Development is overloaded. It was used as the deployed "staging" environment for a long time. The original ASGI PR already changed it to |
Note: this does not apply to production because we serve static files from Nginx there: those static file requests never make it to the Django application and, indeed, it is not configured to serve static files. This change uses the ASGI static file handler that the Django `runserver` management command uses and correctly handles streaming responses. The only consequence of not doing this is that warnings will appear locally and, if for some reason local interactions are bypassing the static file cache on the browser, you could get a memory leak. Again, that only applies to local environments. Python code never interacts with, considers, or is configured for static files in production, so this is not an issue for production. The correct behaviour for production, which you can test by setting ENVIRONMENT to something other than `local` in `api/.env`, is to 404 on static files.
d198c1b
to
e4dc958
Compare
Fixes
Fixes #2788 by @sarayourfriend
Description
Shares the aiohttp client within a given event loop.
I've also switched the "OpenverseASGIHandler", which was subclassing Django's ASGI handler and required us to write a bespoke
get_asgi_application
function, duplicating Django's built-in initialisation work in our code. Instead, we can follow a clearer middleware/handler pattern that calls to a parent application. This works very well and is so clear that I think we could even publish it as a Django extension in the future, provided there isn't one already. The Django project rejected a proposal to add lifecycle handling to Django itself, so there is a need for a library to handle it transparently.Along with that small refactor, I also used a single consistent name for the "application" uvicorn runs. This reduces the number of places we need to check for the environment to decide which application to run to just inside
asgi.py
, instead of both there andrun.py
. To make sure we're always referencing the correct object to register lifecycle events against, I've exported the lifecycle handler/application as its own singleton.Testing Instructions
Checkout the branch and run
just build web
in case you don't have the dependencies added in #3011. For good measure, delete your Redis volume to prevent cached dead links. For extra good measure, runjust recreate
and forget about fiddly details :)Run the API and open the logs with
just logs web
. Make a search request with thefilter_dead=True
and confirm you see a log about creating a new client session because none exists (grep logs withjust logs web | grep aiohttp
to filter the logs down to the relevant ones). Make repeat requests, related requests, and requests for different queries. Confirm that in the logs you can see messages about re-using the same session. Finally, test the shutdown behaviour works as expected by making an arbitrary change to any file in the API code. In the web container's logs you should see uvicorn reload the application, and in the process, a log indicating how many shutdown handlers were executed. Specifically you should see it log that only 1 handler was executed, for the single, re-used aiohttp session: "Executed 1 handler(s) before shutdown".As above, you can filter the logs using grep to make this easier:
To force test that sessions are used per loop, change the decorator on
check_dead_links.__init__._make_head_requests
to@async_to_sync(force_new_loop=True)
so that a new loop is created every time the function is called. You should now see a log fromapi.utils.aiohttp
about creating a new session each time you make a new search request.Each time you make a search request, you must ensure that dead link checking runs. The only consistent way to do this is to use a unique query each time. I've tested this by cycling through queries for "birds", "cat", and "run". Result tags are a good place to easily find new queries to run that will require dead link checking. Just keep in mind that dead link masking and caching will prevent repeated queries from re-running dead link checks, and you need to work around that to exercise the dead link checking route while testing. Alternatively, you could use the redis CLI to remove all cached dead links, but to me that seemed way more tedious that just using a new query term each time, so I didn't even try it and don't have any advice for attempting it.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin