
Add aiohttp client sharing #3024

Merged
merged 7 commits into main from add/aiohttp-client-sharing
Nov 22, 2023


@sarayourfriend (Collaborator) commented Sep 14, 2023

Fixes

Fixes #2788 by @sarayourfriend

Description

Shares the aiohttp client within a given event loop.
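A minimal sketch of per-loop session sharing, assuming a module-level registry keyed by the running event loop. The names `get_aiohttp_session` and `_SESSIONS` are borrowed from the review discussion, but the body is hypothetical and is not the PR's implementation. The `factory` parameter is an assumption added so the caching logic can be exercised without opening a real network client, and the log messages are illustrative.

```python
import asyncio
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Hypothetical registry: one shared session per event loop. A plain dict is
# used because a client session keeps a strong reference to its loop, so
# weak keys would not let entries be collected anyway.
_SESSIONS: dict[asyncio.AbstractEventLoop, Any] = {}


def _aiohttp_factory() -> Any:
    # Imported lazily so the caching logic can be tried without aiohttp.
    import aiohttp

    return aiohttp.ClientSession()


async def get_aiohttp_session(factory: Callable[[], Any] = _aiohttp_factory) -> Any:
    """Return the session for the running loop, creating it on first use."""
    loop = asyncio.get_running_loop()
    session = _SESSIONS.get(loop)
    if session is None:
        logger.info("No session for this loop; creating a new client session.")
        session = factory()
        _SESSIONS[loop] = session
    else:
        logger.info("Reusing existing client session for this loop.")
    return session
```

Two lookups inside a single `asyncio.run(...)` call return the same object, while a second `asyncio.run(...)`, which creates a fresh loop, gets a new session. Closing and evicting sessions is left to shutdown handling.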

I've also replaced the "OpenverseASGIHandler". It subclassed Django's ASGI handler and required us to write a bespoke get_asgi_application function that duplicated Django's built-in initialisation work in our code. Instead, we now follow a clearer middleware/handler pattern that delegates to a parent application. This works very well and is clear enough that I think we could even publish it as a Django extension in the future, provided one doesn't already exist. The Django project rejected a proposal to add lifecycle handling to Django itself, so there is a need for a library that handles it transparently.
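The middleware/handler pattern can be sketched as a plain ASGI callable that answers `lifespan` messages itself and forwards every other scope to the wrapped application. This is a hypothetical illustration written against the ASGI lifespan protocol (`lifespan.startup`, `lifespan.shutdown`, and their `.complete` replies); `LifespanMiddleware` and `register_shutdown_handler` are illustrative names, not the PR's code. Like the handler described here, it only runs shutdown handlers, and startup is acknowledged without doing any work.

```python
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Message, Receive, Send], Awaitable[None]]
ShutdownHandler = Callable[[], Awaitable[None]]


class LifespanMiddleware:
    """Answer ASGI lifespan messages and delegate every other scope to `app`."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app
        self.shutdown_handlers: list[ShutdownHandler] = []

    def register_shutdown_handler(self, handler: ShutdownHandler) -> None:
        self.shutdown_handlers.append(handler)

    async def __call__(self, scope: Message, receive: Receive, send: Send) -> None:
        if scope["type"] != "lifespan":
            # HTTP and websocket scopes go straight to the parent application.
            await self.app(scope, receive, send)
            return

        # The server keeps this lifespan call open for the lifetime of the app.
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                # Startup hooks are not supported; acknowledge and carry on.
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                for handler in self.shutdown_handlers:
                    await handler()
                logger.info(
                    "Executed %d handler(s) before shutdown.",
                    len(self.shutdown_handlers),
                )
                await send({"type": "lifespan.shutdown.complete"})
                return
```

A server such as uvicorn sends the `lifespan` scope once at boot and keeps that call open until shutdown, which is why the loop only returns after acknowledging `lifespan.shutdown`.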

Along with that small refactor, I also used a single, consistent name for the "application" that uvicorn runs. As a result, the environment check that decides which application to run now lives only in asgi.py instead of in both asgi.py and run.py. To make sure we always register lifecycle events against the correct object, I've exported the lifecycle handler/application as its own singleton.
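A hypothetical sketch of choosing the application in one place: the local-only static files wrapper and the lifecycle wrapper are both applied inside a single builder, and the module exports one `application` name. The builder, its parameters, and the treatment of an unset `ENVIRONMENT` as non-local are assumptions for illustration. In a Django project, `django_app` would typically come from `django.core.asgi.get_asgi_application()` and `wrap_static_files` could be `django.contrib.staticfiles.handlers.ASGIStaticFilesHandler`.

```python
import os
from typing import Any, Callable, Optional

ASGIApp = Callable[..., Any]
Wrapper = Callable[[ASGIApp], ASGIApp]


def build_application(
    django_app: ASGIApp,
    wrap_static_files: Wrapper,
    wrap_lifecycle: Wrapper,
    environment: Optional[str] = None,
) -> ASGIApp:
    """Return the one ASGI app every runner should import as `application`."""
    # Assumption: an unset ENVIRONMENT is treated as non-local, so Python only
    # serves static files when explicitly running locally.
    if environment is None:
        environment = os.getenv("ENVIRONMENT", "")
    app = django_app
    if environment == "local":
        # Local only: production serves static files from Nginx instead.
        app = wrap_static_files(app)
    # The lifecycle wrapper is outermost so it sees lifespan messages first.
    return wrap_lifecycle(app)
```

With a single exported name, the runner only ever needs one target, for example `uvicorn.run("conf.asgi:application", ...)`, regardless of environment.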

Testing Instructions

Check out the branch and run `just build web` in case you don't have the dependencies added in #3011. For good measure, delete your Redis volume so that cached dead links don't interfere. For extra good measure, run `just recreate` and forget about fiddly details :)

Run the API and open the logs with `just logs web`. Make a search request with `filter_dead=True` and confirm that you see a log about creating a new client session because none exists yet (pipe the logs through grep, as in `just logs web | grep aiohttp`, to filter them down to the relevant lines). Make repeat requests, related requests, and requests for different queries, and confirm that the logs show messages about reusing the same session. Finally, test that shutdown works as expected by making an arbitrary change to any file in the API code. In the web container's logs you should see uvicorn reload the application and, in the process, log how many shutdown handlers were executed. Specifically, it should report that only 1 handler was executed, for the single, reused aiohttp session: "Executed 1 handler(s) before shutdown".

As above, you can filter the logs using grep to make this easier:

$ just logs web | grep lifecycle_handler
just dc logs -f web
env COMPOSE_PROFILES="api,ingestion_server,frontend,catalog" docker-compose -f docker-compose.yml logs -f web
openverse-web-1  | [2023-11-03 00:28:59,601 - conf.lifecycle_handler... -  75][INFO] [none] Executed 0 handler(s) before shutdown.

To verify that sessions are scoped per loop, change the decorator on `check_dead_links.__init__._make_head_requests` to `@async_to_sync(force_new_loop=True)` so that a new loop is created every time the function is called. You should now see a log from `api.utils.aiohttp` about creating a new session for each new search request.

Each time you make a search request, you must ensure that dead link checking runs. The only consistent way to do this is to use a unique query each time; I tested this by cycling through the queries "birds", "cat", and "run". Result tags are an easy source of new queries that will require dead link checking. Keep in mind that dead link masking and caching prevent repeated queries from re-running dead link checks, so you need to work around them to exercise the dead link checking code path while testing. Alternatively, you could use the Redis CLI to remove all cached dead links, but that seemed far more tedious than using a new query term each time, so I didn't try it and don't have any advice for doing so.

Checklist

  • My pull request has a descriptive title (not a vague title like `Update index.md`).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@sarayourfriend requested a review from a team as a code owner September 14, 2023 04:46
@sarayourfriend requested reviews from @obulat and @dhruvkb and removed the request for a team September 14, 2023 04:46
@github-actions (bot) added the 🧱 stack: api label Sep 14, 2023
@openverse-bot added the 🚦 status: awaiting triage label Sep 14, 2023
@sarayourfriend marked this pull request as draft September 18, 2023 05:31
@obulat added the 🟧 priority: high, 💻 aspect: code, and 🧰 goal: internal improvement labels and removed the 🚦 status: awaiting triage label Sep 19, 2023
@sarayourfriend mentioned this pull request Nov 2, 2023
@sarayourfriend force-pushed the add/aiohttp-client-sharing branch from 152040a to 2520356 November 2, 2023 23:48
Base automatically changed from add/asgi to main November 8, 2023 22:23
@sarayourfriend mentioned this pull request Nov 15, 2023
@sarayourfriend force-pushed the add/aiohttp-client-sharing branch from 3567b71 to d198c1b November 17, 2023 05:17
@sarayourfriend marked this pull request as ready for review November 17, 2023 05:23
@krysal (Member) left a comment

LGTM! Thanks for the detailed instructions and tests.

@AetherUnbound (Collaborator) left a comment

This looks great! No blocking comments. I was able to perform all of the testing steps except the shutdown step: for whatever reason, changes I make to the API code locally aren't causing the API service to reload 🤔 I have DJANGO_DEBUG_ENABLED=True and ENVIRONMENT=development, and I'm not sure why that might be happening, but the rest looks great!

Comment on lines 15 to 21
class ASGILifecycleHandler:
"""
Extend default ASGIHandler to implement lifetime hooks.
Handle ASGI lifecycle messages.

Django's ASGIHandler does not handle these messages,
so we have to implement it ourselves. Only shutdown handlers
are currently supported.
Collaborator


Out of curiosity, where did you piece together that this class was necessary, and what to include within it?

Collaborator Author


I read in the Django bug tracker that the ASGI lifecycle wasn't supported. Somehow I didn't come across this project, though, and I wonder whether we should use it instead of our custom implementation: https://github.com/illagrenan/django-asgi-lifespan

Collaborator Author


Update: yeah, I'm going to go ahead and change this PR to use that package so that we don't have to do a refactor later to remove our custom implementation.

Collaborator Author


Ah! I remember why I originally didn't use that package: it supports uvicorn only when run directly, not under gunicorn workers. That's no longer an issue for us, so we can use it.

It doesn't have a lot of activity, but it has comprehensive unit tests and, by using Django signals, it avoids the need for custom weak-reference handling (Django connects signal receivers with weak references by default).

Collaborator Author


Okay, that's done in the latest commit: e4dc958

It actually turned out to be a pretty nice way to clean things up. Rather than registering each session's close and having to manage those registrations, we can take advantage of _SESSIONS as a single gathering point and register one signal handler that closes all the sessions.

I had to refactor the test fixture as well, and I almost thought it was going to break completely, until I realised I'd hit a bug in the upstream library, for which I've submitted a patch: illagrenan/django-asgi-lifespan#80

It doesn't currently affect us, and shouldn't in the foreseeable future, because we don't need the startup lifecycle event. In the worst case, we can still call the startup event without awaiting it and tolerate a log at the end of the tests complaining that it was never awaited.

In any case, I think the end result is a lot cleaner and easier to follow, and easier to extend into the future.

Comment on lines -9 to -12
app = "conf.asgi:static_files_application" if is_local else "conf.asgi:application"

uvicorn.run(
app,
Collaborator


Nice simplification!

session_1 = loop.run_until_complete(get_aiohttp_session())
session_2 = loop.run_until_complete(get_aiohttp_session())

assert session_1 is session_2
Collaborator


Great test!

@krysal (Member) commented Nov 21, 2023

@AetherUnbound I believe you need ENVIRONMENT=local to see the API service reloading; that is what I have in my .env file, though admittedly, I find it a bit odd and think it should be set to development as in your case 🤔

@AetherUnbound (Collaborator) commented


Thanks, I'll give that a shot! FWIW that appears to be the default:

ENVIRONMENT=development

@AetherUnbound (Collaborator) commented

@krysal indeed that was it! Do you think we should change the default environment to local in the env.template file?

@sarayourfriend (Collaborator Author) commented

`development` is an overloaded name: it was used for the deployed "staging" environment for a long time. The original ASGI PR already changed the local value to `local` instead of `development`, in line with the ingestion server and the frontend. Looks like I forgot to update the template, so I'll do that in this PR now 👍.

Note: this does not apply to production because we serve static files from Nginx there: those static file requests never reach the Django application, which is not configured to serve static files. This change uses the ASGI static file handler that Django's `runserver` management command uses, which correctly handles streaming responses. Without it, the only consequences are warnings in local logs and, if local requests somehow bypass the browser's static file cache, a potential memory leak. Python code never interacts with, considers, or is configured for static files in production, so this is purely a local concern. The correct behaviour for production, which you can test by setting ENVIRONMENT to something other than `local` in `api/.env`, is to return a 404 for static files.
@sarayourfriend force-pushed the add/aiohttp-client-sharing branch from d198c1b to e4dc958 November 22, 2023 06:14
@sarayourfriend merged commit 1de2d21 into main Nov 22, 2023
43 checks passed
@sarayourfriend deleted the add/aiohttp-client-sharing branch November 22, 2023 07:06
Labels
  • 💻 aspect: code (Concerns the software code in the repository)
  • 🧰 goal: internal improvement (Improvement that benefits maintainers, not users)
  • 🟧 priority: high (Stalls work on the project or its dependents)
  • 🧱 stack: api (Related to the Django API)
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Use a shared aiohttp.ClientSession rather than creating a new one per-request
5 participants