Convert search endpoints to asynchronous #3449
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: api
Related to the Django API
🔒 staff only
Restricted to staff members
🔧 tech: django
Involves Django
🔧 tech: elasticsearch
Involves Elasticsearch
🐍 tech: python
Involves Python
Problem
Since we have converted the application to ASGI, we can now benefit from the async Elasticsearch client. Requests to Elasticsearch are some of the longest-running blocking operations in our application. By using the async client, we can stop blocking while waiting for queries to come back from Elasticsearch.
Description
The primary thing to convert in the search route is the Elasticsearch client usage. The asynchronous client swaps the underlying "node" (the request engine the ES client uses) for one built on aiohttp. We can't just swap out all usages of the Elasticsearch client for the async client, however, so we'll need to maintain both the synchronous and asynchronous clients for a period of time while we switch our usage of the client over from sync to async.
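As a rough illustration of keeping both clients side by side during the transition, the sketch below builds a synchronous and an asynchronous client from one shared set of options. It assumes the 8.x `elasticsearch` Python package (where `request_timeout` is a constructor keyword) with aiohttp installed for the async client. The `es_client_options` helper, the timeout value, and the URL are hypothetical and not taken from the Openverse settings.

```python
from typing import Any


def es_client_options(url: str, request_timeout: int = 10) -> dict[str, Any]:
    """Build the keyword arguments shared by the sync and async clients.

    Hypothetical helper: keeping the options in one place means the two
    clients cannot drift apart while both are in use.
    """
    return {"hosts": [url], "request_timeout": request_timeout}


def _elasticsearch_connect(url: str) -> tuple[Any, Any]:
    """Return a (synchronous, asynchronous) pair of Elasticsearch clients."""
    # Imported inside the function so the option helper above stays usable
    # without the client package; AsyncElasticsearch also needs aiohttp.
    from elasticsearch import AsyncElasticsearch, Elasticsearch

    options = es_client_options(url)
    return Elasticsearch(**options), AsyncElasticsearch(**options)


# In the Django settings module this might then read (placeholder URL):
# SYNC_ES, ASYNC_ES = _elasticsearch_connect("http://localhost:9200")
```

Building both clients from the same options is only one possible design. The actual connection arguments should be whatever the existing synchronous client already uses.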
To do this, update the `_elasticsearch_connect` function to return both a synchronous and an asynchronous client. Rename the existing synchronous `ES` client to `SYNC_ES` and add a new `ASYNC_ES` assigned to the asynchronous client. Update all usages of `settings.ES` to `settings.SYNC_ES`.
Now for the complex part of this issue: the Elasticsearch DSL library, which we use for our normal search routine, does not yet support the async Elasticsearch client. We can work around this, however, by not using `Search::execute` and changing the `get_es_response` function to use `ASYNC_ES.search` directly instead.
openverse/api/api/controllers/elasticsearch/helpers.py
Line 50 in 7bb4298
Something like this might work:
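A minimal sketch of that idea, not the project's actual code: it serialises the DSL search with `to_dict()`, awaits the client's `search` method, and wraps the raw payload in a response object, which is roughly what `Search::execute` does synchronously. The three-argument signature is an assumption made so the sketch is self-contained. In the real function the client would presumably be `settings.ASYNC_ES` and the wrapper the DSL's `Response` class. `FakeSearch`, `FakeAsyncClient`, and `wrap` are hypothetical stand-ins so the example runs without a cluster.

```python
import asyncio
from typing import Any


async def get_es_response(search: Any, async_client: Any, response_class: Any) -> Any:
    """Run a DSL search through an async client instead of Search.execute."""
    raw = await async_client.search(
        index=search._index,    # private DSL attributes, read the same way
        body=search.to_dict(),  # Search.execute reads them internally
        **search._params,
    )
    # Depending on the client version, the payload may need unwrapping
    # (for example via ``raw.body``) before it is handed to the wrapper.
    return response_class(search, raw)


# --- Hypothetical stand-ins so the sketch runs without Elasticsearch ---
class FakeSearch:
    _index = ["image"]
    _params: dict = {}

    def to_dict(self) -> dict:
        return {"query": {"match": {"title": "cat"}}}


class FakeAsyncClient:
    async def search(self, *, index, body, **params):
        # Echo the request back alongside an empty result set.
        return {"hits": {"hits": []}, "echo": {"index": index, "body": body}}


def wrap(search: Any, raw: Any) -> Any:
    """Stand-in for a response class: returns the raw payload unchanged."""
    return raw
```

With the stand-ins, `asyncio.run(get_es_response(FakeSearch(), FakeAsyncClient(), wrap))` returns the echoed payload, confirming that the index, body, and parameters are passed through to the client.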
That's basically an adaptation of the Elasticsearch DSL's `Search::execute` method into our `get_es_response` function. All functions that call `get_es_response` will need to be updated as well.
After updating `get_es_response`, we'll also want to update `check_dead_links` to be an asynchronous function rather than having it call an `async_to_sync`-wrapped function. We'll need to follow the chain of functions all the way up to the route endpoints, until the route endpoints and all the functions they use that interact with Elasticsearch are `async def`.
Additional context
Marked staff only due to complexity.
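To give a sense of what the call-chain conversion involves, here is a small standard-library sketch in which every layer from the endpoint down to the Elasticsearch call is `async def`, so no `async_to_sync` wrapper is needed. The function bodies, signatures, and sample data are simplified and hypothetical. The real `get_es_response` and `check_dead_links` take different arguments, and `search_endpoint` and `_is_alive` are invented for illustration.

```python
import asyncio


async def get_es_response(query: str) -> list[dict[str, str]]:
    """Stand-in for the awaited Elasticsearch call; returns canned hits."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return [
        {"id": "a", "url": "https://example.com/live.jpg"},
        {"id": "b", "url": "https://example.com/dead.jpg"},
    ]


async def _is_alive(url: str) -> bool:
    """Hypothetical link check; a real one would issue an HTTP request."""
    await asyncio.sleep(0)
    return "dead" not in url


async def check_dead_links(results: list[dict[str, str]]) -> list[dict[str, str]]:
    """Check all links concurrently and keep only results that are alive."""
    alive = await asyncio.gather(*(_is_alive(r["url"]) for r in results))
    return [r for r, ok in zip(results, alive) if ok]


async def search_endpoint(query: str) -> list[dict[str, str]]:
    """Route-level coroutine: awaits each async layer beneath it."""
    results = await get_es_response(query)
    return await check_dead_links(results)
```

Each caller simply awaits the layer below it, which is why the conversion has to proceed up the whole chain: a synchronous function in the middle would need a wrapper to call the coroutines beneath it.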