-
Notifications
You must be signed in to change notification settings - Fork 50
Reduce DB queries needed in search results #1040
Conversation
API Developer Docs Preview: Ready https://wordpress.github.io/openverse-api/_preview/1040 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. |
0f7ab0b
to
0105e8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exciting! One request I would have is to use Django's query count checks to ensure that search hits for each image and audio only ever require at most a single query: https://pytest-django.readthedocs.io/en/latest/helpers.html#django-assert-num-queries
We also need to address the renamed column. If it is absolutely necessary for some reason (we need to justify this very carefully considering the implications of downtime) then we need to carefully plan it. If it can at all be avoided, either via a property to unify the API or, if the column definitely needs to be renamed, then via the multi-step zero-downtime approach I described in the other comment (though again, carefully justified, considering how much work it is to do it, including needing to write a command to incrementally move the data between the two columns).
Exciting PR nonetheless 🙂
migrations.RenameField( | ||
model_name='imagereport', | ||
old_name='created_at', | ||
new_name='created_on', | ||
), | ||
migrations.RenameField( | ||
model_name='audioreport', | ||
old_name='created_at', | ||
new_name='created_on', | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot use RenameField
with zero downtime deployments. Do we need to rename this or could we use a property to unify the API without needing to have this migration happen? If it absolutely 100% needs to happen then we need to do a multi-step migration, first to create the new column, then to move the data to the new column, then to delete the old column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the code in a6e3759 to not include these frivolous changes. It's not worth the incurred downtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for digging up a past conversation, but would RenameField
operations be okay if we preserve the underlying column name and only change the model field? Then the SQL output of the ensuing migration would essentially be a no-op and would not cause a downtime afaik.
…o select_related
As much as I'd like to, I don't think I'll get to review this in the week, so tagging other folks in case they can/want to. |
The migration 0052 is largely a no-op with only 2 actual SQL statements! This can be verified with the $ just dc exec web python manage.py sqlmigrate api 0052 BEGIN;
--
-- Add field audioset to audio
-- (no-op)
--
-- Alter field identifier on audioreport
CREATE INDEX "nsfw_reports_audio_identifier_ebe3a079" ON "nsfw_reports_audio" ("identifier");
--
-- Alter field identifier on deletedaudio
-- (no-op)
--
-- Alter field identifier on deletedimage
-- (no-op)
--
-- Alter field identifier on imagereport
CREATE INDEX "nsfw_reports_identifier_f0374e03" ON "nsfw_reports" ("identifier");
--
-- Alter field identifier on matureaudio
-- (no-op)
--
-- Alter field identifier on matureimage
-- (no-op)
--
-- Rename field identifier on audioreport to media_obj
-- (no-op)
--
-- Rename field identifier on deletedaudio to media_obj
-- (no-op)
--
-- Rename field identifier on deletedimage to media_obj
-- (no-op)
--
-- Rename field identifier on imagereport to media_obj
-- (no-op)
--
-- Rename field identifier on matureaudio to media_obj
-- (no-op)
--
-- Rename field identifier on matureimage to media_obj
-- (no-op)
COMMIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I understand everything happening here! This is definitely some ORM magic 🤯 I ran this locally as described and confirmed that only 1 query is now made for search results. I also migrated my local database, then hopped back over to main to make sure we were backwards compatible after the migration. I wasn't able to raise any errors after several searches and poking around in the admin UI. Based on that and your lovely SQL plan output (thanks for posting that @dhruvkb) I think this is good to go 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant, thanks @dhruvkb. Excellent work sorting through the complexities here 🚀
This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced. |
Fixes
Fixes #1036 by @sarayourfriend
Description
Currently a search result needs several DB queries to populate the page.
Hit
instances from ESAudioSet
andMatureAudio
instances. For each image search for theMatureImage
instances.This PR
OpenledgerModel
in more places for consistent timestamping fieldsThis is an example query for audio, it includes the related
matureaudio
andaudioset
data too!Testing Instructions
grep
.just dc logs -f web | grep -E '\[db]'
.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin