This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Reduce DB queries needed in search results #1040

Merged
merged 23 commits into from
Dec 20, 2022

Conversation

dhruvkb (Member) commented Dec 12, 2022

Fixes

Fixes #1036 by @sarayourfriend

Description

Currently a search result needs several DB queries to populate the page.

  1. Get Hit instances from ES
  2. Get DB rows corresponding to the hits.
  3. For each audio row, search for AudioSet and MatureAudio instances. For each image row, search for MatureImage instances.

This PR

  • uses ORM-fu to bring the total queries down to 1!
  • improves the structure of reporting models to use FK
  • improves the admin experience using autocomplete field
  • uses OpenledgerModel in more places for consistent timestamping fields

This is an example query for audio. It includes the related matureaudio and audioset data too!

SELECT 
  "audio"."id", 
  "audio"."created_on", 
  "audio"."updated_on", 
  "audio"."identifier", 
  "audio"."foreign_identifier", 
  "audio"."title", 
  "audio"."foreign_landing_url", 
  "audio"."creator", 
  "audio"."creator_url", 
  "audio"."thumbnail", 
  "audio"."provider", 
  "audio"."url", 
  "audio"."filesize", 
  "audio"."filetype", 
  "audio"."watermarked", 
  "audio"."license", 
  "audio"."license_version", 
  "audio"."source", 
  "audio"."last_synced_with_source", 
  "audio"."removed_from_source", 
  "audio"."view_count", 
  "audio"."tags", 
  "audio"."tags_list", 
  "audio"."category", 
  "audio"."meta_data", 
  "audio"."bit_rate", 
  "audio"."sample_rate", 
  "audio"."audio_set_foreign_identifier", 
  "audio"."audio_set_position", 
  "audio"."genres", 
  "audio"."duration", 
  "audio"."alt_files", 
  "audioset"."id", 
  "audioset"."created_on", 
  "audioset"."updated_on", 
  "audioset"."foreign_identifier", 
  "audioset"."title", 
  "audioset"."foreign_landing_url", 
  "audioset"."creator", 
  "audioset"."creator_url", 
  "audioset"."thumbnail", 
  "audioset"."provider", 
  "audioset"."url", 
  "audioset"."filesize", 
  "audioset"."filetype", 
  "api_matureaudio"."updated_on", 
  "api_matureaudio"."created_on", 
  "api_matureaudio"."media_obj_id" 
FROM 
  "audio" 
  LEFT OUTER JOIN "audioset" ON (
    "audio"."audio_set_foreign_identifier" = "audioset"."foreign_identifier" 
    AND "audio"."provider" = "audioset"."provider"
  ) 
  LEFT OUTER JOIN "api_matureaudio" ON (
    "audio"."identifier" = "api_matureaudio"."media_obj_id"
  ) 
WHERE 
  "audio"."identifier" = 'f97384f6-a5ba-4668-881f-f616904651f0' :: uuid 
LIMIT 
  21;
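
For readers without the Django project at hand, the shape of the query above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The trimmed-down schema, the sample rows, and the helper names `build_demo_db` and `fetch_audio_with_relations` are assumptions made for illustration; this is not the ORM code from the PR, only the same two LEFT OUTER JOINs expressed directly in SQL.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory schema loosely modelled on the tables above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE audioset (foreign_identifier TEXT, provider TEXT, title TEXT);
        CREATE TABLE audio (
            identifier TEXT PRIMARY KEY,
            title TEXT,
            provider TEXT,
            audio_set_foreign_identifier TEXT
        );
        CREATE TABLE api_matureaudio (media_obj_id TEXT PRIMARY KEY);

        INSERT INTO audioset VALUES ('set-1', 'jamendo', 'Album');
        INSERT INTO audio VALUES ('a1', 'Track one', 'jamendo', 'set-1');
        INSERT INTO audio VALUES ('a2', 'Loose track', 'jamendo', NULL);
        INSERT INTO api_matureaudio VALUES ('a1');
        """
    )
    return conn


def fetch_audio_with_relations(conn, identifier):
    """Fetch an audio row, its audio set title and its mature flag in one query."""
    sql = """
        SELECT a.identifier, a.title, s.title, m.media_obj_id IS NOT NULL
        FROM audio AS a
        LEFT OUTER JOIN audioset AS s
          ON a.audio_set_foreign_identifier = s.foreign_identifier
         AND a.provider = s.provider
        LEFT OUTER JOIN api_matureaudio AS m
          ON a.identifier = m.media_obj_id
        WHERE a.identifier = ?
    """
    return conn.execute(sql, (identifier,)).fetchone()
```

Because both joins are outer joins, an audio row with no audio set and no mature report still comes back, with NULL in the audio set column and a false mature flag.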

Testing Instructions

Screenshot 2022-12-12 at 12 47 41 PM

  1. Enable logging for Django DB queries as explained here.
  2. You can also define a formatter with some text like "[db]" for filtering the output using grep.
  3. Start logging the output: just dc logs -f web | grep -E '\[db]'.
  4. Visit the search results and single result pages of audio and image media types.
  5. You should see a single query for each page view in the logs.
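
As a rough sketch of steps 1 and 2, a logging configuration along these lines would prefix every query with "[db]". It assumes the standard Django behaviour that SQL statements are logged to the `django.db.backends` logger at DEBUG level when `DEBUG = True`; the formatter and handler names (`db`, `db_console`) are made up for this example. In a Django project the dict would go in settings as `LOGGING`; the explicit `dictConfig` call is only here so the snippet runs on its own.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # The "[db]" prefix is what the grep in step 3 filters on.
        "db": {"format": "[db] %(message)s"},
    },
    "handlers": {
        "db_console": {
            "class": "logging.StreamHandler",
            "formatter": "db",
        },
    },
    "loggers": {
        # Django emits executed SQL on this logger at DEBUG level.
        "django.db.backends": {
            "handlers": ["db_console"],
            "level": "DEBUG",
            "propagate": False,
        },
    },
}

# Django applies settings.LOGGING itself; this call is for standalone use.
logging.config.dictConfig(LOGGING)
```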

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@dhruvkb dhruvkb added 🟨 priority: medium Not blocking but should be addressed soon 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository labels Dec 12, 2022
@github-actions github-actions bot added the migrations Modifications to Django migrations label Dec 12, 2022
github-actions bot commented Dec 12, 2022

API Developer Docs Preview: Ready

https://wordpress.github.io/openverse-api/_preview/1040

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

@WordPress WordPress deleted a comment from github-actions bot Dec 12, 2022
@WordPress WordPress deleted a comment from github-actions bot Dec 12, 2022
@dhruvkb dhruvkb marked this pull request as ready for review December 12, 2022 14:37
@dhruvkb dhruvkb requested a review from a team as a code owner December 12, 2022 14:37
sarayourfriend (Contributor) left a comment

This is exciting! One request I would have is to use Django's query count checks to ensure that search hits for each image and audio only ever require at most a single query: https://pytest-django.readthedocs.io/en/latest/helpers.html#django-assert-num-queries
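
The idea behind such a query count check can be sketched without Django, using sqlite3's statement tracing. The `assert_num_queries` helper below is a hypothetical stand-in written for this illustration; it is not the `django_assert_num_queries` fixture from the linked pytest-django docs, which plays the same role inside a Django test.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def assert_num_queries(conn, expected):
    """Fail if the block runs a different number of SQL statements than expected."""
    statements = []
    # sqlite3 calls the trace callback with the text of each executed statement.
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)
    assert len(statements) == expected, (
        f"expected {expected} queries, ran {len(statements)}: {statements}"
    )
```

A test would wrap the code under test in `with assert_num_queries(conn, 1):`, so a regression that reintroduces per-row lookups fails loudly instead of only showing up in the logs.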

We also need to address the renamed column. Renaming it needs very careful justification, considering the implications for downtime. If the rename can be avoided at all, for example via a property to unify the API, we should avoid it. If the column definitely needs to be renamed, we need to plan it carefully and use the multi-step zero-downtime approach I described in the other comment, which is a lot of work, including writing a command to incrementally move the data between the two columns.

Exciting PR nonetheless 🙂

api/catalog/api/examples/audio_responses.py (outdated, resolved)
api/catalog/api/examples/audio_responses.py (resolved)
Comment on lines 32 to 41
migrations.RenameField(
    model_name='imagereport',
    old_name='created_at',
    new_name='created_on',
),
migrations.RenameField(
    model_name='audioreport',
    old_name='created_at',
    new_name='created_on',
),
Contributor

We cannot use RenameField with zero downtime deployments. Do we need to rename this or could we use a property to unify the API without needing to have this migration happen? If it absolutely 100% needs to happen then we need to do a multi-step migration, first to create the new column, then to move the data to the new column, then to delete the old column.
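
The first two steps of that multi-step approach can be sketched with sqlite3 as below. The `report` table and both function names are hypothetical and chosen only for this example. A real rollout would use Django migrations plus a management command, and would drop the old column in a final step once no deployed code reads it.

```python
import sqlite3


def add_new_column(conn):
    """Step 1: add the new nullable column alongside the old one."""
    conn.execute("ALTER TABLE report ADD COLUMN created_on TEXT")
    conn.commit()


def copy_in_batches(conn, batch_size=2):
    """Step 2: backfill the new column a few rows at a time; return rows copied."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE report SET created_on = created_at
            WHERE id IN (
                SELECT id FROM report
                WHERE created_on IS NULL AND created_at IS NOT NULL
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Step 3 (not shown): drop created_at once nothing reads it any more.
```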

dhruvkb (Member, Author) commented Dec 13, 2022

I updated the code in a6e3759 to not include these frivolous changes. It's not worth the incurred downtime.

dhruvkb (Member, Author) commented Dec 20, 2022

Sorry for digging up a past conversation, but would RenameField operations be okay if we preserve the underlying column name and only change the model field? Then the SQL output of the ensuing migration would essentially be a no-op and would not cause downtime, afaik.

api/catalog/api/migrations/0053_audio_audioset.py (outdated, resolved)
api/catalog/api/models/audio.py (resolved)
api/catalog/api/models/audio.py (outdated, resolved)
api/test/dead_link_filter_test.py (resolved)
api/test/media_integration.py (outdated, resolved)
@WordPress WordPress deleted a comment from github-actions bot Dec 13, 2022
@WordPress WordPress deleted a comment from github-actions bot Dec 13, 2022
@WordPress WordPress deleted a comment from github-actions bot Dec 14, 2022
@WordPress WordPress deleted a comment from github-actions bot Dec 16, 2022
krysal (Member) commented Dec 19, 2022

As much as I'd like to, I don't think I'll get to review this this week, so I'm tagging other folks in case they can/want to.

@krysal krysal requested a review from stacimc December 19, 2022 17:26
@WordPress WordPress deleted a comment from github-actions bot Dec 20, 2022
@WordPress WordPress deleted a comment from github-actions bot Dec 20, 2022
@WordPress WordPress deleted a comment from github-actions bot Dec 20, 2022
@WordPress WordPress deleted a comment from github-actions bot Dec 20, 2022
dhruvkb (Member, Author) commented Dec 20, 2022

The migration 0052 is largely a no-op with only 2 actual SQL statements! This can be verified with the sqlmigrate command.

$ just dc exec web python manage.py sqlmigrate api 0052
BEGIN;
--
-- Add field audioset to audio
-- (no-op)
--
-- Alter field identifier on audioreport
CREATE INDEX "nsfw_reports_audio_identifier_ebe3a079" ON "nsfw_reports_audio" ("identifier");
--
-- Alter field identifier on deletedaudio
-- (no-op)
--
-- Alter field identifier on deletedimage
-- (no-op)
--
-- Alter field identifier on imagereport
CREATE INDEX "nsfw_reports_identifier_f0374e03" ON "nsfw_reports" ("identifier");
--
-- Alter field identifier on matureaudio
-- (no-op)
--
-- Alter field identifier on matureimage
-- (no-op)
--
-- Rename field identifier on audioreport to media_obj
-- (no-op)
--
-- Rename field identifier on deletedaudio to media_obj
-- (no-op)
--
-- Rename field identifier on deletedimage to media_obj
-- (no-op)
--
-- Rename field identifier on imagereport to media_obj
-- (no-op)
--
-- Rename field identifier on matureaudio to media_obj
-- (no-op)
--
-- Rename field identifier on matureimage to media_obj
-- (no-op)
COMMIT;

AetherUnbound (Contributor) left a comment

Okay, I think I understand everything happening here! This is definitely some ORM magic 🤯 I ran this locally as described and confirmed that only 1 query is now made for search results. I also migrated my local database, then hopped back over to main to make sure we were backwards compatible after the migration. I wasn't able to raise any errors after several searches and poking around in the admin UI. Based on that and your lovely SQL plan output (thanks for posting that @dhruvkb) I think this is good to go 🙂

sarayourfriend (Contributor) left a comment

Brilliant, thanks @dhruvkb. Excellent work sorting through the complexities here 🚀

@WordPress WordPress deleted a comment from github-actions bot Dec 20, 2022
@github-actions

This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced.

Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature migrations Modifications to Django migrations 🟨 priority: medium Not blocking but should be addressed soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

attribution is null in search results view
4 participants