Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor serializers to match media models and ES indices #745

Open
1 task
dhruvkb opened this issue Oct 15, 2021 · 1 comment
Open
1 task

Refactor serializers to match media models and ES indices #745

dhruvkb opened this issue Oct 15, 2021 · 1 comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API

Comments

@dhruvkb
Copy link
Member

dhruvkb commented Oct 15, 2021

Description

  • Clean up the use of validations such as allow_null, allow_blank and required in serializers:

    • Some serializers are used for deserializing input text to data and some for serializing data to JSON. The validations mean different things during serialization and deserialization.
    • The same serializer is used for Hit when searching but for Media subclasses when retrieving details. This can cause issues due to Hit fields being a subset of those in Media.
    • These validations, which are currently undocumented, can be cleaned up and their use can be documented.
  • Since the serializers are not structured in correspondence to the mixins making up media classes, they tend to go out of sync when the Django models are migrated or when ES indices change.

Expectation

Validations should be used cautiously and with documentation so that they can stay up to date with changes to Media (made by Django model migrations) or changes to Hit (made by changes to ES indices).

Resolution

  • 🙋 I would be interested in resolving this bug.
@dhruvkb dhruvkb added help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Oct 15, 2021
@obulat obulat transferred this issue from WordPress/openverse-api Feb 22, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Feb 23, 2023
@dhruvkb
Copy link
Member Author

dhruvkb commented Feb 26, 2023

Since we are no longer serializing Hit instances directly after WordPress/openverse-api#1040, part of the issue pertaining to Hit can be considered solved.

@krysal krysal added 🧱 stack: api Related to the Django API and removed 🧱 stack: backend labels Mar 6, 2023
dhruvkb pushed a commit that referenced this issue Apr 14, 2023
* cleaning and temp table in pg

* sketch of full dag NOT TESTED

* inaturalist dag without tests or reporting (yet)

* complete dag, 25 mill recs in 5.5 hours local test

* Add passwords for s3 testing with new docker

* make temp loading table UNLOGGED to load it faster

* inat with translation 75 million recs in 8 hrs

* using OUTPUT_DIR for API files

* clarify delayed requester vs requester

* DRYer approach to tags TO DO

* comments on taxa transformation

* scientific names not ids for manual translation

* TO DO comment clean-up

* fix name insert syntax

* Merge 'main' into feature/inaturalist-performance

* add clarity on batch limit override

* missing piece of merge from main

* limit to 20 tags per photo

* add option to use alternate dag creation for sql

* adjust tests see issue #898

* slightly faster way to pull medium test sample

* Note another data source for vernacular names

* remove unnecessary test code

* clean and upsert one batch at a time

* log parsing resource doc

* use common.constants.IMAGE instead of MEDIA_TYPE

* add explanation of ancestry joins and taxa tags

* use existing clean_intermediate_table_data

* remove unnecessary env vars from load_to_s3

* declarative doc string for file update check

* update iNaturalist description

* remove message to Staci :)

* use dynamically generated load subtasks

* clarify taxa comments and include languages

* consolidate consolidation code

* add testing for consolidated metrics

* separate ti_mock instances per test

* test get batches

* shorter titles to save space

* add better testing instructions

* dag parameter to manage post-ingestion deletions

* Add kwargs to get_response_json call

* get_media_type can be static method

Co-authored-by: Krystle Salazar <[email protected]>

* link to original inaturalist photo, rather than medium

Co-authored-by: Krystle Salazar <[email protected]>

* prefer creator name over login

* remove unused constants

* add to do for extension cleanup

Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Krystle Salazar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Status: 📋 Backlog
Development

No branches or pull requests

3 participants