Skip to content
This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Add Audio to the database #111

Merged
merged 26 commits into from
Jul 2, 2021
Merged

Add Audio to the database #111

merged 26 commits into from
Jul 2, 2021

Conversation

obulat
Copy link
Contributor

@obulat obulat commented Jun 25, 2021

Related to WordPress/openverse#19
This is a cleaned-up version of #64 part related to saving the Audio data to the database. And #69 :)

This PR:

  • creates a new Audio database. The sql scripts for creation of the Postgres tables were added to both local_postgres folder that is used for local testing, and the openledger_sql folder which is [probably] used in production.

  • Reorders the AudioStore columns so that audio-specific columns are all at the end. This makes it easier to write and read the data, and to test. It would be nice to do the same for ImageStore, however we have many TSV files that use the columns in the current order, so we would need to detect the TSV file structure and process it accordingly.

  • adds media type parameter to the loader functions so that they can select correct list of columns to write to the database.

  • adds audio-specific SQL queries to loader/sql.py for creating audio tables, upserting and overwriting audio in the database.

Added on June 30:

  • moves watermarked column from image-specific columns to a list of columns common for all media types, as per API models, as suggested by @krysal .
  • detects correct ingestion_type column value for all media based on source value.

There are three different places where data is added or overwritten in the database, and in each place the columns are listed separately because different columns are affected. I would like to have this process abstracted in the future, so that we don't have to update column lists in all three places.

Signed-off-by: Olga Bulat [email protected]

obulat added 19 commits June 17, 2021 15:44
This reverts commit f3785d0

Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
# Conflicts:
#	src/cc_catalog_airflow/dags/common/storage/image.py
#	src/cc_catalog_airflow/dags/common/storage/test_image.py
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
@obulat obulat requested a review from a team as a code owner June 25, 2021 10:08
@obulat obulat requested review from zackkrida and dhruvkb and removed request for a team June 25, 2021 10:08
Signed-off-by: Olga Bulat <[email protected]>
Base automatically changed from extract_media_db to main June 25, 2021 12:26
@obulat obulat changed the title Add MediaStore entity Add Audio to the database Jun 25, 2021
Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small bug i think

@zackkrida zackkrida self-requested a review June 25, 2021 15:17
zackkrida
zackkrida previously approved these changes Jun 25, 2021
@zackkrida
Copy link
Member

I would like to have this process abstracted in the future, so that we don't have to update column lists in all three places.

👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍

Signed-off-by: Olga Bulat <[email protected]>
dhruvkb
dhruvkb previously approved these changes Jun 29, 2021
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand it fully but from what I do, it seems like an extension of the existing code to accept a media_type and set the columns accordingly. If my understanding is correct, it looks good to me.

Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎶 🎶 🎶

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🌟 goal: addition Addition of new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants