Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using url as field name in provider scripts #1409

Closed
1 task
stacimc opened this issue Oct 7, 2022 · 2 comments · Fixed by #1891
Closed
1 task

Consider using url as field name in provider scripts #1409

stacimc opened this issue Oct 7, 2022 · 2 comments · Fixed by #1891
Assignees
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments

@stacimc
Copy link
Collaborator

stacimc commented Oct 7, 2022

Current Situation

Currently in our provider scripts, we use a different field name per media type for the url field -- so, image_url and audio_url. In the actual database, these all just map to url, and so we have to manually convert the field name in each of the associated media stores (example in ImageStore).

Suggested Improvement

Unless there's a reason it must be done this way, I'd like to just use url everywhere.

Benefit

  • We would be able to remove the MediaStore logic to convert the field name
  • It would be much easier to document what fields are needed in provider scripts, and what they map to in the actual database.
  • It would be much easier for providers that consume media of multiple types to deal with URLs. See here an example of Wikimedia needing to set the url, then later pop it off the data and rename it depending on the media type, only for it to just get changed back in the media store
  • It would be much easier to define types for ImageData and AudioData.

Additional context

I'm unaware of the reasoning for the current implementation, so it's possible there's a critical reason why these fields are renamed and we should be sure to test thoroughly.

Implementation

  • 🙋 I would be interested in implementing this feature.
@stacimc stacimc added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Oct 7, 2022
@AetherUnbound
Copy link
Collaborator

Fully support this! I honestly wasn't even aware of this discrepancy until recently, even after working with the provider scripts for some time.

@obulat
Copy link
Contributor

obulat commented Dec 12, 2022

I tried finding a reason behind image_url as the field name, and couldn't. The database column name seems to always have been url, and the PR that adds ImageStore simply says that image_url in ImageStore corresponds to url in the database: cc-archive/cccatalog#246

I support the change 👍

@obulat obulat added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Feb 23, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Apr 17, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023
@obulat obulat self-assigned this Apr 23, 2023
@obulat obulat moved this from 📋 Backlog to 🏗 In progress in Openverse Backlog Apr 23, 2023
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Openverse Backlog May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants