Use factory_boy for catalog test data generation #4751

Open
sarayourfriend opened this issue Aug 13, 2024 · 0 comments
Labels
🤖 aspect: dx Concerns developers' experience with the codebase ✨ goal: improvement Improvement to an existing user-facing feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments

@sarayourfriend
Collaborator

Problem

The API uses factory_boy to generate test data. The catalog tests do not, so they require far more manual configuration of test data. For example:

https://github.com/WordPress/openverse/blob/change/tag-upsert-strategy/catalog/tests/dags/database/test_batched_update.py#L77-L106

This code lives in a private function in a single test suite, and defaults must be configured manually instead of being handled automatically.

This increases the overhead of testing with sample data, which makes writing tests less pleasant over time.

Description

factory_boy has bulk creation methods and other useful approaches that make generating test data a non-issue when writing tests. It also centralises the implementation of test data generation, so improvements made for one test can be shared by all existing and future tests.

We can use factory_boy to generate test data, but inserting it into the database will still require translating the class instance data. factory_boy supports several ORMs, but our Airflow DAGs do not use any of them, not even for table description.

Once we implement #416, we can use the SQLAlchemy (or similar) integrations available for factory_boy, so that test data is easy to generate and maintain.

In the meantime, factory_boy can be used to generate instances of DTOs, and our own methods can translate those DTOs into the dicts used to insert data into one table or another as needed.

Alternatives

Implement our own version of this with bespoke functions similar to the one in the code I linked above. That could be fine too; it just doesn't come with the benefits of the years of maintenance and experience behind a library like factory_boy, which is meant to make this specific thing easy.
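For comparison, a bespoke helper along these lines might look like the sketch below. It is not the linked function; `make_sample_row`, `make_sample_rows`, and the column names are hypothetical and use only the standard library.

```python
import itertools

# Module-level counter so every generated row gets a unique identifier.
_row_counter = itertools.count()


def make_sample_row(**overrides) -> dict:
    """Build one sample row dict, applying defaults and then caller overrides."""
    n = next(_row_counter)
    row = {
        "foreign_identifier": f"fid-{n}",
        "url": f"https://example.com/fid-{n}.jpg",
        "license": "cc0",
        "provider": "example_provider",
    }
    # Reject typos in column names rather than silently adding new keys.
    unknown = overrides.keys() - row.keys()
    if unknown:
        raise KeyError(f"Unknown sample columns: {sorted(unknown)}")
    row.update(overrides)
    return row


def make_sample_rows(count: int, **overrides) -> list:
    """Bulk variant: build `count` rows that share the same overrides."""
    return [make_sample_row(**overrides) for _ in range(count)]


sample_rows = make_sample_rows(2, provider="flickr")
```

This covers defaults and bulk creation, but derived fields, sequences, and nested objects would each need to be written and maintained by hand.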

@sarayourfriend sarayourfriend added 🟩 priority: low Low priority and doesn't need to be rushed ✨ goal: improvement Improvement to an existing user-facing feature 🤖 aspect: dx Concerns developers' experience with the codebase 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Aug 13, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Aug 13, 2024
Projects
Status: 📋 Backlog
Development

No branches or pull requests

1 participant