Create a Provider API script template #93
Signed-off-by: Olga Bulat <[email protected]>
# Conflicts:
# src/cc_catalog_airflow/dags/common/__init__.py
Added some small changes; I'm sure this will be reviewed and re-reviewed a few times before we finish. You might also be interested in this tool:
Co-authored-by: Zack Krida <[email protected]>
This tool looks neat! What I like about the current approach is that contributors will not have to install any additional packages, only the ones required to run the API script. Using provider API scripts as an entry point for contributing is also good because contributors won't even have to set up the Catalog in Docker! On Windows, that is very difficult for people new to programming: you have to either have the Professional edition of Windows or install WSL2.
@krysal it would be excellent for you to test this template by writing a Provider API Script for stocksnap.io. I have gotten more information that should help. They have an API we can use at https://stocksnap.io/api/load-photos/date/desc/1, where "1" is the page number. I did some simple testing, and there are around 33k records available from this API. Please copy Olga's template files into a new PR and document the experience of using them. This should be a great way to learn what is missing in the files; any questions you have or problems you encounter are a good thing and will help us make the template better!
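A minimal sketch of paging through that endpoint using only the standard library, in the spirit of not requiring extra packages. Two things here are assumptions, not facts from this thread: that the JSON body lists records under a `"results"` key, and that an empty page marks the end of the data. Check both against a real response before relying on them. The `fetch` parameter is injectable so the loop can be tested without network access.

```python
import json
from typing import Callable, Iterator, List, Optional
from urllib.request import urlopen

# Endpoint from the comment above; the last path segment is the page number.
STOCKSNAP_URL = "https://stocksnap.io/api/load-photos/date/desc/{page}"


def page_url(page: int) -> str:
    """Build the URL for one page of results (pages start at 1)."""
    return STOCKSNAP_URL.format(page=page)


def fetch_page(page: int) -> List[dict]:
    """Download one page. ASSUMPTION: records are listed under "results"."""
    with urlopen(page_url(page), timeout=30) as response:
        return json.load(response).get("results", [])


def iter_records(
    fetch: Callable[[int], List[dict]] = fetch_page,
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield records page by page until a page comes back empty."""
    page = 1
    while max_pages is None or page <= max_pages:
        batch = fetch(page)
        if not batch:
            break  # ASSUMPTION: an empty page means there is no more data
        yield from batch
        page += 1
```

`max_pages` is handy while testing, so a first run does not walk all ~33k records.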
Quick docs (to be expanded!):
By default, it creates a script for images. If you are writing a script for an audio provider, you need to add
When attempting to run the new `create_api_script` command, I got the following error: `FileNotFoundError: [Errno 2] No such file or directory: 'dags/provider_api_scripts/stocksnap.py'`
Fixed by switching from a dot-relative path (`Path('.')`) to an absolute one (`Path(__file__)`).
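A sketch of why the `Path(__file__)` fix works: `Path('.')` is resolved against whatever directory the command is run from, while a path derived from the script's own file is stable. The directory layout below (generator in `dags/templates/`, output in `dags/provider_api_scripts/`) is an assumption for illustration, and `output_path` is a hypothetical helper, not the PR's code.

```python
from pathlib import Path


def output_path(provider: str, script_file: str) -> Path:
    """Return where the generated provider script should be written.

    The location is derived from the generator script's own file, so the
    result does not depend on the current working directory (unlike
    Path('.')). ASSUMPTION: the generator lives in dags/templates/, next to
    dags/provider_api_scripts/.
    """
    templates_dir = Path(script_file).resolve().parent
    return templates_dir.parent / "provider_api_scripts" / f"{provider}.py"
```

Inside the generator it would be called as `output_path(provider, __file__)`.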
TODO: Add a TODO comment about adding a popularity metric to meta_data (views/listens, likes, downloads, etc.).
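One possible shape for that TODO, as a sketch: collect only the popularity counts a provider actually reports, so `meta_data` does not fill up with `None` values. The key names `views`, `likes`, and `downloads` are hypothetical; the catalog may settle on different names.

```python
from typing import Dict, Optional


def popularity_meta_data(
    views: Optional[int] = None,
    likes: Optional[int] = None,
    downloads: Optional[int] = None,
) -> Dict[str, int]:
    """Collect whichever popularity counts the provider reports.

    Key names are hypothetical. Metrics the API does not return are
    omitted rather than stored as None; a real count of 0 is kept.
    """
    metrics = {"views": views, "likes": likes, "downloads": downloads}
    return {name: count for name, count in metrics.items() if count is not None}
```

It could then be merged into an item's metadata, e.g. `meta_data = {**other_fields, **popularity_meta_data(views=...)}`.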
Sorry for the delayed review! I was only able to run create_api_script.py with its full path; otherwise, the "No such file or directory" error persists. I had to make some tweaks, but I agree this will be a very good starting point for new contributors once finished.
""" | ||
This is template for an API script. Broadly, there are several steps: | ||
1. Download batches of information for the query for openly-licensed media | ||
2. For each item in batch, extract the necessary meta data. | ||
3. Save the metadata using ImageStore.add_item or AudioStore.add_item methods | ||
|
||
Try to write small functions that are easier to test. Don't forget to | ||
write tests, too! | ||
|
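The three steps in that docstring can be sketched as small, separately testable functions. This is not the template's actual code: `MediaStore` is a stand-in that only assumes an `add_item` method, and the record fields (`url`, `license_url`, `id`) and `add_item` keyword names are hypothetical placeholders to be replaced with the provider's real fields and the store's real parameters.

```python
from typing import Iterable, List, Optional, Protocol


class MediaStore(Protocol):
    """Stand-in for ImageStore/AudioStore: only add_item is assumed."""

    def add_item(self, **kwargs) -> None: ...


def extract_metadata(item: dict) -> Optional[dict]:
    """Step 2: map one raw API record to add_item arguments.

    HYPOTHETICAL field and argument names; adapt them to the provider.
    """
    image_url = item.get("url")
    license_url = item.get("license_url")
    if not image_url or not license_url:
        return None  # skip records we cannot display or attribute
    return {
        "foreign_identifier": item.get("id"),
        "image_url": image_url,
        "license_url": license_url,
    }


def process_batches(batches: Iterable[List[dict]], store: MediaStore) -> int:
    """Steps 1-3: walk downloaded batches, extract metadata, save items."""
    saved = 0
    for batch in batches:  # step 1 happens wherever `batches` is produced
        for item in batch:
            meta = extract_metadata(item)
            if meta is not None:
                store.add_item(**meta)  # step 3
                saved += 1
    return saved
```

Keeping extraction separate from downloading means `extract_metadata` can be tested against saved sample responses with no network access.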
Good overview of how to add a source, and good call highlighting tests 👍
Co-authored-by: Krystle Salazar <[email protected]>
I finished testing the test template and made some minor adjustments; now it's ready to go! 🎶
LGTM!
The base branch was changed.
This is really helpful, thank you! 🌟
Fixes WordPress/openverse#1734
This PR creates a Cookiecutter-like script for writing Provider API scripts.
It needs more testing and documentation.
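The generator itself is not shown in this thread. As a rough illustration of what "Cookiecutter-like" means, this sketch fills placeholders in a template with the provider name using the standard library's `string.Template`, so no extra packages are needed. The template body, placeholder names, and file naming are all assumptions, not the PR's actual code.

```python
import string
from pathlib import Path

# HYPOTHETICAL template body; the real template files are much longer.
SCRIPT_TEMPLATE = string.Template(
    '"""\n'
    "Content Provider: $provider_name\n"
    '"""\n'
    'PROVIDER = "$provider_code"\n'
)


def provider_code(provider: str) -> str:
    """Normalize a display name: "Stock Snap" -> "stock_snap"."""
    return "_".join(provider.lower().split())


def render(provider: str) -> str:
    """Fill the template placeholders for one provider."""
    return SCRIPT_TEMPLATE.substitute(
        provider_name=provider, provider_code=provider_code(provider)
    )


def write_script(provider: str, out_dir: Path) -> Path:
    """Write the rendered script as <provider_code>.py inside out_dir."""
    out_path = out_dir / f"{provider_code(provider)}.py"
    out_path.write_text(render(provider))
    return out_path
```

`substitute` raises `KeyError` if a placeholder is left unfilled, which catches template typos early instead of producing a half-rendered script.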
To run the script, create and activate a virtual environment, navigate to the templates folder, and run it with the provider name as a parameter:
This will create three files:
There are some instructions and a lot of #TODO comments in the file. It would be really helpful if you note any problems, difficulties, or anything that's unclear during testing.
Signed-off-by: Olga Bulat [email protected]