Skip to content
This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Create a Provider API script template #93

Merged
merged 27 commits into from
Jul 9, 2021
Merged

Create a Provider API script template #93

merged 27 commits into from
Jul 9, 2021

Conversation

obulat
Copy link
Contributor

@obulat obulat commented Jun 8, 2021

Fixes WordPress/openverse#1734

This PR creates a Cookiecutter-like script for writing Provider API scripts.
It needs more testing and documentation.

To run the script, create and activate a virtual environment, navigate to the templates folder and run it with provider name as a parameter:

cd src/cc_catalog_airflow/templates/
python3 create_api_script.py <provider_name>

This will create three files:

  1. The API provider script with a skeleton implementation
  2. The test file (blank)
  3. The workflow file which doesn't really need much change, I suppose.

There are some instructions and a lot of #TODO comments in the file. It would be really helpful if you note any problems, difficulties, or anything that's unclear during testing.

Signed-off-by: Olga Bulat [email protected]

obulat added 2 commits June 9, 2021 15:11
# Conflicts:
#	src/cc_catalog_airflow/dags/common/__init__.py
Signed-off-by: Olga Bulat <[email protected]>
zackkrida
zackkrida previously approved these changes Jun 9, 2021
Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some small changes, I'm sure this will be reviewed and re-reviewed a few times until we finish. You might also be interested this tool:

https://plopjs.com/

@obulat
Copy link
Contributor Author

obulat commented Jun 11, 2021

Added some small changes, I'm sure this will be reviewed and re-reviewed a few times until we finish. You might also be interested this tool:

https://plopjs.com/

This tool looks neat!

What I think is good about the current approach is the fact that the contributors will not have to install any additional packages, only the ones required to run the API script(requests, lxml maybe?).

And using provider API scripts as an entry to contributing is good because contributors won't even have to set up Catalog in Docker! On Windows, it's very difficult for people new to programming: you have to either have Professional edition of Windows, or install WSL2 system.

@zackkrida
Copy link
Member

Yes, it's great that they're generally small standalone scripts that are straightforward to review and test. This is looking great and I'd like @dhruvkb and @krysal to take a look this week too

@zackkrida
Copy link
Member

@krysal it would be excellent for you to test this template by writing a Provider API Script for stocksnap.io. I have gotten more information that should help.

They have an api we can use available at https://stocksnap.io/api/load-photos/date/desc/1 where "1" is the page number. I did some simple testing and there's around 33k records at this API. Please copy Olga's template files into a new PR and document the experience of using them. This should be a great way to learn what is missing in the files; any questions you have or problems you encounter are a good thing and will help us make the template better!

@obulat
Copy link
Contributor Author

obulat commented Jun 16, 2021

Quick docs (to be expanded!):
To create an API script using, run

cd src/cc_catalog_airflow/templates/
python3 create_api_script.py --provider <name>

By default, it creates a script for images. If you are writing a script for an audio provider, you need to add --media audio argument:

python3 create_api_script.py --provider <name> --media audio

Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When attempting to run the new create_api_script command I got the following errors:

FileNotFoundError: [Errno 2] No such file or directory: 'dags/provider_api_scripts/stocksnap.py'

Fixed by switching from dot relative path (Path('.')) to absolute (Path(__file__))

src/cc_catalog_airflow/templates/create_api_script.py Outdated Show resolved Hide resolved
@obulat
Copy link
Contributor Author

obulat commented Jun 18, 2021

TODO: Add a TODO to add a popularity metric to meta_data (views/listens, likes, downloads etc).

Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry the delayed review! I was able to run the create_api_script.py only with its full path. Otherwise the No such file or directory error persists. I had to do some tweaks but I agree this will be a very good starting point for new contributors once finished.

src/cc_catalog_airflow/templates/workflow.py_template Outdated Show resolved Hide resolved
Comment on lines +21 to +29
"""
This is template for an API script. Broadly, there are several steps:
1. Download batches of information for the query for openly-licensed media
2. For each item in batch, extract the necessary meta data.
3. Save the metadata using ImageStore.add_item or AudioStore.add_item methods

Try to write small functions that are easier to test. Don't forget to
write tests, too!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good overview to add a source, and remarking tests 👍

krysal
krysal previously approved these changes Jul 2, 2021
Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished testing the test template and made some minor adjustments, now it's ready to go! 🎶

zackkrida
zackkrida previously approved these changes Jul 6, 2021
Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@obulat obulat changed the base branch from main to licenses July 7, 2021 09:02
@obulat obulat changed the base branch from licenses to main July 7, 2021 09:05
@obulat obulat dismissed stale reviews from zackkrida and krysal July 7, 2021 09:05

The base branch was changed.

@obulat obulat changed the base branch from main to licenses July 7, 2021 09:05
Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really helpful, thank you! 🌟

@obulat obulat merged commit f5e8a9d into licenses Jul 9, 2021
@zackkrida zackkrida deleted the template branch July 9, 2021 16:18
@zackkrida zackkrida added the 🌟 goal: addition Addition of new feature label Aug 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🌟 goal: addition Addition of new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Add a Provider API Script template
3 participants