-
Notifications
You must be signed in to change notification settings - Fork 54
Add a Nappy provider DAG using ProviderDataIngester #796
Conversation
- Breaks out into several files - Removes documentation that is redundant (copied from code) - Prefers documentation within the template - Explicitly documents advanced options as FAQ - Some small updates to the templating
@@ -13,26 +13,27 @@ | |||
|
|||
|
|||
# Default provider names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sorted this list alphabetically, but can revert if desired.
I'd love preliminary reviews on this, even while it's drafted, @WordPress/openverse-catalog |
Ooh, and one small issue I'm observing. I'm not seeing the |
That's right -- we actually hardcode |
@stacimc yep, that's right regarding SMK. Maybe I'll stick |
Oh sweet, there are thumbnails available for Rawpixel too but I wasn't sure where to add them either. I'll add that to #795! |
Added! Oh, and looking over the PR description it seems like there are a few metrics we could use for popularity. Would you be willing to add those to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this @zackkrida and for picking this up @rwidom! I have a number of suggested changes, mostly removing some files/comment lines. I also have a suggestion about batch size & popularity-related updates.
I ran this locally and it worked great!
# Hardoded to CC0, the only license Nappy.co uses | ||
license_info = get_license_info( | ||
"https://creativecommons.org/publicdomain/zero/1.0/" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all results are CC0, we should set this as a value on the class or instance and use it there rather than calling this function for every record!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Totally, yes.
meta_data = { | ||
"views": data.get("views"), | ||
"saves": data.get("saves"), | ||
"downloads": data.get("downloads"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these! We'll also want to add downloads
to the DDL and the image popularity metrics test!
Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
Co-authored-by: Madison Swain-Bowden <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whew! This is almost good to go, @rwidom would you be willing to move the _convert_filesize
function to a static method and add tests for it as well? I can do that if that's easier 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thank you @rwidom and @zackkrida!
pytest.param("4kB", 4_000, id="happy_kB"), | ||
pytest.param("4MB", 4_000_000, id="happy_MB"), | ||
pytest.param("4GB", 4_000_000_000, id="happy_GB"), | ||
pytest.param("", None, id="empty_string"), | ||
pytest.param([], None, id="not_a_string"), | ||
pytest.param("gibberish", None, id="gibberish"), | ||
pytest.param("10.3kB", 10_300, id="decimal"), | ||
pytest.param("10.12345kB", 10_123, id="rounding"), | ||
pytest.param(" 4 kB ", 4_000, id="extra_spaces"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are ready, but do we need someone else to sign off, @zackkrida ?
Oops, I guess not and that's why you removed the request from Staci! :) |
@rwidom I removed the Staci request because I was going to review the PR. I have, and I think this is ready to merge! Thank you so much again for finishing the PR. |
Fixes WordPress/openverse#1445 by @zackkrida
Description
Adds Nappy.co to the Catalog using the new provider DAG template in #790. This is mostly to test that PR. I've always wanted to write a provider DAG and this API was literally handwritten for us, so it was a perfect test case.
Sample results (csv)
TODO
Popularity data
Testing Instructions
just recreate
and run thenappy_workflow
through the Airflow UI. It should load2,059
images pretty quickly.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin