-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace media_url
with url
in provider scripts
#1891
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there's a failing test for the provider data ingester:
=================================== FAILURES ===================================
_____________________________ test_get_batch_data ______________________________
[gw1] linux -- Python 3.10.10 /usr/local/bin/python
def test_get_batch_data():
response_json = _get_resource_json("complete_response.json")
batch = ingester.get_batch_data(response_json)
> assert batch == EXPECTED_BATCH_DATA
E AssertionError: assert [{'id': 100, ...le 102', ...}] == [{'id': 100, ...534_web.jpg'}]
E At index 0 diff: {'id': 100, 'image_url': 'https://openaccess-cdn.clevelandart.org/1916.586.a/1916.586.a_web.jpg', 'media_type': 'image', 'title': 'Title 100', 'url': 'https://clevelandart.org/art/1916.586.a'} != {'id': 100, 'media_type': 'image', 'title': 'Title 100', 'url': 'https://openaccess-cdn.clevelandart.org/1916.586.a/1916.586.a_web.jpg'}
E Use -v to get more diff
tests/dags/providers/provider_api_scripts/test_provider_data_ingester.py:137: AssertionError
=========================== short test summary info ============================
FAILED tests/dags/providers/provider_api_scripts/test_provider_data_ingester.py::test_get_batch_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also getting the following when I run the SMK DAG locally:
[2023-04-25, 00:09:13 UTC] {slack.py:320} INFO -
*DAG*: `smk_workflow`
*Task*: `ingest_data.pull_image_data`
*Logical Date*: 2023-03-01T00:00:00Z
*Log*: http://localhost:8080/log?execution_date=2023-03-01T00%3A00%3A00%2B00%3A00&task_id=ingest_data.pull_image_data&dag_id=smk_workflow&map_index=-1
*Exception*: ImageStore.add_item() missing 1 required positional argument: 'url'
*Exception Type*: `builtins.TypeError`
It looks like this piece was not updated:
"image_url": img.get("image_url"), |
I think with a change this large, we'll need to be sure we are able to run each affected DAG successfully locally before we deploy it (I wish there was a better way, maybe some ingestion testing in the future!)
@@ -123,13 +123,13 @@ def clean_media_metadata(self, **media_data) -> dict | None: | |||
for field in [ | |||
"foreign_identifier", | |||
"foreign_landing_url", | |||
f"{self.media_type}_url", | |||
"url", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whew, this is a nice change to see!
f53a1f7
to
acca578
Compare
f87e1a5
to
a57ca3f
Compare
@obulat Is this PR ready for review or it is a work in progress? |
a57ca3f
to
f26ace3
Compare
I've rebased it onto the license_info PR, and am going to draft it until that PR is merge. I will run the ingestion for all providers locally after that to check that everything works as expected. |
9ce9453
to
6a749c9
Compare
f26ace3
to
1c7ca3c
Compare
6a749c9
to
127781b
Compare
1c7ca3c
to
1026c55
Compare
b6796a6
to
fbad818
Compare
@AetherUnbound, I ran data ingestion locally, and did not get any |
fbad818
to
d1c714b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks for testing all the provider scripts! 🙌🏼 I have one thought for a variable rename but it's not blocking.
@@ -52,7 +52,7 @@ class NyplDataIngester(ProviderDataIngester): | |||
# NYPL returns a list of image objects, with the dimension encoded | |||
# in the URL's query parameter. | |||
# This list is in order from the largest image to the smallest one. | |||
image_url_dimensions = ["g", "v", "q", "w", "r"] | |||
url_dimensions = ["g", "v", "q", "w", "r"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this would be better as image_dimensions
?
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @krysal Excluding weekend1 days, this PR was ready for review 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2. @obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
d1c714b
to
7801638
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great to have the url
column simplified 👏
Fixes
Fixes #1409 by @stacimc
Description
This PR renames
audio_url
andimage_url
in the provider API scripts andImageStore
/AudioStore
classes tourl
, which simplifies the code significantly.Testing Instructions
The CI should pass. The tests have been updated to use
url
instead ofimage_url
/audio_url
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin