-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Collect creator data from Europeana API #5057
Conversation
Yo @obulat, can you please review this PR? It adds a new method to the EuropeanaRecordBuilder class that collects creator data from the dcCreator field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is almost there, @dryruffian! There are two points that need to be fixed to make it working correctly.
@@ -68,6 +68,7 @@ def get_record_data(self, data: dict) -> dict | None: | |||
"foreign_identifier": self._get_foreign_identifier(data), | |||
"meta_data": self._get_meta_data_dict(data), | |||
"title": self._get_title(data), | |||
"creator": self._get_creator(item_data), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"creator": self._get_creator(item_data), | |
"creator": self._get_creator(data), |
I tested this PR using October 1st as the date. The dcCreator
is available in data
dictionary, not item_data
.
creators = item_data.get("dcCreator", []) | ||
if not creators: | ||
return None | ||
return creators if isinstance(creators, str) else ", ".join(creators) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to check if the creators is an instance of list or not:
if isinstance(creators, list):
return ", ".join(creators)
elif isinstance(creators, str):
return creators
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank's @obulat for the suggestion I have updated the code to check creators is an instance of list or not. I have also added also tests for some edge cases
"item_data, expected", | ||
[ | ||
# Single creator in a list | ||
pytest.param({"dcCreator": ["Chandler"]}, "Chandler", id="single_creator"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆
Hi, @obulat can you assist me with the codespell issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jumping in to leave some observations!
expected_creator = ( | ||
"http://hispana.mcu.es/lod/oai:bibliotecadigital.jcyl.es:26229#ent2, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to fix these strings before fixing the grammar.
expected_creator = ( | |
"http://hispana.mcu.es/lod/oai:bibliotecadigital.jcyl.es:26229#ent2, " | |
expected_creator = ( | |
"http://hispana.mcu.es/lod/oai:bibliotecadigital.jcyl.es:26229#ent2, " |
But also, given the signature of the new _get_creator
method, it should return a string, not a set.
Based on the contributor urgency of this PR, the following reviewers are being gently reminded to review this PR: @obulat Excluding weekend1 days, this PR was ready for review 4 day(s) ago. PRs labelled with contributor urgency are expected to be reviewed within 3 weekday(s)2. @dryruffian, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
Hi @krysal, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost there! Just 2 tiny changes needed
@@ -163,7 +163,7 @@ def _get_creator(self, data: dict) -> str | None: | |||
if isinstance(creators, list): | |||
if not creators: # Explicitly handle empty list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to move the if not creators
condition higher, so we return None
both for an empty list and an empty string (""
).
catalog/tests/dags/providers/provider_api_scripts/test_europeana.py
Outdated
Show resolved
Hide resolved
catalog/tests/dags/providers/provider_api_scripts/test_europeana.py
Outdated
Show resolved
Hide resolved
.codespell/ignore_lines.txt
Outdated
@@ -19,4 +19,4 @@ | |||
|
|||
;; packages/js/eslint-plugin/configs/vue.ts | |||
;; `te` gets matched with `the` and others | |||
const i18nDestructureRules = ["t", "tc", "te", "td", "d", "n"].map( | |||
const i18nDestructureRules = ["t", "tc", "te", "td", "d", "n"].map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const i18nDestructureRules = ["t", "tc", "te", "td", "d", "n"].map( | |
const i18nDestructureRules = ["t", "tc", "te", "td", "d", "n"].map( |
catalog/tests/dags/providers/provider_api_scripts/test_europeana.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Krystle Salazar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the last bit of change requests to this PR, and now the CI is passing.
Thank you for your contribution, @dryruffian! Europeana has very high-quality media, and now Openverse will have better data for these media items, and they will be easier to find ✨
* Add Collect creator data from Europeana API (Fixes WordPress#2834) * Add Collect creator method data from Europeana API (Fixes WordPress#2834) Co-authored-by: Krystle Salazar <[email protected]> Co-authored-by: Olga Bulat <[email protected]>
Fixes
Fixes #2834 by @obulat
Description
Added a New method
_get_creator
in classEuropeanaRecordBuilder
to collect cerator data formdcCreator
field, which contains the creator information, is now extracted and included in the record data.Checklist
Update index.md
).main
) or a parent feature branch.ov just catalog/generate-docs
for catalogPRs) or the media properties generator (
ov just catalog/generate-docs media-props
for the catalog or
ov just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin