Europeana script should collect the creator #2834
Labels:
- 💻 aspect: code (Concerns the software code in the repository)
- ✨ goal: improvement (Improvement to an existing user-facing feature)
- good first issue (New-contributor friendly)
- help wanted (Open to participation from the community)
- 🟨 priority: medium (Not blocking but should be addressed soon)
- 🧱 stack: catalog (Related to the catalog and Airflow DAGs)
Comments
obulat added the labels 🟨 priority: medium, ✨ goal: improvement, 💻 aspect: code and 🧱 stack: catalog on Aug 15, 2023.
AetherUnbound added the labels good first issue and help wanted on May 29, 2024.
Hi @obulat,

```python
def _get_creator(self, item_data: dict) -> str | None:
    creators = item_data.get("dcCreator", [])
    if not creators:
        return None
    return creators if isinstance(creators, str) else ", ".join(creators)
```

Will adding this function to the …
```python
import pytest


@pytest.mark.parametrize(
    "item_data, expected",
    [
        # Single creator in a list
        pytest.param({"dcCreator": ["Chandler"]}, "Chandler", id="single_creator"),
        # Multiple creators in a list
        pytest.param(
            {"dcCreator": ["Chandler", "Joey"]},
            "Chandler, Joey",
            id="multiple_creators",
        ),
        # Empty creator list
        pytest.param({"dcCreator": []}, None, id="empty_creator_list"),
        # Missing dcCreator key
        pytest.param({}, None, id="no_dcCreator"),
        # dcCreator is a string
        pytest.param({"dcCreator": "Chandler"}, "Chandler", id="dcCreator_string"),
        # dcCreator is None
        pytest.param({"dcCreator": None}, None, id="dcCreator_none"),
        # Empty string in creator list
        pytest.param({"dcCreator": [""]}, "", id="empty_string_in_list"),
    ],
)
def test_get_creator(item_data, expected, record_builder):
    assert record_builder._get_creator(item_data) == expected
```

This is a test for the Europeana script. Please let me know if this is fine.
dryruffian added eight commits to dryruffian/openverse that referenced this issue (once on Oct 20, 2024, twice on Oct 21, three times on Oct 26 and twice on Oct 27, 2024).
obulat pushed eight commits to dryruffian/openverse that referenced this issue on Oct 29, 2024.
Danil49 pushed a commit to Danil49/openverse that referenced this issue on Oct 29, 2024:

* Add Collect creator data from Europeana API (Fixes WordPress#2834)
* Add Collect creator method data from Europeana API (Fixes WordPress#2834)

Co-authored-by: Krystle Salazar
Co-authored-by: Olga Bulat
Problem
The Europeana script does not collect the creator data.
Description
Europeana is an aggregator of high-quality media from many European GLAM institutions, and it is important for search relevancy to collect all of the relevant data. Creator data is currently not collected at all.
The creators are available in the `dcCreator` field within the returned data. Here's an example from our sample data: openverse/catalog/tests/dags/providers/provider_api_scripts/resources/europeana/europeana_example.json
Line 6882 in aedc9c1
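A minimal sketch of reading the field defensively, assuming `dcCreator` usually holds a list of strings but may also be missing, `None` or a bare string. The guard matters because `", ".join()` treats a plain string as an iterable of characters and would separate every letter. `normalize_creators` is a hypothetical helper name and the sample dicts are made up for illustration; neither comes from the Openverse codebase or the Europeana sample file.

```python
from __future__ import annotations


def normalize_creators(item_data: dict) -> list[str]:
    """Return the dcCreator values of a Europeana item as a list of strings.

    Hypothetical helper: assumes dcCreator is usually a list of strings,
    but tolerates a missing key, None, or a single string.
    """
    creators = item_data.get("dcCreator")
    if not creators:
        # Missing key, None, empty list or empty string: nothing to collect
        return []
    if isinstance(creators, str):
        # Wrap a bare string so join() does not split it into characters
        return [creators]
    return list(creators)


# Joining a bare string directly separates its characters:
print(", ".join("Chandler"))  # C, h, a, n, d, l, e, r
# Normalizing first gives the intended result for both shapes:
print(", ".join(normalize_creators({"dcCreator": "Chandler"})))  # Chandler
print(", ".join(normalize_creators({"dcCreator": ["Chandler", "Joey"]})))  # Chandler, Joey
```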
Here's what it looks like in real returned data:
It looks like there could be a list of creators, so we should probably join them together with commas (e.g. `", ".join(item_data.get("dcCreator", ""))`).

In order to accomplish this, we'll need to modify the Europeana provider ingestion script. We should add a new function to the `EuropeanaRecordBuilder` class to retrieve this value. A good example to use would be `_get_filesize`. We'll then need to capture this information in `get_record_data` by adding it to the dictionary with the `"creator"` key here: openverse/catalog/dags/providers/provider_api_scripts/europeana.py
Line 73 in 7f4fb7c
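The change described above can be sketched roughly as follows. This is not the real `EuropeanaRecordBuilder`: `EuropeanaRecordBuilderSketch` is a hypothetical stand-in trimmed down to the creator logic, and the `"foreign_identifier"` entry (read from an assumed `"id"` field) is only a placeholder for the fields the real `get_record_data` already collects. The one part that reflects this issue is the new `"creator"` key filled by a `_get_creator` helper.

```python
from __future__ import annotations


class EuropeanaRecordBuilderSketch:
    """Hypothetical stand-in for EuropeanaRecordBuilder, reduced to the creator logic."""

    def _get_creator(self, item_data: dict) -> str | None:
        creators = item_data.get("dcCreator")
        if not creators:
            # Missing key, None or empty list: no creator to record
            return None
        if isinstance(creators, str):
            # A bare string must not be passed to join(), which would split it
            return creators
        # Several creators are joined into one comma-separated string
        return ", ".join(creators)

    def get_record_data(self, data: dict) -> dict:
        return {
            # Placeholder for the fields the real builder already collects
            "foreign_identifier": data.get("id"),
            # New entry requested in this issue
            "creator": self._get_creator(data),
        }


builder = EuropeanaRecordBuilderSketch()
print(builder.get_record_data({"id": "/example/1", "dcCreator": ["Chandler", "Joey"]}))
# {'foreign_identifier': '/example/1', 'creator': 'Chandler, Joey'}
```

Keeping the normalization inside its own method, as with `_get_filesize`, lets it be unit-tested in isolation from the rest of the record mapping.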
We'll also need to add tests for this function and update any other Europeana tests that might be affected. An example test for `_get_filesize` can be found here.