Debug penn_museum near_eastern collection #580

Open
jacobthill opened this issue Nov 12, 2024 · 0 comments

Penn Museum has 3 collections. They provide all museum data in one large CSV file, which we filter for each collection to get only the records for that collection. The CSV file does not contain thumbnails, so we run a post_harvest task that fetches the thumbnail from the schema.org data. We then delete all records with no thumbnail. The Babylonian and Egyptian collections work as expected, but the Near Eastern collection does not. The log for the post_harvest task shows that it queried all URLs for thumbnails, and clicking on many of them shows that the thumbnails exist. But the file that gets saved to :/opt/app/dlme/dlme-airflow/shared/source_data/penn_museum/near_eastern is an empty JSON file. This causes the rest of the DAG to succeed with no changes. I'm not sure how to debug this, but we need to be careful not to hit Penn's site more than necessary.
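
One possible way to debug this without re-querying Penn's site on every run is to cache each fetched page on disk and to report which records lose their thumbnail before anything is deleted. The following is only a rough sketch, not the actual dlme-airflow post_harvest code: `cached_fetch`, `split_by_thumbnail`, the injected `fetch` and `thumbnail_for` callables, and the `id` and `thumbnail` field names are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def cached_fetch(url: str, cache_dir: Path, fetch: Callable[[str], str]) -> str:
    """Return the page body for url, calling fetch only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it can be used safely as a file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    body = fetch(url)  # the only place the remote site is contacted
    path.write_text(body, encoding="utf-8")
    return body


def split_by_thumbnail(
    records: list[dict],
    thumbnail_for: Callable[[dict], Optional[str]],
) -> tuple[list[dict], list]:
    """Split records into those with a thumbnail and the ids of those without.

    The "id" and "thumbnail" keys are assumed names, not the real schema.
    """
    kept, missing = [], []
    for record in records:
        thumbnail = thumbnail_for(record)
        if thumbnail:
            kept.append({**record, "thumbnail": thumbnail})
        else:
            missing.append(record.get("id"))
    return kept, missing
```

If `kept` comes back empty while `missing` lists every Near Eastern record, the problem is likely in how the thumbnail is extracted or matched to the record rather than in the fetch itself, and the cached pages can be inspected offline.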
