Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backoff to Stocksnap DAG for 5XX errors #4878

Closed
AetherUnbound opened this issue Sep 6, 2024 · 3 comments · Fixed by #5014
Closed

Add backoff to Stocksnap DAG for 5XX errors #4878

AetherUnbound opened this issue Sep 6, 2024 · 3 comments · Fixed by #5014
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix good first issue New-contributor friendly help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs 🔧 tech: airflow Involves Apache Airflow 🐍 tech: python Involves Python

Comments

@AetherUnbound
Copy link
Collaborator

Airflow log link

Note: Airflow is currently only accessible to maintainers & those given
access. If you would like access to Airflow, please reach out to a member of
@WordPress/openverse-maintainers
.

https://airflow.openverse.org/dags/stocksnap_workflow/grid?dag_run_id=manual__2024-09-05T18%3A39%3A42%2B00%3A00&task_id=ingest_data.pull_image_data&base_date=2024-09-05T18%3A39%3A42%2B0000&tab=logs

Description

It looks like we're starting to encounter ephemeral 5XX errors with certain Stocksnap errors (separate from #4101). We should add a backoff.on_exception wrapper to the _get_filesize function of the ingestion class so these errors can be retried:

def _get_filesize(self, image_url):

This can be done using a decorator like Freesound:

@backoff.on_exception(backoff.expo, flaky_exceptions, max_tries=3)

Although we'd want the check to look like the global one for Science Museum (see #4715)

# Science Museum's API tends to be flaky, so we add a backoff on every request
# for 5XX error codes.
# See: https://github.com/WordPress/openverse/issues/4710
get_response_json = backoff.on_exception(
backoff.expo,
HTTPError,
max_time=60 * 30, # 30 minutes
# Only retry on 5XX errors
giveup=lambda e: e.response.status_code not in {502, 503, 504},
)(ProviderDataIngester.get_response_json)

Reproduction

Since these are temporal issues, I wasn't able to reproduce the 502 in the logs

DAG status

I've left this enabled, though I will add a silenced alert clause for this linking to this issue.

@AetherUnbound AetherUnbound added good first issue New-contributor friendly help wanted Open to participation from the community 🐍 tech: python Involves Python 💻 aspect: code Concerns the software code in the repository 🔧 tech: airflow Involves Apache Airflow 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Sep 6, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Sep 6, 2024
@MarleaM
Copy link
Contributor

MarleaM commented Sep 26, 2024

@AetherUnbound I'd love to work on this issue!

@AetherUnbound
Copy link
Collaborator Author

Hi @MarleaM, thank you for your interest in contributing to Openverse! I've assigned this issue to you. If you have any questions, you may leave them here.

Please check out our welcome and general setup documentation pages for getting started with setting up your local environment.

@openverse-bot openverse-bot moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Sep 26, 2024
@MarleaM
Copy link
Contributor

MarleaM commented Sep 29, 2024

hi @AetherUnbound ! I believe I have a solution for this ready, however I am having trouble getting ov to run (i'm on windows & having issues with WSL).
I think that what I have should address the issue, though. Should I go ahead and push?

Thanks for your help : )

@openverse-bot openverse-bot moved this from 📅 To Do to 🏗 In Progress in Openverse Backlog Oct 1, 2024
@openverse-bot openverse-bot moved this from 🏗 In Progress to ✅ Done in Openverse Backlog Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix good first issue New-contributor friendly help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs 🔧 tech: airflow Involves Apache Airflow 🐍 tech: python Involves Python
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants