This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
Omit DAGs that are known to fail from alerts #643
Merged
15 commits
07a4e15 Skip slack alerting for dags configured in Airflow variable (stacimc)
572a8b7 Always log message before sending Slack notification (stacimc)
d016bb2 Add method to fetch GitHub issue (stacimc)
5188d2b Add DAG to check for DAGs that need alerts reenabled (stacimc)
3ad3629 Add tests (stacimc)
4245edd Fix Slack tests (stacimc)
710d1f7 Add test that send_alert skips (stacimc)
7f0682e Update message format to prevent links unfurling (stacimc)
78aea5a Rename files and small refactor to make it easier to add silenced not… (stacimc)
04e643b Skip reporting task when all DAGs are configured correctly (stacimc)
3101544 Reverse method order, update variable (stacimc)
9416388 Improve formatting (stacimc)
ef9951d Update test to test skip when no dags to reenable (stacimc)
2b2a496 Add and parameters to send functions (stacimc)
d0041cc Do not unfurl GitHub links in check_silenced_dags slack alerts (stacimc)
56 changes: 56 additions & 0 deletions
openverse_catalog/dags/maintenance/check_silenced_dags/check_silenced_dags.py
@@ -0,0 +1,56 @@
import logging
from typing import Tuple

from airflow.exceptions import AirflowException, AirflowSkipException
from airflow.models import Variable
from common.github import GitHubAPI
from common.slack import send_alert


logger = logging.getLogger(__name__)


def get_issue_info(issue_url: str) -> Tuple[str, str, str]:
    """
    Parses out the owner, repo, and issue_number from a GitHub issue url.
    """
    url_split = issue_url.split("/")
    if len(url_split) < 4:
        raise AirflowException(f"Issue url {issue_url} could not be parsed.")
    return url_split[-4], url_split[-3], url_split[-1]


def get_dags_with_closed_issues(github_pat, silenced_dags):
    gh = GitHubAPI(github_pat)

    dags_to_reenable = []
    for dag_id, issue_url in silenced_dags.items():
        owner, repo, issue_number = get_issue_info(issue_url)
        github_issue = gh.get_issue(repo, issue_number, owner)

        if github_issue.get("state") == "closed":
            # If the associated issue has been closed, this DAG can have
            # alerting reenabled.
            dags_to_reenable.append((dag_id, issue_url))
    return dags_to_reenable


def check_configuration(github_pat: str, airflow_variable: str):
    silenced_dags = Variable.get(airflow_variable, {}, deserialize_json=True)
    dags_to_reenable = get_dags_with_closed_issues(github_pat, silenced_dags)

    if not dags_to_reenable:
        raise AirflowSkipException(
            "All DAGs configured to silence messages have work still in progress."
            " No configuration updates needed."
        )

    message = (
        "The following DAGs have Slack messages silenced, but the associated issue is"
        f" closed. Please remove them from the `{airflow_variable}` Airflow variable"
        " or assign a new issue."
    )
    for (dag, issue) in dags_to_reenable:
        message += f"\n - <{issue}|{dag}>"
    send_alert(message, username="Silenced DAG Check", unfurl_links=False)
    return message
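The parsing and filtering logic in this file can be exercised without Airflow or network access. The following is a minimal sketch, not the PR's test code: `parse_issue_url` and `dags_with_closed_issues` mirror the logic of `get_issue_info` and `get_dags_with_closed_issues`, while `FakeGitHub`, the `example-org/example-repo` URLs, and the DAG ids are hypothetical stand-ins introduced only for illustration.

```python
from typing import Dict, List, Tuple


def parse_issue_url(issue_url: str) -> Tuple[str, str, str]:
    """Return (owner, repo, issue_number) from a GitHub issue URL."""
    parts = issue_url.split("/")
    if len(parts) < 4:
        # The PR raises AirflowException here; ValueError keeps the sketch
        # free of the Airflow dependency.
        raise ValueError(f"Issue url {issue_url} could not be parsed.")
    return parts[-4], parts[-3], parts[-1]


class FakeGitHub:
    """Hypothetical stand-in for the PR's GitHubAPI client; makes no network calls."""

    def __init__(self, states: Dict[str, str]):
        self.states = states

    def get_issue(self, repo: str, issue_number: str, owner: str) -> dict:
        # Same argument order as the call in get_dags_with_closed_issues.
        return {"state": self.states[f"{owner}/{repo}#{issue_number}"]}


def dags_with_closed_issues(gh, silenced_dags: Dict[str, str]) -> List[Tuple[str, str]]:
    """Collect (dag_id, issue_url) pairs whose tracking issue is closed."""
    closed = []
    for dag_id, issue_url in silenced_dags.items():
        owner, repo, number = parse_issue_url(issue_url)
        if gh.get_issue(repo, number, owner).get("state") == "closed":
            closed.append((dag_id, issue_url))
    return closed


# Hypothetical configuration: one DAG with a closed issue, one still open.
silenced = {
    "example_dag_a": "https://github.com/example-org/example-repo/issues/1",
    "example_dag_b": "https://github.com/example-org/example-repo/issues/2",
}
gh = FakeGitHub(
    {"example-org/example-repo#1": "closed", "example-org/example-repo#2": "open"}
)
result = dags_with_closed_issues(gh, silenced)
print(result)
```

Only `example_dag_a` is reported, since its issue is closed. Note that the `len(...) < 4` guard only protects the negative indexing; it does not verify that the URL actually points at a GitHub issue.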
60 changes: 60 additions & 0 deletions
openverse_catalog/dags/maintenance/check_silenced_dags/check_silenced_dags_dag.py
@@ -0,0 +1,60 @@
"""
Checks for DAGs that have silenced Slack alerts which may need to be turned back
on.

When a DAG has known failures, it can be omitted from Slack error reporting by adding
an entry to the `silenced_slack_alerts` Airflow variable. This is a dictionary where the
key is the `dag_id` of the affected DAG, and the value is the URL of a GitHub issue
tracking the error.

The `check_silenced_dags` DAG iterates over the entries in the `silenced_slack_alerts`
configuration and verifies that the associated GitHub issues are still open. If an issue
has been closed, it is assumed that the DAG should have Slack reporting reenabled, and
an alert is sent to prompt manual update of the configuration. This prevents developers
from forgetting to reenable Slack reporting after the issue has been resolved.

The DAG runs weekly.
"""

import logging
from datetime import datetime, timedelta

from airflow.models import DAG, Variable
from airflow.operators.python import PythonOperator
from common.constants import DAG_DEFAULT_ARGS
from maintenance.check_silenced_dags import check_silenced_dags


logger = logging.getLogger(__name__)


DAG_ID = "check_silenced_dags"
MAX_ACTIVE = 1
GITHUB_PAT = Variable.get("GITHUB_API_KEY", default_var="not_set")


dag = DAG(
    dag_id=DAG_ID,
    default_args={
        **DAG_DEFAULT_ARGS,
        "retry_delay": timedelta(minutes=1),
    },
    start_date=datetime(2022, 7, 29),
    schedule_interval="@weekly",
    max_active_tasks=MAX_ACTIVE,
    max_active_runs=MAX_ACTIVE,
    catchup=False,
    # Use the docstring at the top of the file as md docs in the UI
    doc_md=__doc__,
    tags=["maintenance"],
)

with dag:
    PythonOperator(
        task_id="check_silenced_alert_configuration",
        python_callable=check_silenced_dags.check_configuration,
        op_kwargs={
            "github_pat": GITHUB_PAT,
            "airflow_variable": "silenced_slack_alerts",
        },
    )

Review comment on the `dag = DAG(` line:
I know I keep saying this so apologies if this comes across as pushy, but we could definitely use the TaskFlow API for this DAG! 😄
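The docstring describes `silenced_slack_alerts` as a dictionary from `dag_id` to issue URL, and `check_configuration` turns the closed entries into a Slack message. As a rough illustration of both, the sketch below loads a hypothetical JSON value for the variable and builds the same message text. The helper names `load_silenced_dags` and `build_message`, the example DAG id, and the example URL are assumptions made for this sketch; the PR itself reads the value with `Variable.get(..., deserialize_json=True)` and sends the message through `send_alert`.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical value of the `silenced_slack_alerts` Airflow variable.
RAW_VARIABLE = (
    '{"example_dag_a": "https://github.com/example-org/example-repo/issues/1"}'
)


def load_silenced_dags(raw: str) -> Dict[str, str]:
    """Deserialize the variable and check it is a dag_id -> issue URL mapping."""
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise ValueError("Expected a JSON object mapping dag_id to issue URL.")
    return value


def build_message(airflow_variable: str, dags_to_reenable: List[Tuple[str, str]]) -> str:
    """Build the alert text the same way check_configuration does."""
    message = (
        "The following DAGs have Slack messages silenced, but the associated issue is"
        f" closed. Please remove them from the `{airflow_variable}` Airflow variable"
        " or assign a new issue."
    )
    for dag, issue in dags_to_reenable:
        # Slack's <url|label> syntax renders the DAG id as a link to the issue.
        message += f"\n - <{issue}|{dag}>"
    return message


silenced_dags = load_silenced_dags(RAW_VARIABLE)
# For the sketch, treat every configured DAG as having a closed issue.
message = build_message("silenced_slack_alerts", list(silenced_dags.items()))
print(message)
```

Because each DAG is rendered as a link, the PR passes `unfurl_links=False` to `send_alert` so that Slack does not expand a preview for every issue.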
It'd be nice if this created a GitHub issue or something that could actually be assigned and tracked or have the maintainers pinged. Or maybe even just left a comment on the issue itself like "Please un-silence the DAG errors". Or maybe both, a new issue and a ping on the old, with the new issue just tracking the work of actually updating the prod configuration.
Just worried a slack ping could easily get lost (especially if lots of people are on vacation or distracted by something else, for example) in a way that a GitHub issue won't, as it acts more like a formal "todo" item.
Oh I love this idea! I'm going to create a follow-up issue and link back, this would be fantastic.
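For the follow-up, one possible shape of the "leave a comment on the issue itself" idea is sketched below. It is not part of this PR: `build_comment_request` is a hypothetical helper, the owner, repo, and token values are placeholders, and the sketch only constructs the request for GitHub's REST endpoint for creating an issue comment (`POST /repos/{owner}/{repo}/issues/{issue_number}/comments`) without sending it.

```python
import json
import urllib.request


def build_comment_request(
    owner: str, repo: str, issue_number: str, token: str, body: str
) -> urllib.request.Request:
    """Build (but do not send) a request that would comment on a GitHub issue."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/comments"
    payload = json.dumps({"body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; a real run would take these from the parsed issue URL
# and the configured GitHub token.
request = build_comment_request(
    "example-org",
    "example-repo",
    "1",
    "placeholder-token",
    "Please un-silence the DAG errors",
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen(request)`) is left out on purpose, since posting to a closed issue has side effects that the follow-up issue would need to decide on.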