Skip to content
This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Use Airflow variable to omit DAGs from any Slack notification #644

Merged
merged 22 commits into from
Sep 26, 2022

Conversation

stacimc
Copy link
Contributor

@stacimc stacimc commented Aug 2, 2022

Fixes

Description

This is an extension of #643, which added an Airflow variable (silenced_slack_alerts) to conditionally omit DAGs from Slack alerts. This PR extends that logic to allow omitting DAGs from notifications (rather than just alerts).

The driving use case for this one is the backfill of Europeana (discussed in WordPress/openverse#1642). Now that we have enabled catchup on this DAG, when we turn it on in production it will immediately start backfilling years of missing DagRuns. This will cause our notifications Slack channel to be flooded with messages for each of these runs. We'd like to be able to be able to temporarily turn this off and on without requiring a code change each time.

Implementation notes

Originally, I added a second Airflow variable (silenced_slack_notifications) that worked identically to silenced_slack_alerts, but for notifications. I scrapped this in favor of changing the existing Airflow variable to allow it to skip Slack messages of any type. The new variable is named silenced_slack_notifications and looks like this:

{
   "cleveland_museum_workflow": [
        {
            "issue": "https://github.com/WordPress/openverse-catalog/issues/229",  # An open issue
            "predicate": "KeyError"  # Silence KeyErrors  
        },
        {
            "issue": "https://github.com/WordPress/openverse-catalog/issues/438",  # A closed issue
            "predicate": "Airflow DAG Load Data Complete"  # Silence DAG completion notifications
        }
    ],
    "wikimedia_commons_workflow": [
        # ... and so on
     ]
}

It is a dictionary of DAG ids to lists of SilencedSlackNotifications, themselves a dict mapping a GitHub issue to a single predicate string.

  • This configuration allows us to silence multiple different errors or notifications for a single DAG. While it is possible to associate them to the same GH issue, the structure encourages you to open separate issues.
  • The structure is a little more verbose, but intended to be more readable and easier to type within the code for documentation purposes.
  • Each predicate matches against Slack message body OR the message username. This allows us to do things like silence all KeyError alerts, or for example the notifications sent on report_load_completion (by targeting the user name).

Note for Deploying

It is safe to simply merge this code, although it changes the shape of the Airflow variables, because:

  • No dags are currently silenced in production
  • It also changes the name of the Airflow variable, so it would be safe to switch over regardless.

Testing Instructions

Make sure you have the slack_message_override set to True in your Airflow variables. I used the testing configuration for Cleveland in the notes section above, to populate the Airflow silenced_slack_notifications variable (Admin -> Variables -> New).

You're encouraged to play around with different configurations, and adjust the testing expectations accordingly. For Cleveland, I:

  • Edited the provider script to manually raise a KeyError during ingestion
  • Ran the DAG
  • Confirmed that while the KeyError was thrown during pull_data, no Slack message was sent. You can also see a message in the logs ("Skipping Slack notification for cleveland_museum_workflow")
  • Confirmed that the report_load_completion task passed, but the slack message was not sent. This is also recorded in the logs.
  • Manually edited the script to raise a different error (not KeyError) instead. Re-run the DAG and confirmed this alert is sent.
  • Verify that the report_load_completion message IS sent for another DAG.

And now check the check_silenced_dags DAG:

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@stacimc stacimc added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository labels Aug 2, 2022
@stacimc stacimc self-assigned this Aug 2, 2022
@stacimc
Copy link
Contributor Author

stacimc commented Aug 2, 2022

Considering doing some additional refactoring so that adding a DAG to silenced_slack_notifications silences notifications (like report_load_completion) only, but does not also silence alerts. I think it's reasonable to expect we'll want to control those separately.

Edit: done, more information in PR description.

Base automatically changed from feature/omit-dags-from-alerts to main August 10, 2022 20:14
@stacimc
Copy link
Contributor Author

stacimc commented Aug 18, 2022

I'm going to pick this back up as part of milestone v1.3.2 👍

@stacimc stacimc force-pushed the feature/omit-dags-from-slack-notifications branch from c06106e to 5fa08e9 Compare September 2, 2022 21:25
@stacimc stacimc marked this pull request as ready for review September 7, 2022 00:00
@stacimc stacimc requested a review from a team as a code owner September 7, 2022 00:00
@stacimc stacimc changed the title Use Airflow variable to omit DAGs from all Slack notifications Use Airflow variable to omit DAGs from any Slack notifications Sep 9, 2022
@stacimc stacimc changed the title Use Airflow variable to omit DAGs from any Slack notifications Use Airflow variable to omit DAGs from any Slack notification Sep 9, 2022
Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rad, and it's fantastic that you've made it generic for both notifications and alerts 😻 Since we do already use "notifications" and "alerts" as specific and distinct names for certain kinds of Slack pings, would you be OK with changing silenced_slack_notifications to something more generic like silenced_slack_messages? That way it's a bit clearer (or at least hopefully less confusing) that this covers both notifications & alerts.

@@ -65,6 +65,20 @@
log = logging.getLogger(__name__)


class SilencedSlackNotification(TypedDict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice typed dict!

openverse_catalog/dags/common/slack.py Outdated Show resolved Hide resolved
openverse_catalog/dags/common/slack.py Outdated Show resolved Hide resolved
tests/dags/common/test_slack.py Show resolved Hide resolved
"test_dag_id": [
{
"issue": "https://github.com/WordPress/openverse/issues/1",
"predicate": "kEYErrOR",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
🤣

@AetherUnbound
Copy link
Contributor

I ran all of the testing steps supplied and they worked great until the check_silenced_dags test, where I encountered this error:

[2022-09-14, 05:02:31 UTC] {taskinstance.py:1909} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 171, in execute
    return_value = self.execute_callable()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 189, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/openverse_catalog/dags/maintenance/check_silenced_dags/check_silenced_dags.py", line 61, in check_configuration
    send_alert(message, username="Silenced DAG Check", unfurl_links=False)
TypeError: send_alert() missing 1 required positional argument: 'dag_id'

@openverse-bot
Copy link
Contributor

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@krysal
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was updated 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@stacimc, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@stacimc stacimc force-pushed the feature/omit-dags-from-slack-notifications branch from 65d2e89 to dba529b Compare September 23, 2022 18:05
Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! All of the testing steps succeeded, and I love all the documentation. This will for sure be useful 🚀

DAGs.md Show resolved Hide resolved
Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lovely. lgtm

@stacimc stacimc merged commit cec6893 into main Sep 26, 2022
@stacimc stacimc deleted the feature/omit-dags-from-slack-notifications branch September 26, 2022 18:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants