Move DAG status information into Airflow Variable #1368

Open
1 task
AetherUnbound opened this issue Nov 2, 2022 · 0 comments
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: catalog Related to the catalog and Airflow DAGs 🔧 tech: airflow Involves Apache Airflow 🐍 tech: python Involves Python


Description

DAG status information currently lives on a public handbook page: https://make.wordpress.org/openverse/handbook/openverse-catalog/dag-status-information/. The page is edited manually, and we have frequently forgotten to update it after a DAG has been re-enabled.

It would be fantastic if we could capture this information in an Airflow Variable, with each entry tied to one or several GitHub issues. A mechanism similar to the one defined in WordPress/openverse-catalog#644 could then regularly check that the GitHub issues remain open and whether the DAG should still be enabled/disabled. The structure of the variable could be similar:

{
    "science_museum_workflow": {
        "issues": [
            "https://github.com/WordPress/openverse-catalog/issues/738"
        ],
        "reason": "Invalid license name encountered in most recent run, we want to try and use the new skip_ingestion_errors parameter with this next week."
    },
    "wikimedia_commons_workflow": {
        # ... and so on
    }
}
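
As a rough illustration of how a Variable shaped like this might be consumed, the sketch below parses the JSON text into records and maps each issue's web URL to the GitHub REST API endpoint whose response includes the issue's `state` field. The names `DagStatus`, `parse_dag_status`, and `issue_api_url` are hypothetical, and requiring at least one issue per record is an assumption based on the "one or several GitHub issues" wording above. Loading the text from Airflow and performing the HTTP request are deliberately left out.

```python
import json
import re
from dataclasses import dataclass

# Matches the web URL of a GitHub issue: owner, repository, issue number.
ISSUE_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/issues/(\d+)$")


@dataclass(frozen=True)
class DagStatus:
    """Hypothetical record mirroring one entry of the proposed Variable."""

    dag_id: str
    issues: list[str]
    reason: str


def parse_dag_status(raw: str) -> dict[str, DagStatus]:
    """Parse the Variable's JSON text into records keyed by DAG ID."""
    records = {}
    for dag_id, entry in json.loads(raw).items():
        issues = entry.get("issues", [])
        invalid = [url for url in issues if not ISSUE_URL.match(url)]
        # Assumption: every record must point at one or more valid issues.
        if not issues or invalid:
            raise ValueError(f"{dag_id}: expected valid issue URLs, got {issues}")
        records[dag_id] = DagStatus(dag_id, issues, entry.get("reason", ""))
    return records


def issue_api_url(issue_url: str) -> str:
    """Map an issue's web URL to its GitHub REST API URL."""
    owner, repo, number = ISSUE_URL.match(issue_url).groups()
    return f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"
```

A scheduled check could fetch each API URL and flag any record whose issues have all been closed, since that would suggest the DAG is ready to be re-enabled.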

This could then also serve as a means of generating the DAG status page automatically, so we would still have an easy external reference for which DAGs are paused at any given time! We could add some additional checks, like ensuring that a DAG is not paused without an associated record in this Variable, or that the Variable does not have a record for an unpaused DAG, etc.
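
The two checks and the generated page described above could look something like the following minimal sketch. It assumes the Variable has already been deserialized into a plain dict and that the set of currently paused DAG IDs has been obtained from Airflow by some other means; `find_inconsistencies` and `render_status_page` are hypothetical names, and the table layout is only one possible format for the status page.

```python
def find_inconsistencies(paused_dag_ids: set[str], status: dict[str, dict]) -> list[str]:
    """Report mismatches between paused DAGs and records in the Variable."""
    recorded = set(status)
    problems = []
    # A paused DAG should always have an explanation on record.
    for dag_id in sorted(paused_dag_ids - recorded):
        problems.append(f"{dag_id} is paused but has no record in the Variable")
    # A record for a DAG that is running again is stale.
    for dag_id in sorted(recorded - paused_dag_ids):
        problems.append(f"{dag_id} has a record in the Variable but is not paused")
    return problems


def render_status_page(status: dict[str, dict]) -> str:
    """Render the records as a simple pipe-delimited table, one row per DAG."""
    lines = ["| DAG | Issues | Reason |", "| --- | --- | --- |"]
    for dag_id in sorted(status):
        entry = status[dag_id]
        issues = ", ".join(entry["issues"])
        lines.append(f"| {dag_id} | {issues} | {entry['reason']} |")
    return "\n".join(lines)
```

Running the first function on a schedule and alerting on a non-empty result would cover both directions of drift, while the second could feed whatever publishes the external status page.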

Additional context

Implementation

  • 🙋 I would be interested in implementing this feature.
@AetherUnbound AetherUnbound added ✨ goal: improvement Improvement to an existing user-facing feature 🐍 tech: python Involves Python 💻 aspect: code Concerns the software code in the repository 🔧 tech: airflow Involves Apache Airflow 🟩 priority: low Low priority and doesn't need to be rushed labels Nov 2, 2022
@obulat obulat added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Feb 23, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Apr 17, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023