Initialize finished counter at zero #23080
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
This increments that counter by 0. That doesn't seem right: https://statsd.readthedocs.io/en/v3.3/reference.html#StatsClient.incr
Incrementing the counter by zero tells the receiver that the metric exists (setting it to zero), which is useful when you want to capture the rate of change from 0 to 1. Initializing metrics at zero is best practice for Prometheus metrics (which is how we're consuming the StatsD metrics from Airflow jobs). I've tested locally that this works with the Prometheus statsd-exporter.
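As a rough illustration of the zero-increment idea described above (this is not the author's local test, and not the code in the PR), the sketch below formats StatsD counter lines with a count of zero for each finished state of a task and can send them over UDP. The helper names, the `FINISHED_STATES` subset, and the default host and port are assumptions made for the example; the metric name follows the `ti.finish.<dag_id>.<task_id>.<state>` pattern.

```python
import socket
from typing import Iterable, List

# Assumption: an illustrative subset of states, not Airflow's full list.
FINISHED_STATES = ("success", "failed", "skipped")


def counter_packet(name: str, count: int) -> bytes:
    """Format one StatsD counter line: <name>:<count>|c"""
    return f"{name}:{count}|c".encode("ascii")


def zero_init_packets(
    dag_id: str, task_id: str, states: Iterable[str] = FINISHED_STATES
) -> List[bytes]:
    """Build one zero-valued counter packet per finished state (hypothetical helper)."""
    return [
        counter_packet(f"ti.finish.{dag_id}.{task_id}.{state}", 0)
        for state in states
    ]


def send(packets: Iterable[bytes], host: str = "localhost", port: int = 8125) -> None:
    """Send each packet as a UDP datagram to a StatsD-compatible receiver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for packet in packets:
            sock.sendto(packet, (host, port))
```

Calling `send(zero_init_packets("my_dag", "my_task"))` before the task runs would make the receiver aware of each counter at zero, so a later increment shows up as a change from 0 to 1.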
Oh, somehow I thought this was changing the existing metric we emitted. Looking at it again I see it's not. Clearly not.
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Thanks @ashb, I rebased to the latest main.
Sets initial count of task finished state to zero. This enables acquiring the rate from zero to one (particularly useful if you want to alert on any failures).

We're using the Prometheus statsd-exporter. Since counters are usually used with a PromQL function like `rate`, it's important that counters are initialized at zero; otherwise, when a task finishes, the rate function will not have a previous value to compare the state count to.

For example, what we'd like to do:

```
sum by (dag_id, task_id) (rate(airflow_ti_finish{state='failed'}[1h])) > 0
```

This tells us the failure rate of tasks over time. What I've tried to do instead to ensure the metric captures the change from zero to one:

```
(sum by (dag_id, task_id) (rate(airflow_ti_finish{state='failed'}[1h])) > 0) or sum by (dag_id, task_id) (airflow_ti_finish{state='failed'} != 0 unless (airflow_ti_finish{state='failed'} offset 1m))
```

Two useful posts on this subject:
https://www.robustperception.io/why-predeclare-metrics
https://www.section.io/blog/beware-prometheus-counters-that-do-not-begin-at-zero/
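To make the "no previous value to compare to" point concrete, here is a minimal sketch of a rate calculation over the samples in a query window. It is a deliberate simplification and not Prometheus's actual `rate` implementation: it ignores extrapolation and counter resets, and `simple_rate` and the sample data are hypothetical names and values chosen for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> Optional[float]:
    """Per-second increase between the first and last sample in a window.

    Simplified on purpose: no extrapolation and no counter-reset handling.
    Returns None when fewer than two samples exist, since there is nothing
    to compare against.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Counter first appears only after the first failure, already at 1.
late_start = [(60.0, 1.0), (120.0, 1.0)]

# Counter was initialized at zero before the task ran, then failed once.
zero_start = [(0.0, 0.0), (60.0, 1.0), (120.0, 1.0)]

print(simple_rate(late_start))  # no increase is visible in the window
print(simple_rate(zero_start))  # the step from 0 to 1 is visible
```

In the first series the failure is never seen as an increase, so an alert on `rate(...) > 0` would stay silent; in the second, the zero sample gives the calculation a baseline and the failure registers.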
@ashb just a gentle nudge about this one - tests are passing now 😃
Awesome work, congrats on your first merged pull request!
Thanks @bilbof, congrats on your first commit 👍
Sets initial count of task finished state to zero. This enables acquiring the rate from zero to one (particularly useful if you want to alert on task failures).
We're using the Prometheus statsd-exporter. Since counters are usually used with a PromQL function like `rate`, it's important that counters are initialized at zero; otherwise, when a task finishes, the rate function will not have a previous value to compare the state count to.
For example, what we'd like to do, which tells us the failure rate of tasks over time: `sum by (dag_id, task_id) (rate(airflow_ti_finish{state='failed'}[1h])) > 0`
Two useful posts on this subject:
https://www.robustperception.io/why-predeclare-metrics
https://www.section.io/blog/beware-prometheus-counters-that-do-not-begin-at-zero/