-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIP-49] [OTel Integration] Add tagging to existing stats #30496
[AIP-49] [OTel Integration] Add tagging to existing stats #30496
Conversation
Interesting. I'll fix the tests |
airflow/models/taskinstance.py
Outdated
@@ -540,6 +540,10 @@ def __init__( | |||
# can be changed when calling 'run' | |||
self.test_mode = False | |||
|
|||
@property | |||
def stats_tags(self) -> dict[str, str]: | |||
return prune_dict({"dag_id": self.dag_id, "run_id": str(self.run_id), "task_id": self.task_id}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't run_ID a meaningless autogenerated number ? If so, I think it'd add little value to the metric, while adding a lot of cardinality, making the costs 📈 for users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't run_ID a meaningless autogenerated number ? If so, I think it'd add little value to the metric, while adding a lot of cardinality, making the costs 📈 for users
I don't think it's a meaningless, but yes - it is auto-generated. For example, scheduled__2023-04-11T19:10:00+00:00
but I have to agree, that it will generate unnecessary cardinality since each of these will attribute to a distinct time series and users would probably going to see dots instead of lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it in 3c32c7d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rethinking this, that was already in there (here). All I did was move it from an inline declaration to a property. So removing it now would/could/should be considered a breaking change. How strongly do we feel about that? I'm tempted to leave it in at this point unless there is a reason to drop it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this may be a breaking change, after thinking about it, I believe adding run_id was not a good design decision as the presence of that label would not create a continuous time series, but rather a multiple metrics data that are going to become high-cardinality pattern.
My usual thought process for determining if something is a metric data or not, is whether the series could be something that we could connect it as a 'time series' (connecting dots to make it into the chart) or not. Having each data point with distinct run_id would not make it into time series, but just a series of distinct individual 'events' that would simply be plotted as dots in the chart (since each of them would have their own distinct run_id).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
Sorry for the delay, I was out of town celebrating my anniversary :P I will look at the failing tests and get to the comments today. Thanks for your patience! |
Congrats! That must have been a celerbration! 🎉 🎉 🎉 🎉 🎉 |
15 years! I can't believe anyone would put up with me that long! |
Congrats! Considering the fact that my daughter is 16 years old now, I think my marriage has lasted +1 year of that time, and I do feel the same about how understanding my wife has been with me :-) . |
64c780f
to
3c32c7d
Compare
Rebased and made a handful of adjustments based on feedback above. Still pending a decision on including the email address(es); I'll adjust the test and/or the method accordingly once we have some consensus there. Also pending a deeper look at the tests which are failing because |
@o-nikolas mentioned that it was likely because the stats_tags wasn't be stored in the db, so I thought making it a |
The code link you added is for the DagRun object, but the exception is referring to a TaskInstance object, I think you're missing a similar property on the TI (if that's what was intended, or you're trying to get that property on a different object than you think you're using). |
Nah, I'm just a dough-head and linked the wrong place. Fixed the link. It's in both places. |
It's a typo. stats_tags vs stat_tags |
Nice, the easiest of fixes! 😃 |
And among the more embarrassing ones :P Running static checks and double-checking it was only the one place I made that mistake and I'll have another commit pushed shortly. That just leaves the email(s) question pending. |
Alright, a whole slew of new ones failing now that the typo is out of the way. I have to update a handful of unit tests that were checking for called_with(). I'll worry about that tomorrow. [DONE] |
In this test we are specifically checking to make sure that there are no tags in the I was planning to add tagging to every metric, but perhaps I overdid it and it's not wanted or needed on some? |
2070795
to
1c09535
Compare
Sorry for the churn. Post-vacation brain melt, I guess :P All tests are currently passing. Still pending discussion/decisions: |
I wrote a couple of comments, but they're currently in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left out a few comments on the topics.
- I believe it is better to remove the
email_id
from the label (tag) - Also, it may be helpful if we could remove
run_id
from the label.
airflow/models/taskinstance.py
Outdated
@@ -540,6 +540,10 @@ def __init__( | |||
# can be changed when calling 'run' | |||
self.test_mode = False | |||
|
|||
@property | |||
def stats_tags(self) -> dict[str, str]: | |||
return prune_dict({"dag_id": self.dag_id, "run_id": str(self.run_id), "task_id": self.task_id}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this may be a breaking change, after thinking about it, I believe adding run_id was not a good design decision as the presence of that label would not create a continuous time series, but rather a multiple metrics data that are going to become high-cardinality pattern.
My usual thought process for determining if something is a metric data or not, is whether the series could be something that we could connect it as a 'time series' (connecting dots to make it into the chart) or not. Having each data point with distinct run_id would not make it into time series, but just a series of distinct individual 'events' that would simply be plotted as dots in the chart (since each of them would have their own distinct run_id).
Alright, the last couple commits there should remove the email and run_id tags and update the unit tests accordingly. |
8a29124
to
1d26c9c
Compare
1d26c9c
to
6484b32
Compare
@uranusjr @vincbeck @howardyoo - Are we good? Can I get a review/approval on this one when you get time? I think I've covered all comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @uranusjr ?
hey @ferruzzi I'm curious about some of these metrics and if you had further plans to change them. Like e.g.: |
@conorbev "plans" may be a strong word, but I do intend to do more work on the OTel stuff as I get time. The reason they still have the embedded values is because not all metrics backends support tagging, so the names themselves have to remain the same for compatibility. The tags are there to hep for backends which benefit from them (like OTel) but if we start changing the names of existing metrics we need to go through a whole deprecation cycle. Or we emit everything twice, once with the embedded values and again without, but then anyone using OTel or other tag-friendly backend will be seeing everything double. It's a bit of a no-win situation at the moment and would require a community discussion and consensus on how to proceed. |
Groundwork PR for the AIP-49 OpenTelemetry integration work. Many stats had tagging already, this PR cleans up some of them and adds tagging to the ones that were missing it.
cc @o-nikolas @vincbeck @vandonr-amz @syedahsn