[AIP-49] [OTel Integration] Add tagging to existing stats #30496

ferruzzi · 2023-04-05T23:50:05Z

Groundwork PR for the AIP-49 OpenTelemetry integration work. Many stats had tagging already; this PR cleans up some of them and adds tagging to the ones that were missing it.

cc @o-nikolas @vincbeck @vandonr-amz @syedahsn

ferruzzi · 2023-04-06T01:39:05Z

Interesting. I'll fix the tests

airflow/dag_processing/manager.py

airflow/dag_processing/processor.py

airflow/executors/base_executor.py

vandonr-amz · 2023-04-10T21:37:16Z

airflow/models/taskinstance.py

@@ -540,6 +540,10 @@ def __init__(
        # can be changed when calling 'run'
        self.test_mode = False

+    @property
+    def stats_tags(self) -> dict[str, str]:
+        return prune_dict({"dag_id": self.dag_id, "run_id": str(self.run_id), "task_id": self.task_id})


Isn't run_id a meaningless autogenerated number? If so, I think it'd add little value to the metric while adding a lot of cardinality, making the costs 📈 for users.

I don't think it's meaningless, but yes, it is auto-generated, for example scheduled__2023-04-11T19:10:00+00:00. I have to agree that it will generate unnecessary cardinality, since each run_id will be attributed to a distinct time series and users would probably see dots instead of lines.

Removed it in 3c32c7d

Rethinking this, that was already in there (here). All I did was move it from an inline declaration to a property. So removing it now would/could/should be considered a breaking change. How strongly do we feel about that? I'm tempted to leave it in at this point unless there is a reason to drop it?

Even though this may be a breaking change, after thinking about it I believe adding run_id was not a good design decision: with that label present, the data does not form a continuous time series but many separate single-point series, which is a high-cardinality pattern.

My usual test for whether something is metric data is whether the points can be connected into a time series (connecting the dots to draw a line on a chart). If every data point has a distinct run_id, there is no time series, just a set of individual 'events' plotted as isolated dots, since each of them has its own run_id.
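To make the cardinality argument concrete, here is a minimal sketch. It is not Airflow code: it assumes a hypothetical DAG `example_dag` with a single task `example_task` scheduled every 10 minutes for one day, and counts how many distinct tag sets (each one a separate time series in a tag-aware backend) result with and without `run_id`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical workload: one task in one DAG, scheduled every 10 minutes for 24 hours.
start = datetime(2023, 4, 11, tzinfo=timezone.utc)
run_ids = [
    f"scheduled__{(start + timedelta(minutes=10 * i)).isoformat()}"
    for i in range(144)
]


def series_count(include_run_id: bool) -> int:
    """Count distinct tag sets; a tag-aware backend keeps one time series per set."""
    seen = set()
    for run_id in run_ids:
        tags = {"dag_id": "example_dag", "task_id": "example_task"}
        if include_run_id:
            tags["run_id"] = run_id
        seen.add(frozenset(tags.items()))
    return len(seen)


with_run_id = series_count(include_run_id=True)      # 144 series with one point each
without_run_id = series_count(include_run_id=False)  # 1 series with 144 points
print(with_run_id, without_run_id)
```

With `run_id` in the tags, every run starts a new series that never receives a second data point, which is the "dots instead of lines" effect described above.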

airflow/stats.py

ferruzzi · 2023-04-11T17:59:34Z

Sorry for the delay, I was out of town celebrating my anniversary :P I will look at the failing tests and get to the comments today. Thanks for your patience!

potiuk · 2023-04-11T18:10:14Z

Congrats! That must have been a celebration! 🎉 🎉 🎉 🎉 🎉

ferruzzi · 2023-04-11T18:26:54Z

15 years! I can't believe anyone would put up with me that long!

howardyoo · 2023-04-11T19:20:14Z

Congrats! Considering the fact that my daughter is 16 years old now, I think my marriage has lasted +1 year of that time, and I do feel the same about how understanding my wife has been with me :-) .

airflow/models/taskinstance.py

ferruzzi · 2023-04-12T02:31:39Z

Rebased and made a handful of adjustments based on feedback above. Still pending a decision on including the email address(es); I'll adjust the test and/or the method accordingly once we have some consensus there.

Also pending a deeper look at the tests which are failing because AttributeError: 'TaskInstance' object has no attribute 'stat_tags' when it has the @property here.

ferruzzi · 2023-04-12T03:13:42Z

@o-nikolas mentioned that it was likely because the stats_tags wasn't being stored in the db, so I thought making it a property and assembling it there was a clever solution. I may have to inline it (8 uses) but I was hoping for a more DRY answer.

o-nikolas · 2023-04-12T03:21:27Z

The code link you added is for the DagRun object, but the exception is referring to a TaskInstance object, I think you're missing a similar property on the TI (if that's what was intended, or you're trying to get that property on a different object than you think you're using).

ferruzzi · 2023-04-12T03:23:18Z

Nah, I'm just a dough-head and linked the wrong place. Fixed the link. It's in both places.

ferruzzi · 2023-04-12T03:25:59Z

It's a typo. stats_tags vs stat_tags
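For readers following along, a minimal sketch of why the tests failed even though the property exists. The class below is a simplified stand-in, not the real `TaskInstance`; the tag keys are illustrative, and the inline filter only approximates what `prune_dict` is used for in the diff (dropping unset values).

```python
from __future__ import annotations


class TaskInstanceSketch:
    """Simplified stand-in for TaskInstance; only the fields used for tags."""

    def __init__(self, dag_id: str, task_id: str):
        self.dag_id = dag_id
        self.task_id = task_id

    @property
    def stats_tags(self) -> dict[str, str]:
        # Assemble the tags on access, so nothing extra has to be stored in the db.
        tags = {"dag_id": self.dag_id, "task_id": self.task_id}
        return {key: value for key, value in tags.items() if value is not None}


ti = TaskInstanceSketch(dag_id="example_dag", task_id="example_task")
correct = ti.stats_tags  # works: the name matches the property

try:
    ti.stat_tags  # misspelled: the property is called stats_tags
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(correct)
print(error_message)
```

The property only answers to its exact name, so a single misspelled call site is enough to raise the `AttributeError` seen in the failing tests.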

o-nikolas · 2023-04-12T03:27:58Z

Nice, the easiest of fixes! 😃

ferruzzi · 2023-04-12T03:29:54Z

And among the more embarrassing ones :P Running static checks and double-checking it was only the one place I made that mistake and I'll have another commit pushed shortly.

That just leaves the email(s) question pending.

ferruzzi · 2023-04-12T05:16:54Z

Alright, a whole slew of new ones failing now that the typo is out of the way. I have to update a handful of unit tests that were checking for called_with(). I'll worry about that tomorrow.
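A sketch of the kind of test adjustment this implies, using `unittest.mock`. The `emit_start` helper and the tag values are hypothetical; only the `incr(name, tags=...)` call shape is taken from this thread.

```python
from unittest import mock


def emit_start(stats, dag_id: str, task_id: str) -> None:
    """Hypothetical code under test: emits a counter with tags attached."""
    stats.incr(
        f"ti.start.{dag_id}.{task_id}",
        tags={"dag_id": dag_id, "task_id": task_id},
    )


stats = mock.Mock()
emit_start(stats, "example_dag", "example_task")

# An assertion written before tagging was added no longer matches the call ...
try:
    stats.incr.assert_called_with("ti.start.example_dag.example_task")
    old_assertion_passes = True
except AssertionError:
    old_assertion_passes = False

# ... so the expected call has to include the tags keyword as well.
stats.incr.assert_called_with(
    "ti.start.example_dag.example_task",
    tags={"dag_id": "example_dag", "task_id": "example_task"},
)
```

`assert_called_with` compares positional and keyword arguments exactly, which is why adding a `tags` keyword to an existing call breaks every test that asserted on the old signature.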

[DONE]

ferruzzi · 2023-04-12T17:54:54Z

In this test we are specifically checking to make sure that there are no tags in the dagrun.{dag.dag_id}.first_task_scheduling_delay metric, but that they do exist in the dagrun.first_task_scheduling_delay metric. Is that actually the desired behavior or is that a side effect of the test that wasn't intended?

I was planning to add tagging to every metric, but perhaps I overdid it and it's not wanted or needed on some?
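The behavior that test pins down can be sketched as follows. This is an illustration under assumptions, not the actual Airflow code or test: `emit_scheduling_delay` is a hypothetical helper, and whether the split is intended is exactly the open question above.

```python
from datetime import timedelta
from unittest import mock


def emit_scheduling_delay(stats, dag_id: str, delay: timedelta) -> None:
    """Hypothetical emitter mirroring the behavior the test checks for."""
    # Name with the dag_id embedded: emitted without tags.
    stats.timing(f"dagrun.{dag_id}.first_task_scheduling_delay", delay)
    # Generic name: the identifying value travels as a tag instead.
    stats.timing("dagrun.first_task_scheduling_delay", delay, tags={"dag_id": dag_id})


stats = mock.Mock()
delay = timedelta(seconds=5)
emit_scheduling_delay(stats, "example_dag", delay)

expected_calls = [
    mock.call("dagrun.example_dag.first_task_scheduling_delay", delay),
    mock.call("dagrun.first_task_scheduling_delay", delay, tags={"dag_id": "example_dag"}),
]
calls_match = stats.timing.call_args_list == expected_calls
```

Adding tags to the first call would make `calls_match` false, which is how such a test ends up enforcing "no tags" on the embedded-name metric.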

ferruzzi · 2023-04-13T02:24:00Z

Sorry for the churn. Post-vacation brain melt, I guess :P All tests are currently passing.

Still pending discussion/decisions:

Do we want emails in the tags (discussion here)
I think I am going to add the dag_run_id back in as that was there before I started tinkering and removing it feels like a breaking change. (discussion here)

howardyoo · 2023-04-13T03:58:45Z

I wrote a couple of comments, but they're currently in pending mode (so I guess only I can see them?) Not sure how to make them visible :-(

howardyoo

I left a few comments on these topics.

I believe it is better to remove the email_id from the label (tag)
Also, it may be helpful if we could remove run_id from the label.

airflow/models/taskinstance.py

airflow/dag_processing/processor.py

howardyoo · 2023-04-13T03:56:02Z


ferruzzi · 2023-04-13T19:05:44Z

Alright, the last couple commits there should remove the email and run_id tags and update the unit tests accordingly.

ferruzzi · 2023-04-17T17:08:32Z

@uranusjr @vincbeck @howardyoo - Are we good? Can I get a review/approval on this one when you get time? I think I've covered all comments.

airflow/dag_processing/processor.py

potiuk

LGTM. @uranusjr ?

conorbev · 2023-07-13T22:09:12Z

hey @ferruzzi I'm curious about some of these metrics and if you had further plans to change them. Like e.g.: Stats.incr(f"ti.start.{self.task.dag_id}.{self.task.task_id}", tags=self.stats_tags) <-- this metric would now have dag_id and task_id encoded both as part of the metric name and as separate tags ?

ferruzzi · 2023-07-14T02:22:15Z

@conorbev "plans" may be a strong word, but I do intend to do more work on the OTel stuff as I get time.

The reason they still have the embedded values is that not all metrics backends support tagging, so the names themselves have to remain the same for compatibility. The tags are there to help backends which benefit from them (like OTel), but if we start changing the names of existing metrics we need to go through a whole deprecation cycle. Or we emit everything twice, once with the embedded values and again without, but then anyone using OTel or another tag-friendly backend will be seeing everything double. It's a bit of a no-win situation at the moment and would require a community discussion and consensus on how to proceed.
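The compatibility constraint can be illustrated with a small sketch. Both backend classes and the `emit_ti_start` helper are hypothetical stand-ins, not real Airflow or OTel classes; only the metric name pattern and the `tags=` keyword come from the question above.

```python
from collections import Counter


class NameOnlyBackend:
    """Hypothetical backend without tag support: the name is all it keeps."""

    def __init__(self):
        self.counters = Counter()

    def incr(self, name, tags=None):
        self.counters[name] += 1  # tags are silently dropped


class TagAwareBackend:
    """Hypothetical backend that keys each series by name plus tag set."""

    def __init__(self):
        self.counters = Counter()

    def incr(self, name, tags=None):
        self.counters[(name, frozenset((tags or {}).items()))] += 1


def emit_ti_start(backend, dag_id, task_id):
    # Values are embedded in the name and also passed as tags.
    backend.incr(
        f"ti.start.{dag_id}.{task_id}",
        tags={"dag_id": dag_id, "task_id": task_id},
    )


legacy, tagged = NameOnlyBackend(), TagAwareBackend()
for task_id in ("extract", "load"):
    emit_ti_start(legacy, "example_dag", task_id)
    emit_ti_start(tagged, "example_dag", task_id)

# The name-only backend can tell the tasks apart only because of the embedded values.
legacy_names = sorted(legacy.counters)
print(legacy_names)
```

If the embedded values were dropped from the name, the name-only backend would collapse both tasks into a single `ti.start` counter, which is why the existing names stay unchanged.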

ferruzzi requested review from jedcunningham, ephraimbuddy, kaxil, XD-DENG, ashb and o-nikolas as code owners April 5, 2023 23:50

boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Apr 5, 2023

uranusjr reviewed Apr 6, 2023

airflow/dag_processing/manager.py (outdated)

vandonr-amz reviewed Apr 10, 2023

uranusjr reviewed Apr 11, 2023

airflow/models/taskinstance.py (outdated)

ferruzzi force-pushed the ferruzzi/otel/m2-stat-tagging branch from 64c780f to 3c32c7d Compare April 12, 2023 02:27

ferruzzi force-pushed the ferruzzi/otel/m2-stat-tagging branch from 2070795 to 1c09535 Compare April 12, 2023 19:09

howardyoo reviewed Apr 13, 2023

ferruzzi force-pushed the ferruzzi/otel/m2-stat-tagging branch from 8a29124 to 1d26c9c Compare April 13, 2023 20:46

ferruzzi added 10 commits April 13, 2023 13:47

Fix dt typing

38d5f64

add stat tagging to dag_processing

07bf70e

add stat tagging to base_executor

169741c

add stat tagging to models/dagrun

174f863

add stat tagging to models/taskinstance

767810b

PR fixes pt 1

1cdc9c8

stat_tags vs stats_tags typo fix should fix most of the failing tests

681b0d2

Update unit tests

56fcc10

Remove email tags

8160479

remove run_id from tags

6484b32

ferruzzi force-pushed the ferruzzi/otel/m2-stat-tagging branch from 1d26c9c to 6484b32 Compare April 13, 2023 20:48

vincbeck approved these changes Apr 17, 2023

potiuk reviewed Apr 17, 2023

airflow/dag_processing/processor.py

potiuk approved these changes Apr 17, 2023

o-nikolas approved these changes Apr 17, 2023

uranusjr merged commit 7e1dace into apache:main Apr 18, 2023
1 check passed

wookiist pushed a commit to wookiist/airflow that referenced this pull request Apr 19, 2023

[OTel Integration] Add tagging to existing stats (apache#30496)

822c0f2

ephraimbuddy added this to the Airflow 2.7.0 milestone May 8, 2023

ephraimbuddy added the type:new-feature Changelog: New Features label May 8, 2023

ephraimbuddy modified the milestone: Airflow 2.7.0 May 8, 2023

ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) and removed type:new-feature Changelog: New Features labels Aug 2, 2023

ferruzzi changed the title ~~[OTel Integration] Add tagging to existing stats~~ [AIP-49] [OTel Integration] Add tagging to existing stats Oct 27, 2023
