[connector/exceptions] Add support for exemplars in exceptionsconnector #31819
Conversation
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
@@ -15,6 +15,10 @@ type Dimension struct {
	Default *string `mapstructure:"default"`
}

type ExemplarsConfig struct {
I think you might not need the suffix Config here?
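For reference, a minimal sketch of how the new configuration surface could look. Only the ExemplarsConfig type and an Enabled flag are implied by the diff hunk above and by the later c.config.Exemplars.Enabled check; the field tags and the surrounding Config type shown here are assumptions, not the PR's actual code.

```go
package exceptionsconnector

// Dimension is taken from the diff hunk above; the Name field is assumed.
type Dimension struct {
	Name    string  `mapstructure:"name"`
	Default *string `mapstructure:"default"`
}

// ExemplarsConfig toggles exemplar generation; only Enabled is referenced in this PR.
type ExemplarsConfig struct {
	Enabled bool `mapstructure:"enabled"`
}

// Config is sketched here only to show where the exemplars block would hang.
type Config struct {
	Dimensions []Dimension     `mapstructure:"dimensions"`
	Exemplars  ExemplarsConfig `mapstructure:"exemplars"`
}
```

With mapstructure tags like these, users would turn the feature on with an exemplars block (enabled: true) under the connector in the collector configuration.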
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #31819       +/-   ##
==========================================
+ Coverage   81.88%   83.11%    +1.23%
==========================================
  Files        1858     1977      +119
  Lines      172718   189370    +16652
==========================================
+ Hits       141423   157400    +15977
- Misses      26984    27139      +155
- Partials     4311     4831      +520

☔ View full report in Codecov by Sentry.
@marctc This is a very nice enhancement. Thank you!
	exc.exemplars.At(i).SetTimestamp(timestamp)
}
exc.exemplars.CopyTo(dp.Exemplars())
exc.attrs.CopyTo(dp.Attributes())
@marctc - Under heavy load, or if an application goes into some sort of crash loop, could exc.exemplars end up with too many entries to copy?
The situation may be worse if end users add custom dimensions that can have a large number of unique measurements (high cardinality).
So I wonder if the CopyTo operation could overwhelm the system (or, in the worst case, panic).
I see that the spans-to-metrics connector has safeguard mechanisms like max_per_data_point (along with dimensions_cache_size and resource_metrics_cache_size).
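For illustration only, a minimal sketch of what a spanmetrics-style cap could look like here. The constant, the helper function, and its placement are assumptions for the sake of the example, not anything present in this PR.

```go
package exceptionsconnector

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// maxExemplarsPerDataPoint is a hypothetical cap, analogous in spirit to
// spanmetrics' max_per_data_point; the actual value/config knob would need
// to be decided in a follow-up.
const maxExemplarsPerDataPoint = 5

// appendExemplarCapped appends an exemplar only while the slice is below the
// cap, so a crash-looping application cannot grow the slice without bound.
func appendExemplarCapped(exemplars pmetric.ExemplarSlice, traceID pcommon.TraceID, spanID pcommon.SpanID) {
	if exemplars.Len() >= maxExemplarsPerDataPoint {
		return
	}
	e := exemplars.AppendEmpty()
	e.SetTraceID(traceID)
	e.SetSpanID(spanID)
	e.SetDoubleValue(1)
}
```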
Do you think this is necessary within the scope of this PR, or should we address it in a follow-up PR?
We can do a follow-up PR, but I got a bit curious and ran an overnight stress test after patching this change into my local setup. The collectors crashed multiple times. My recommendation would be to restrict it to just one exemplar per data point for now.
panic: runtime error: index out of range [340043] with length 340043
goroutine 70 [running]:
go.opentelemetry.io/collector/pdata/pmetric.ExemplarSlice.CopyTo({0xc02cf84f90?, 0xc02cf3e72c?}, {0xc14ea5f7d8?, 0xc05ab969b0?})
go.opentelemetry.io/collector/[email protected]/pmetric/generated_exemplarslice.go:134 +0x12d
github.com/open-telemetry/opentelemetry-collector-contrib/connector/exceptionsconnector.(*metricsConnector).collectExceptions(0xc000c62900, {0xc1363f6690?, 0xc05ab969b0?})
github.com/open-telemetry/opentelemetry-collector-contrib/connector/[email protected]/connector_metrics.go:161 +0x3cb
github.com/open-telemetry/opentelemetry-collector-contrib/connector/exceptionsconnector.(*metricsConnector).exportMetrics(0xc000c62900, {0x3530320, 0xc001171da0})
github.com/open-telemetry/opentelemetry-collector-contrib/connector/[email protected]/connector_metrics.go:131 +0x25f
github.com/open-telemetry/opentelemetry-collector-contrib/connector/exceptionsconnector.(*metricsConnector).ConsumeTraces(0xc000c62900, {0x3530320, 0xc001171da0}, {0xc026bbd308?, 0xc002c47498?})
github.com/open-telemetry/opentelemetry-collector-contrib/connector/[email protected]/connector_metrics.go:122 +0x14b
go.opentelemetry.io/collector/internal/fanoutconsumer.(*tracesConsumer).ConsumeTraces(0xc001171cb0?, {0x3530320, 0xc001171da0}, {0xc026bbd308?, 0xc002c47498?})
assert.NotZero(t, exemplar.Timestamp())
assert.NotZero(t, exemplar.TraceID())
assert.NotZero(t, exemplar.SpanID())
Would it be a good idea to add a benchmark test?
There's already a benchmark (BenchmarkConnectorConsumeTraces). Do you have anything specific in mind to benchmark?
I was thinking: digging exception events out of real-world workloads sort of requires deep-packet inspection, so maybe we add a test where roughly 1 in 10 traces carries an exception (say, as one of 5 span event entries) and see how fast we can generate metrics and/or log events.
But I am good with the current change. I think it's functionally correct and pretty awesome work. Thank you for doing this, Marc.
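A rough sketch of such a benchmark, assuming hypothetical helpers newTestMetricsConnector and buildTraces(total, exceptionRatio) that construct a connector and a trace batch in which only a fraction of traces carry an exception span event; neither helper exists in the repository as written.

```go
package exceptionsconnector

import (
	"context"
	"testing"
)

// BenchmarkSparseExceptionTraces measures ConsumeTraces throughput when only
// ~1 in 10 traces contains an exception span event.
func BenchmarkSparseExceptionTraces(b *testing.B) {
	conn := newTestMetricsConnector(b) // hypothetical test constructor
	traces := buildTraces(1000, 0.1)   // hypothetical: 1000 traces, 10% with exceptions
	ctx := context.Background()

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := conn.ConsumeTraces(ctx, traces); err != nil {
			b.Fatal(err)
		}
	}
}
```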
if !c.config.Exemplars.Enabled || traceID.IsEmpty() {
	return
}
e := exc.exemplars.AppendEmpty()
Could this Append result in unbounded growth? That might in fact be the root cause of the panic I noticed in my overnight test.
So an alternate approach could be to reset exc.exemplars to a new slice after exc.exemplars.CopyTo(dp.Exemplars()).
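A minimal sketch of that suggestion. The accumulator type and method name below are illustrative; only the exc.exemplars / exc.attrs fields and the CopyTo calls mirror the excerpts quoted above.

```go
package exceptionsconnector

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// exception stands in for whatever accumulator struct holds exc.attrs and
// exc.exemplars in the connector.
type exception struct {
	attrs     pcommon.Map
	exemplars pmetric.ExemplarSlice
}

// flush copies the accumulated attributes and exemplars onto the data point,
// then swaps in a fresh exemplar slice so it cannot grow without bound across exports.
func (exc *exception) flush(dp pmetric.NumberDataPoint) {
	exc.exemplars.CopyTo(dp.Exemplars())
	exc.attrs.CopyTo(dp.Attributes())
	exc.exemplars = pmetric.NewExemplarSlice()
}
```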
I think we could just add that count mechanism that spansmetricsconnector is using, wdyt?
Hello Marc - that should be good IMO. However, what do you think of resetting exc.exemplars back to pmetric.NewExemplarSlice() after doing CopyTo?
I guess I do not understand the need to hang on to older exemplars after that CopyTo operation. Can you please help me understand?
You are totally right, what about now?
Looks good to me Marc. Thank you.
…/opentelemetry-collector-contrib into exceptionsconnector/add_exemplars
Thank you for doing this Marc.
The tests are failing, apparently:
@jpkrohling it is now ready to merge!
Description:
Adds support for exemplars for the metrics generated from exceptions.
It relates the span and trace IDs to the generated metrics.
Exemplar generation needs to be enabled via the config (disabled by default).
Link to tracking Issue: Resolves #24409
Documentation: Added documentation for enabling generation of exemplars.