[connector/exceptions] Add support for exemplars in exceptionsconnector #31819

Merged

Conversation

marctc
Contributor

@marctc marctc commented Mar 18, 2024

Description:
Adds support for exemplars on the metrics generated from exceptions.
The exemplars relate the span ID and trace ID to the generated metrics.

Exemplar generation must be enabled via config (disabled by default).

Link to tracking Issue: Resolves #24409

Documentation: Added documentation for enabling generation of exemplars.
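
For readers skimming the thread, the opt-in described above could look roughly like the following. This is a minimal sketch, not the merged code: the type names `Exemplars` and `Config`, the `mapstructure` keys, and the helper `defaultConfig` are assumptions made for illustration. The only point it shows is that the zero value keeps exemplars disabled.

```go
package main

import "fmt"

// Exemplars is a hypothetical stand-in for the connector's exemplar settings.
type Exemplars struct {
	// Enabled turns exemplar generation on; the zero value keeps it off.
	Enabled bool `mapstructure:"enabled"`
}

// Config is a trimmed-down, illustrative connector configuration.
type Config struct {
	Exemplars Exemplars `mapstructure:"exemplars"`
}

// defaultConfig returns a configuration with exemplars disabled.
func defaultConfig() Config {
	return Config{}
}

func main() {
	cfg := defaultConfig()
	fmt.Println("exemplars enabled by default:", cfg.Exemplars.Enabled)

	cfg.Exemplars.Enabled = true // what an explicit opt-in would amount to
	fmt.Println("exemplars enabled after opt-in:", cfg.Exemplars.Enabled)
}
```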

@marctc marctc changed the title from "Add support for exemplars in exceptionsconnector" to "[connector/exceptions] Add support for exemplars in exceptionsconnector" Mar 18, 2024
Contributor

github-actions bot commented Apr 3, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 3, 2024
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Apr 18, 2024
@jpkrohling jpkrohling reopened this Apr 30, 2024
@@ -15,6 +15,10 @@ type Dimension struct {
Default *string `mapstructure:"default"`
}

type ExemplarsConfig struct {
Member

I think you might not need the suffix Config here?

connector/exceptionsconnector/config.go

codecov bot commented Apr 30, 2024

Codecov Report

Attention: Patch coverage is 80.64516%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.11%. Comparing base (cbf003e) to head (77e9bf1).
Report is 481 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| connector/exceptionsconnector/connector_metrics.go | 80.64% | 4 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #31819      +/-   ##
==========================================
+ Coverage   81.88%   83.11%   +1.23%     
==========================================
  Files        1858     1977     +119     
  Lines      172718   189370   +16652     
==========================================
+ Hits       141423   157400   +15977     
- Misses      26984    27139     +155     
- Partials     4311     4831     +520     


@github-actions github-actions bot removed the Stale label May 1, 2024
Contributor

@niftyhoot niftyhoot left a comment

@marctc This is a very nice enhancement. Thank you!

exc.exemplars.At(i).SetTimestamp(timestamp)
}
exc.exemplars.CopyTo(dp.Exemplars())
exc.attrs.CopyTo(dp.Attributes())
Contributor

@marctc - Under heavy load, or if an application goes into some sort of crash loop, could exc.exemplars end up with too many entries to copy?
The situation may be worse if end users add custom dimensions with a large number of unique values (high cardinality).
So I wonder whether the CopyTo operation could overwhelm the system or, in the worst case, panic.

I see that the spanmetrics connector has some safeguards, such as max_per_data_point (along with dimensions_cache_size and resource_metrics_cache_size):

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/spanmetricsconnector/config.go#L93

Contributor Author

Do you think that is necessary within the scope of this PR, or should we address it in a follow-up PR?

Contributor

We can do a follow-up PR, but I got a bit curious and ran an overnight stress test after patching this change into my local setup. The collectors crashed multiple times. My recommendation would be to restrict this to just one exemplar per data point for now.

panic: runtime error: index out of range [340043] with length 340043

goroutine 70 [running]:
go.opentelemetry.io/collector/pdata/pmetric.ExemplarSlice.CopyTo({0xc02cf84f90?, 0xc02cf3e72c?}, {0xc14ea5f7d8?, 0xc05ab969b0?})
	go.opentelemetry.io/collector/[email protected]/pmetric/generated_exemplarslice.go:134 +0x12d
github.com/open-telemetry/opentelemetry-collector-contrib/connector/exceptionsconnector.(*metricsConnector).collectExceptions(0xc000c62900, {0xc1363f6690?, 0xc05ab969b0?})
	github.com/open-telemetry/opentelemetry-collector-contrib/connector/[email protected]/connector_metrics.go:161 +0x3cb
github.com/open-telemetry/opentelemetry-collector-contrib/connector/exceptionsconnector.(*metricsConnector).exportMetrics(0xc000c62900, {0x3530320, 0xc001171da0})
	github.com/open-telemetry/opentelemetry-collector-contrib/connector/[email protected]/connector_metrics.go:131 +0x25f
github.com/open-telemetry/opentelemetry-collector-contrib/connector/exceptionsconnector.(*metricsConnector).ConsumeTraces(0xc000c62900, {0x3530320, 0xc001171da0}, {0xc026bbd308?, 0xc002c47498?})
	github.com/open-telemetry/opentelemetry-collector-contrib/connector/[email protected]/connector_metrics.go:122 +0x14b
go.opentelemetry.io/collector/internal/fanoutconsumer.(*tracesConsumer).ConsumeTraces(0xc001171cb0?, {0x3530320, 0xc001171da0}, {0xc026bbd308?, 0xc002c47498?})

assert.NotZero(t, exemplar.Timestamp())
assert.NotZero(t, exemplar.TraceID())
assert.NotZero(t, exemplar.SpanID())

Contributor

Would it be a good idea to add a benchmark test?

Contributor Author

There's already a benchmark (BenchmarkConnectorConsumeTraces). Do you have anything specific in mind to benchmark?

Contributor

I was thinking that digging out exception events in a real-world workload more or less requires deep-packet inspection. Maybe we could add a test that has to find the 1 in 10 traces whose baggage contains an exception (say, as one of 5 span event entries) and see how fast we can generate metrics and/or log events.
But I am good with the current change. I think it's functionally correct and pretty awesome work. Thank you for doing this, Marc.

if !c.config.Exemplars.Enabled || traceID.IsEmpty() {
return
}
e := exc.exemplars.AppendEmpty()
Contributor

Can this Append result in unbounded growth? That might in fact be the root cause of the panic I noticed in my overnight test.
So an alternative approach could be to reset exc.exemplars to a new slice after exc.exemplars.CopyTo(dp.Exemplars()).

Contributor Author

I think we could just add the count mechanism that spanmetricsconnector is using, wdyt?

Contributor

Hello Marc - That should be good IMO. However, what do you think of resetting exc.exemplars back to pmetric.NewExemplarSlice() after doing the CopyTo?
I guess I do not understand the need to hang on to older exemplars after that CopyTo operation. Can you please help me understand?
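
The reset-after-copy idea can be illustrated with plain Go slices. This is a sketch under assumptions, not the connector's code: `exception`, `dataPoint`, `record`, and `flush` are hypothetical stand-ins, and setting the slice to nil plays the role that assigning pmetric.NewExemplarSlice() would play with the pdata types. The counter is kept cumulative here purely for illustration.

```go
package main

import "fmt"

// exemplar is a simplified stand-in for a pmetric exemplar.
type exemplar struct{ traceID, spanID string }

// exception is a stand-in for the per-dimension accumulator.
type exception struct {
	count     int
	exemplars []exemplar
}

// dataPoint is a stand-in for the emitted metric data point.
type dataPoint struct {
	value     int
	exemplars []exemplar
}

// record counts one exception event and buffers its exemplar.
func (e *exception) record(traceID, spanID string) {
	e.count++
	e.exemplars = append(e.exemplars, exemplar{traceID: traceID, spanID: spanID})
}

// flush copies the buffered exemplars into a new data point and then
// drops the buffer, so it cannot grow across export cycles.
func (e *exception) flush() dataPoint {
	dp := dataPoint{value: e.count, exemplars: make([]exemplar, len(e.exemplars))}
	copy(dp.exemplars, e.exemplars)
	e.exemplars = nil // the reset step discussed in this thread
	return dp
}

func main() {
	var exc exception
	exc.record("trace-1", "span-1")
	exc.record("trace-2", "span-2")
	first := exc.flush()

	exc.record("trace-3", "span-3")
	second := exc.flush()

	fmt.Println(len(first.exemplars), len(second.exemplars)) // prints "2 1"
	fmt.Println(first.value, second.value)                   // prints "2 3"
}
```

Without the reset, the second data point would carry all three exemplars, and the buffer would keep growing for as long as the collector runs.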

Contributor Author

You are totally right, what about now?

Contributor

Looks good to me Marc. Thank you.

@marctc marctc requested a review from niftyhoot May 7, 2024 14:13
Contributor

@niftyhoot niftyhoot left a comment

Thank you for doing this Marc.

@jpkrohling
Member

The tests are failing, apparently:

    connector_metrics_test.go:201: 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/connector/exceptionsconnector/connector_metrics_test.go:201
        	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/connector/exceptionsconnector/connector_metrics_test.go:169
        	            				/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/connector/exceptionsconnector/connector_metrics_test.go:87
        	Error:      	Not equal: 
        	            	expected: 2
        	            	actual  : 1
        	Test:       	TestConnectorConsumeTraces/Test_two_consumptions

@marctc
Contributor Author

marctc commented May 23, 2024

@jpkrohling this is now ready to merge!

@crobert-1 crobert-1 added the ready to merge label May 23, 2024
@jpkrohling jpkrohling merged commit 4fad287 into open-telemetry:main May 24, 2024
169 checks passed
@github-actions github-actions bot added this to the next release milestone May 24, 2024
Labels
connector/exceptions, ready to merge
Development

Successfully merging this pull request may close these issues.

[connector/exceptions] Relate generated metrics from the source trace/span
5 participants