Splunk HEC Exporter to emit health metrics #36519

Open
harsh8398 opened this issue Nov 25, 2024 · 7 comments
Labels
enhancement New feature or request exporter/splunkhec

Comments

@harsh8398

harsh8398 commented Nov 25, 2024

Component(s)

exporter/splunkhec

Is your feature request related to a problem? Please describe.

Users of the OTEL collector Splunk HEC exporter should be able to emit metrics broken down by HTTP error type, so that the health of the exporter can be understood downstream.

Describe the solution you'd like

We propose adding health metrics around the HTTP call. To keep the change generally applicable rather than specific to the Edge Processor metric (edge_processor_export_error_count), the exporter can reuse the HecTelemetry config to allow a customizable metric name, a customizable dimension name (for example, errorType), and customizable dimension values for HTTP status codes.
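
To make the proposal concrete, here is a minimal Go sketch of a configurable error counter. Everything in it is an assumption for illustration: `healthMetricConfig`, `errorCounter`, and their fields are hypothetical names, not the exporter's actual HecTelemetry fields, and the counts are kept in a plain in-memory map rather than in the collector's telemetry pipeline.

```go
package main

import "sync"

// healthMetricConfig is a hypothetical shape for the customizable settings
// described above. These field names are NOT the real HecTelemetry fields.
type healthMetricConfig struct {
	MetricName       string         // e.g. "edge_processor_export_error_count"
	DimensionName    string         // e.g. "errorType"
	StatusCodeValues map[int]string // HTTP status code -> dimension value
	DefaultValue     string         // used when a status code has no mapping
}

// errorCounter keeps per-dimension-value error counts in memory.
type errorCounter struct {
	cfg    healthMetricConfig
	mu     sync.Mutex
	counts map[string]int64
}

func newErrorCounter(cfg healthMetricConfig) *errorCounter {
	return &errorCounter{cfg: cfg, counts: make(map[string]int64)}
}

// record increments the count for the dimension value configured for
// statusCode and returns that value.
func (c *errorCounter) record(statusCode int) string {
	value, ok := c.cfg.StatusCodeValues[statusCode]
	if !ok {
		value = c.cfg.DefaultValue
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[value]++
	return value
}

// count returns how many errors were recorded for a dimension value.
func (c *errorCounter) count(value string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[value]
}

// seriesName renders the configured metric and dimension names for a value.
func (c *errorCounter) seriesName(value string) string {
	return c.cfg.MetricName + "{" + c.cfg.DimensionName + "=\"" + value + "\"}"
}
```

With a config that maps 401 to NotAuthenticated and defaults to Unclassified, calling `record(401)` would count one NotAuthenticated error, and any unmapped status code would land in Unclassified.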

Pros:

  • A useful enhancement to the exporter that OSS users can reuse
  • Ensures the metrics stay compatible as the exporter evolves over time

Cons:

  • More intrusive
  • Adds to a growing set of exporter metrics, which can become hard to maintain

Describe alternatives you've considered

As an alternative, we propose exposing pushMetricsData and pushLogData so that developers can wrap the Splunk HEC Exporter in their own project. This is not possible today because these two functions are wrapped in the exporterhelper before they are attached to the ConsumeMetrics or ConsumeLogs functions. As a result, users of this exporter never receive HTTP status code errors: the consumer retries forever and never returns to the wrapped function.

Pros:

  • Increased customization. Users of the code can customize what they want to do with the errors
  • Less bloated metrics

Cons:

  • A wrapper is generally not configurable via OTEL config. This goes against some of the OTEL principles and creates a risk of incompatibility, since the exporter API can change over time.

Additional context

We want to add metrics indicating exporter health. These metrics will be consumed by the UI to provide actionable alerts to Edge Processor (EP) users, starting with the most obvious errors at the exporter. The metric spec below defines the error classes used to classify errors at the exporter:

edge_processor_export_error_count
"""Shows the number of errors that have occurred when exporting data to a data destination"""

Standard dimensions: (instance, destinationId)

Dimension: errorType
Values:
1. HostNotFound
2. ConnectionForbidden
3. NotAuthenticated
4. ResourceNotFound
5. NotAuthorized
6. InvalidRequest
7. Unclassified

harsh8398 added the enhancement and needs triage labels on Nov 25, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme
Contributor

atoulme commented Dec 7, 2024

Please remove this PDF and inline the request in a GitHub issue. Please also read up on the obsrecv package offered by the collector, which already tracks observability information per component.

@harsh8398
Author

Hi @atoulme, could you please point me to the docs for the package? I couldn't find it listed when I searched the Go packages.

@atoulme
Contributor

atoulme commented Dec 12, 2024

@atoulme
Contributor

atoulme commented Dec 12, 2024

"As part of DMX-10614 we want to add metrics indicating exporter health."

This link refers to a private resource. Please consider removing it.

@atoulme
Contributor

atoulme commented Dec 12, 2024

I am not entirely sure why you state 2 options here either. I recommend you look at how requests for enhancements are filed in this repository and follow the format. See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/new?assignees=&labels=enhancement%2Cneeds+triage&projects=&template=feature_request.yaml

@harsh8398
Author

I updated the description now @atoulme


2 participants