[Feature/SPM]: Support spanmetrics connector #4345

Closed
albertteoh opened this issue Mar 27, 2023 · 5 comments · Fixed by #4452
Comments


albertteoh commented Mar 27, 2023

Requirement

As a Jaeger operator, I want to use the newly introduced spanmetrics connector for the following reasons:

  • It introduces a number of improvements over the spanmetrics processor.
  • The spanmetrics processor is deprecated, so future enhancements will no longer be added to it.
  • It avoids the situation where the spanmetrics processor is removed entirely, which would prevent me from upgrading my OTEL Collector version.

Problem

The known breaking issues include:

  • The calls_total counter from the spanmetrics processor is emitted as calls by the connector.
  • The latency histogram is emitted as duration by the connector.
  • The operation label is emitted as span.name by the connector.

These renames break the Prometheus queries behind Jaeger's Monitor tab.

Proposal

Make the metric names configurable.

Perhaps introduce an spm parameter namespace where metric names can be configured, e.g. --spm.calls-metric-name and --spm.latency-metric-name.

This would also require an update to the example provided in docker-compose/monitor.
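
For illustration, the jaeger service in docker-compose/monitor could then be wired up roughly as below. This is only a sketch of the proposal: the --spm.* flags do not exist yet, and the image tag and port mapping are placeholders.

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - METRICS_STORAGE_TYPE=prometheus
    command:
      - "--prometheus.server-url=http://prometheus:9090"
      # Hypothetical flags from this proposal; the names are illustrative only.
      - "--spm.calls-metric-name=calls"
      - "--spm.latency-metric-name=duration"
    ports:
      - "16686:16686"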

Other suggestions welcome.

Open questions

No response

@warning-explosive

I have a workaround based on the metricstransform processor. Here is an example of otel-collector-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  # Rename the connector's metrics and the span.name label back to the names
  # the Jaeger Monitor tab expects from the deprecated spanmetrics processor.
  metricstransform/insert:
    transforms:
      - include: calls
        match_type: strict
        action: update
        new_name: calls_total
        operations:
        - action: update_label
          label: span.name
          new_label: operation
      - include: duration
        match_type: strict
        action: update
        new_name: latency
        operations:
          - action: update_label
            label: span.name
            new_label: operation

exporters:
  otlp:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  prometheus:
    endpoint: "otel-collector:9464"
    resource_to_telemetry_conversion:
      enabled: true
    enable_open_metrics: true

# The spanmetrics connector consumes spans from the traces pipeline and emits
# call/duration metrics into the metrics pipeline, replacing the deprecated
# spanmetrics processor.
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, spanmetrics]
    metrics:
      receivers: [otlp, spanmetrics]
      processors: [metricstransform/insert]
      exporters: [prometheus]
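
If you are wiring this into the docker-compose/monitor setup, the renamed metrics exposed on otel-collector:9464 also need to be scraped by the Prometheus instance that Jaeger queries. A minimal prometheus.yml sketch, assuming the otel-collector hostname used above:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: otel-collector
    # Scrape the collector's Prometheus exporter configured above
    # (exporters.prometheus.endpoint: otel-collector:9464).
    static_configs:
      - targets: ["otel-collector:9464"]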

Owenxh commented Mar 31, 2023

I want to use the spanmetrics connector too.

@utezduyar

> I have a workaround based on the metricstransform processor. Here is an example of otel-collector-config.yaml: […]

This worked well for me, however something is not entirely right. I have a demo application with 5 services talking to each other. 4 of them have data under Monitor, but one of them does not. The application that is not working is very similar to one of the 4 that are, which puzzles me.

Maybe something about the name of the span? Any tips on how to debug it?

@warning-explosive

> I have a workaround based on the metricstransform processor. Here is an example of otel-collector-config.yaml: […]

> This worked well for me, however something is not entirely right. I have a demo application with 5 services talking to each other. 4 of them have data under Monitor, but one of them does not. The application that is not working is very similar to one of the 4 that are, which puzzles me.
>
> Maybe something about the name of the span? Any tips on how to debug it?

Several possible issues and solutions come to mind:

  1. Spanmetrics namespace: check whether one is configured, since it prefixes the metric names.
  2. Rejected/lost/dropped spans: check the otel-collector's own metrics; you may have a configuration or connectivity issue.
  3. Check your app's exporter: swap the OTLP exporter for a console exporter to confirm that the expected spans are actually emitted (see the config sketch below).
  4. See debug logs: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md
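
A minimal collector tweak covering points 2–4 might look like the following. This is only a sketch, assuming a collector build where the logging exporter is available (newer releases call it debug); adjust the exporter name to your version.

exporters:
  # Print received spans to the collector's stdout so you can confirm the
  # "missing" service's spans actually arrive (point 3).
  logging:
    verbosity: detailed

service:
  telemetry:
    logs:
      # Collector debug logs (point 4) also surface rejected/dropped data.
      level: debug
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, spanmetrics, logging]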

@utezduyar

Thanks! The problem was that the application was missing the HTTP semantic convention attributes. Otherwise your workaround works like a charm.

yurishkuro added a commit that referenced this issue May 27, 2023
## Which problem is this PR solving?
Resolves #4345

## Short description of the changes
- Adds support for the spanmetrics connector via config parameters, while maintaining backwards compatibility with the spanmetrics processor.
- Some quality-of-life changes to help contributors make changes to the SPM feature.

## Testing

- Add test cases for the connector case, and rely on existing tests to assert backwards compatibility.
- Manual testing via docker-compose to ensure both the service-level and operation-level metrics are visible in the Monitor tab.

---------

Signed-off-by: albertteoh <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>