Google Cloud exporter reports unexpected error #36602
Comments
/label processor/interval |
Pinging code owners for processor/interval: @RichieSams @sh0rez @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label. |
The out-of-order error can happen within a batch of metrics, or between two batches. Given that the current batch doesn't conflict, it likely means that a previous batch was sent with a later start time. Note that this does not "merge" the instance label; it just deletes it:
resource/merge_instances:
  attributes:
    - key: service.instance.id
      action: delete
That is probably part of the problem. If the instance ID is required to differentiate between two series and you drop it, you will get errors like the one you got. You will need to use a different processor, like the metrics transform processor, to aggregate away the attribute rather than removing it: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/metricstransformprocessor#aggregate-labels
I also see: |
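To illustrate the difference between deleting and aggregating an attribute, here is a minimal Python sketch. It is not the collector's or the exporter's code: the data model, the helper names (`series_key`, `delete_attribute`, `aggregate_away`, `is_ordered`), and the sample values are all hypothetical, and it ignores metric temporality. It only shows that deleting the distinguishing attribute collapses two series into one whose timestamps go backwards, while aggregation yields a single ordered series.

```python
from collections import defaultdict

# Hypothetical points from two instances of the same service, in arrival order.
# Each tuple is (resource attributes, timestamp in seconds, value).
points = [
    ({"service.name": "api", "service.instance.id": "a"}, 10, 3.0),
    ({"service.name": "api", "service.instance.id": "a"}, 20, 7.0),
    ({"service.name": "api", "service.instance.id": "b"}, 10, 2.0),
    ({"service.name": "api", "service.instance.id": "b"}, 20, 4.0),
]


def series_key(attrs):
    """Identify a series by its full attribute set (a simplification)."""
    return tuple(sorted(attrs.items()))


def delete_attribute(points, key):
    """Drop `key` from every point without combining values."""
    return [
        ({k: v for k, v in attrs.items() if k != key}, ts, val)
        for attrs, ts, val in points
    ]


def is_ordered(points):
    """True if timestamps strictly increase within each series."""
    last_seen = {}
    for attrs, ts, _ in points:
        key = series_key(attrs)
        if key in last_seen and ts <= last_seen[key]:
            return False
        last_seen[key] = ts
    return True


def aggregate_away(points, key):
    """Drop `key`, then sum the values that now share a series and timestamp."""
    totals = defaultdict(float)
    for attrs, ts, val in delete_attribute(points, key):
        totals[(series_key(attrs), ts)] += val
    return [(dict(k), ts, total) for (k, ts), total in sorted(totals.items())]


deleted = delete_attribute(points, "service.instance.id")
aggregated = aggregate_away(points, "service.instance.id")
```

With this sample data, `is_ordered(points)` holds, `is_ordered(deleted)` does not (the single remaining series sees timestamps 10, 20, 10), and `aggregated` contains one point per timestamp.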
@dashpole Thanks for the feedback. Nice catch about the missing namespace or pod name. This should not have any impact for the moment, as I have only one pod, but I will fix it to have a clean setup. |
I'm not sure if the |
The It uses the unique combination of scope info, resource info, metric info, and datapoint info to identify a given datapoint. If any of the above are different, then |
You could temporarily add a text exporter in addition to the Google Cloud exporter, which could help with debugging which values are being exported. |
Ah... So I completely misuse it 😅
This is what I did. You may see in the first post of the issue that there is the |
To give a little more context about my use case: I have a service over which I don't have much control (I cannot change the telemetry it sends, neither the |
I think you can move a resource attribute to a metric attribute using the transform processor, and then use the metrics transform processor to aggregate. The transform processor would use the |
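As a rough picture of the "move a resource attribute to a metric attribute" step, here is a small Python sketch. It is only a conceptual model, not the transform processor's implementation: the dictionary layout and the function name `move_to_datapoints` are assumptions made for illustration.

```python
def move_to_datapoints(resource_metrics, key):
    """Move `key` from the resource attributes into every datapoint's attributes.

    Returns a new structure; the input is left untouched.
    """
    resource = dict(resource_metrics["resource"])
    value = resource.pop(key, None)
    datapoints = []
    for dp in resource_metrics["datapoints"]:
        attrs = dict(dp["attributes"])
        if value is not None:
            attrs[key] = value
        datapoints.append({**dp, "attributes": attrs})
    return {"resource": resource, "datapoints": datapoints}


# Hypothetical input: one resource with a single datapoint.
example = {
    "resource": {"service.name": "api", "service.instance.id": "pod-a"},
    "datapoints": [{"attributes": {"http.method": "POST"}, "value": 3.0}],
}
moved = move_to_datapoints(example, "service.instance.id")
```

Once the attribute lives on the datapoints, resources from different instances become identical, and the attribute can be aggregated away at the datapoint level instead of being silently dropped.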
@AlexisBRENON Has this issue been resolved, or is there something else necessary here? Removing |
I currently have a setup that seems to work fine. The pipeline is as follows:
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20
  batch:
  interval:
    interval: 10s
  resourcedetection:
    detectors: [gcp]
    timeout: 10s
  cumulativetodelta:
    max_staleness: 24h
  transform/resource:
    error_mode: ignore
    metric_statements:
      - context: "resource"
        statements:
          - set(attributes["service.instance.id"], attributes["service.namespace"]) # Override instance ID to allow aggregation
          - set(attributes["k8s.namespace.name"], "my-namespace") # Set namespace for Managed Prometheus export
  groupbyattrs: # Group all metrics from the same client
    keys:
      - service.name
      - service.version
      - service.namespace
  transform/aggregate: # Aggregate metrics from the same client
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(time, TruncateTime(Now(), Duration("10s"))) # Align timestamps to allow aggregation
          - set(start_time, TruncateTime(start_time, Duration("10s"))) # Align timestamps to allow aggregation
          - delete_key(attributes, "http.host")
          - delete_key(attributes, "net.host.port")
          - delete_key(attributes, "http.server_name")
          - delete_key(attributes, "server.address")
          - delete_key(attributes, "server.port")
      - context: metric
        statements:
          - aggregate_on_attributes("sum") where type != METRIC_DATA_TYPE_GAUGE
          - aggregate_on_attributes("mean") where type == METRIC_DATA_TYPE_GAUGE
  deltatocumulative:
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors:
        - memory_limiter
        - batch
        - interval
        - resourcedetection
        - cumulativetodelta
        - transform/resource
        - groupbyattrs
        - transform/aggregate
        - deltatocumulative
      exporters:
        - googlemanagedprometheus
I think we can close this issue 👍 |
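For readers who want to see what the timestamp alignment and aggregation steps in the config above amount to, here is a minimal Python sketch. It is an approximation under stated assumptions, not the transform processor's code: the function names (`truncate`, `align_and_aggregate`) and the datapoint layout are hypothetical, timestamps are integer seconds, and it truncates each datapoint's own time, whereas the config sets `time` from `Now()`.

```python
from collections import defaultdict
from statistics import mean

BUCKET_SECONDS = 10  # mirrors Duration("10s") in the config


def truncate(ts, bucket=BUCKET_SECONDS):
    """Round an integer timestamp down to the start of its bucket."""
    return ts - ts % bucket


def align_and_aggregate(datapoints, dropped_keys, is_gauge):
    """Drop attributes, align timestamps, then combine colliding datapoints.

    Gauges are averaged and every other type is summed, following the
    "mean" / "sum" split used in the config.
    """
    groups = defaultdict(list)
    for dp in datapoints:
        attrs = {k: v for k, v in dp["attributes"].items() if k not in dropped_keys}
        key = (tuple(sorted(attrs.items())), truncate(dp["time"]))
        groups[key].append(dp["value"])
    combine = mean if is_gauge else sum
    return [
        {"attributes": dict(attrs), "time": ts, "value": combine(values)}
        for (attrs, ts), values in sorted(groups.items())
    ]


# Hypothetical datapoints from two server ports.
sample = [
    {"attributes": {"http.method": "POST", "server.port": 8080}, "time": 101, "value": 2.0},
    {"attributes": {"http.method": "POST", "server.port": 8081}, "time": 107, "value": 4.0},
    {"attributes": {"http.method": "PUT", "server.port": 8080}, "time": 112, "value": 1.0},
]
summed = align_and_aggregate(sample, {"server.port"}, is_gauge=False)
averaged = align_and_aggregate(sample, {"server.port"}, is_gauge=True)
```

The two POST datapoints only merge because both steps are applied: removing `server.port` makes their attribute sets equal, and truncation puts them in the same 10-second bucket.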
Component(s)
exporter/googlecloud
processor/interval
What happened?
Description
I send HTTP server request duration metrics to an OTel collector, which sends them to Google Cloud Monitoring.
The Google Cloud Exporter may report errors while the Debug exporter outputs the metrics as expected.
Steps to Reproduce
Use the Python WSGI instrumentation to report HTTP requests duration.
Expected Result
I expect to export the metrics to Google Cloud Monitoring without any error in the logs.
Actual Result
There are some cases (not always) where the Google Cloud exporter logs error messages about out-of-order data points.
Collector version
v0.114.0 custom distribution to add the interval processor
Environment information
Environment
Kubernetes
Collector deployed with the Kubernetes Operator.
OpenTelemetry Collector configuration
Log output
Additional context
As you may see, I have two timeseries, differentiated by the http.method data point attribute (POST and PUT). So I don't understand why the Google Cloud exporter complains about out-of-order points.
There are some cases where the error logs do not show up. In the few examples that I found, it is when my datapoints have ordered timestamps.
But are the timestamps expected to be ordered across different timeseries? If so, how can I make sure that the exporter sends the timeseries in the right order?
This seems linked to the addition of the interval processor (and the resource/merge_instance one, used to merge metrics from multiple instances), as I don't get such errors if I remove them.