Ability to remove/flush unused attributes in metric instruments #2997

legendecas · 2022-05-27T08:52:29Z

Removing/flushing unused dynamic attributes in metric instruments is very important to not leaking memories.

Spec issue: open-telemetry/opentelemetry-specification#1297
Prometheus Suggestion: https://prometheus.io/docs/instrumenting/writing_clientlibs/#labels

Metrics with labels SHOULD support a remove() method with the same signature as labels() that will remove a Child from the metric no longer exporting it, and a clear() method that removes all Children from the metric. These invalidate caching of Children.

Opening this issue to keep track of this problem.

The text was updated successfully, but these errors were encountered:

jmacd · 2022-07-27T16:44:34Z

@legendecas There are at least four issues being discussed here and in #3105.

If you're using delta temporality, you should be able to avoid memory buildup regardless of which attributes are in use. Memory buildup is a problem for the consumer downstream, in this case.

If you're using cumulative temporality and you find a need to delete() the way Prometheus suggersts, you may instead (in the current OTel spec) use an asynchronous instrument and unregister the callback producing data for that instrument when you want to stop reporting it. Support for callback unregister was explicitly for this purpose, as (IMO) we do not want to try to define the delete operation. This was debated in open-telemetry/opentelemetry-specification#2317.

If you're using cumulative temporality and you simply want to strip attributes, so that less memory is needed in the long term, restarting (or reconfiguring) the SDK with a View that eliminates those attributes should give the correct results. That is the sense in which open-telemetry/opentelemetry-specification#1297 defines safe attribute removal for SDKs--if you've implemented views you already have the machinery in place to safely remove attributes.

The final potential topic here is being discussed in open-telemetry/opentelemetry-specification#1891, question is what we want to do when cardinality is high and there are stale entries that could be removed from memory. Please join that discussion. We're able to do this with per-series start timestamps; however this complicates other things, and I'd like official guidance from Prometheus on this.

legendecas · 2022-07-29T06:22:46Z

If you're using delta temporality, you should be able to avoid memory buildup regardless of which attributes are in use. Memory buildup is a problem for the consumer downstream, in this case.

When using delta temporality with an async instrument, the SDK needs to export delta values for the async instrument thus it has to remember the last reported metric streams: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#asynchronous-example-delta-temporality.

jmacd · 2022-08-04T19:16:15Z

Thanks @legendecas. You are correct. My statement about delta temporality only applies to synchronous instruments. This is the sense in which Lightstep would like to support a temporality preference named "stateless" that would configure delta temporality for synchronous and cumulative temporality for asynchronous instruments.

As you point out, asynchronous instruments when configured with delta temporality have the same problem as synchronous instruments configured with cumulative temporality, making async/delta have roughly the same problem as discussed in open-telemetry/opentelemetry-specification#1891. Please head to that discussion, I will continue there with a takeaway from this one.

dyladan · 2022-09-14T16:05:03Z

@legendecas what would you think about documenting this issue and handling handling attribute removal after GA? I would like to get the GA release out before kubecon and this is the last big topic holding it.

legendecas · 2022-09-14T16:11:51Z

Given that the specification of metrics api is already declared stable, I'm fine with tagging this issue as after-GA.

asafm · 2023-02-22T15:34:05Z

I've raised an issue for adding Delete() for an instrument in the API specifications: open-telemetry/opentelemetry-specification#3062

github-actions · 2023-07-10T06:37:01Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

eplightning · 2023-08-04T08:59:07Z

Does this issue cover asynchronous instruments as well?

I have encountered the issue where SDK keeps exporting metrics which no longer get reported by my callbacks:

    this.objectState = capiMeter().createObservableGauge('capi.object.state', {
      valueType: ValueType.INT,
      unit: '{state}',
      description: 'The current state of an object'
    });
    
    this.objectState.addCallback(async (result) => {
      for (const obj of await this.objectStorage.getMetrics()) {
        result.observe(obj.active ? 1 : 0, {
          'capi.object.name': obj.name
        });
      }
    });
    
    // SDK will keep exporting all metrics for all attribute sets that ever got observed inside the callback, not just the recent call

pichlermarc · 2023-08-09T13:29:37Z

Does this issue cover asynchronous instruments as well?

I have encountered the issue where SDK keeps exporting metrics which no longer get reported by my callbacks:

    this.objectState = capiMeter().createObservableGauge('capi.object.state', {
      valueType: ValueType.INT,
      unit: '{state}',
      description: 'The current state of an object'
    });
    
    this.objectState.addCallback(async (result) => {
      for (const obj of await this.objectStorage.getMetrics()) {
        result.observe(obj.active ? 1 : 0, {
          'capi.object.name': obj.name
        });
      }
    });
    
    // SDK will keep exporting all metrics for all attribute sets that ever got observed inside the callback, not just the recent call

@eplightning Yep, that's also something that would be covered by it. Does it keep reporting also when un-registering the callback (calling .removeCallback() on objectState)? 🤔

eplightning · 2023-08-11T12:59:49Z

@pichlermarc

Doesn't seem to change anything: old metrics remain. Tested on Prometheus and console exporters:

    const cb = async (result: ObservableResult<ShardMetricLabels>) => {
      for (const shard of await this.shardStorage.getMetrics()) {
        result.observe(shard.active ? 1 : 0, {
          'capi.shard.name': `${shard.name}`
        });
      }
    };

    this.metricShardState.addCallback(cb);
    console.log('registered');

    setTimeout(() => {
      this.metricShardState.removeCallback(cb);
      console.log('unregistered');
    }, 1000 * 120);

eplightning · 2023-08-11T13:08:54Z

Spec update was recently merged which I believe concerns this issue as well (at least the async instruments): open-telemetry/opentelemetry-specification#3242

pichlermarc · 2023-08-30T11:56:34Z

Spec update was recently merged which I believe concerns this issue as well (at least the async instruments): open-telemetry/opentelemetry-specification#3242

@eplightning thanks for mentioning this and for double-checking! I created an issue to implement the spec for asynchronous instruments. #4096

I think we can keep this issue here at waiting-for-spec as we're waiting for open-telemetry/opentelemetry-specification#1297

pkeuter · 2024-03-28T14:33:54Z

@pichlermarc I was wondering what the status is on this one. There seems to be no way (outside of restarting the application) to flush any old attribute sets that were not observed for a long time. Is there any workaround for this that you know of?

enzo-cappa · 2024-04-08T20:02:56Z

This is a problem for Kafka Consumer. Due to how they work, partitions are assigned dynamically. Also, metrics are commonly reported per partition.

Due to this issue, when a partition is unassigned OpenTelemetry will keep reporting the "last known value" for the metrics of partitions for which the consumer that is no longer assigned. This causes all kind of problems.

The only workaround we could find is to send "0" as the value for metrics corresponding to partitions that are not currently assigned. This is far from ideal, as 0 is actually a valid value. We end up having some logic when consuming the metrics so we can exclude the invalid values.

pichlermarc · 2024-04-09T12:57:39Z

@pichlermarc I was wondering what the status is on this one. There seems to be no way (outside of restarting the application) to flush any old attribute sets that were not observed for a long time. Is there any workaround for this that you know of?

@pkeuter there is currently no workaround for this.

The way forward here is to implement #4095 (which would limit memory use as old metrics age out) or drive the specification issue open-telemetry/opentelemetry-specification#1297, then implement the outcome here.

Contributions are welcome, but keep in mind that Implementing #4095 is likely non-trivial.

asafm · 2024-06-01T17:25:03Z

How about using async instruments? Then you control which partition metrics is reported for.

pichlermarc · 2024-10-15T09:03:26Z

Removing up-for-grabs as this is not actionable without a specification to implement.

netiperher · 2024-11-04T08:50:37Z

This is such a fundamental limitation with the opentelemetry SDK (and specification) that I recommend my clients to skip the SDK for dynamic attributes and emits metrics datapoints themselves, e.g. by attaching a custom metrics producer using the experimental metricProducers api:

  const fooMetric: GaugeMetricData = {
    dataPointType: DataPointType.GAUGE,
    descriptor: {
      name: 'foo',
      description: 'Foo metrics',
      unit: '',
      type: InstrumentType.OBSERVABLE_GAUGE,
      valueType: ValueType.DOUBLE,
    },
    aggregationTemporality: AggregationTemporality.CUMULATIVE,
    dataPoints: [],
  };
  const startTime = millisToHrTime(Date.now());

  const rawMetricsProducer: MetricProducer = {
    collect: async () => {
      return {
        resourceMetrics: {
          resource,
          scopeMetrics: [
            {
              scope: {
                name: 'raw',
              },
              metrics: [
                {
                  ...fooMetric,
                  dataPoints: [
                    {
                      startTime,
                      endTime: millisToHrTime(Date.now()),
                      attributes: { 'some-attribute': 'some-dynamic-value' },
                      value: 42,
                    },
                  ],
                },
              ],
            },
          ],
        },
        errors: [],
      };
    },
  };

  const meterProvider = new MeterProvider({
    resource,
    readers: [
      new PeriodicExportingMetricReader({
        exporter: new OTLPMetricExporter(),
        metricProducers: [rawMetricsProducer],
      }),
    ],
  });

legendecas added the feature-request label May 27, 2022

dyladan added this to the Metrics GA milestone Jun 1, 2022

dyladan added the up-for-grabs Good for taking. Extra help will be provided by maintainers label Jun 1, 2022

legendecas added the waiting-for-spec label Jun 8, 2022

legendecas mentioned this issue Jul 25, 2022

Single-shot metrics #3105

Open

2 tasks

This was referenced Aug 4, 2022

Metrics SDK: define standard circuit breakers for uncollected, infinite cardinality recordings open-telemetry/opentelemetry-specification#1891

Closed

Specify how to stop and restart metrics reporting open-telemetry/opentelemetry-specification#2711

Open

legendecas added the signal:metrics Issues and PRs related to general metrics signal label Sep 1, 2022

legendecas modified the milestones: Metrics GA, Metrics After GA Sep 15, 2022

pichlermarc mentioned this issue Feb 27, 2023

Potential memory leak with the prometheus metrics exporter #3621

Closed

github-actions bot added the stale label Jul 10, 2023

pichlermarc added never-stale and removed stale labels Jul 10, 2023

This was referenced Aug 30, 2023

[sdk-metrics] Implement experimenal cardinality limit specification #4095

Closed

[sdk-metrics] Do not report metric data for previosuly observed attribute sets that were not observed during the current cycle #4096

Open

pichlermarc removed the up-for-grabs Good for taking. Extra help will be provided by maintainers label Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ability to remove/flush unused attributes in metric instruments #2997

Ability to remove/flush unused attributes in metric instruments #2997

legendecas commented May 27, 2022

jmacd commented Jul 27, 2022

legendecas commented Jul 29, 2022

jmacd commented Aug 4, 2022

dyladan commented Sep 14, 2022

legendecas commented Sep 14, 2022

asafm commented Feb 22, 2023

github-actions bot commented Jul 10, 2023

eplightning commented Aug 4, 2023 •

edited

Loading

pichlermarc commented Aug 9, 2023

eplightning commented Aug 11, 2023 •

edited

Loading

eplightning commented Aug 11, 2023 •

edited

Loading

pichlermarc commented Aug 30, 2023

pkeuter commented Mar 28, 2024

enzo-cappa commented Apr 8, 2024

pichlermarc commented Apr 9, 2024

asafm commented Jun 1, 2024

pichlermarc commented Oct 15, 2024

netiperher commented Nov 4, 2024

Ability to remove/flush unused attributes in metric instruments #2997

Ability to remove/flush unused attributes in metric instruments #2997

Comments

legendecas commented May 27, 2022

jmacd commented Jul 27, 2022

legendecas commented Jul 29, 2022

jmacd commented Aug 4, 2022

dyladan commented Sep 14, 2022

legendecas commented Sep 14, 2022

asafm commented Feb 22, 2023

github-actions bot commented Jul 10, 2023

eplightning commented Aug 4, 2023 • edited Loading

pichlermarc commented Aug 9, 2023

eplightning commented Aug 11, 2023 • edited Loading

eplightning commented Aug 11, 2023 • edited Loading

pichlermarc commented Aug 30, 2023

pkeuter commented Mar 28, 2024

enzo-cappa commented Apr 8, 2024

pichlermarc commented Apr 9, 2024

asafm commented Jun 1, 2024

pichlermarc commented Oct 15, 2024

netiperher commented Nov 4, 2024

eplightning commented Aug 4, 2023 •

edited

Loading

eplightning commented Aug 11, 2023 •

edited

Loading

eplightning commented Aug 11, 2023 •

edited

Loading