PoC: Assess the impact of direct Prometheus queries on reconciliation duration #526

skhalash · 2023-11-08T17:16:07Z

Description

This effort is a follow-up to the work done in #516. We need to understand the impact of direct Prometheus queries on reconciliation duration.

Acceptance Criteria

Build a PoC that sets up a simple Prometheus instance alongside the Telemetry Manager.
Develop a set of PromQL queries based on OTel Collector metrics to detect specific scenarios, such as backend connectivity issues, backpressure/throttling due to backend overload, and reaching the ingestion limit of a pipeline.
Check the different patterns we identified in the ADR for their impact on reconciliation, especially the duration of a single reconciliation.
Document the decisions in the ADR as part of the story.

Related Issues
#425

skhalash · 2024-01-10T09:09:06Z

Selected Queries:

rate(otelcol_exporter_send_failed_metric_points[5m]) > 0
rate(otelcol_exporter_enqueue_failed_metric_points[5m]) > 0
(otelcol_exporter_queue_size/otelcol_exporter_queue_capacity)*100 > 90

rate(otelcol_processor_dropped_metric_points[5m]) > 0
rate(otelcol_processor_refused_metric_points[5m]) > 0

rate(otelcol_receiver_refused_metric_points[5m]) > 0
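
As a rough illustration of how queries like these could be evaluated from code, the sketch below builds instant-query URLs for the Prometheus HTTP API (`/api/v1/query`) and checks whether a response contains any matching series. Because each expression already carries its threshold (`> 0`, `> 90`), a non-empty result vector means the condition holds for at least one series. The dictionary keys, the function names, and the `http://prometheus:9090` address are assumptions made for this example and are not taken from the PoC.

```python
import json
from urllib.parse import urlencode

# Selected queries from this comment, keyed by hypothetical names.
QUERIES = {
    "exporter_send_failed": "rate(otelcol_exporter_send_failed_metric_points[5m]) > 0",
    "exporter_enqueue_failed": "rate(otelcol_exporter_enqueue_failed_metric_points[5m]) > 0",
    "exporter_queue_almost_full": "(otelcol_exporter_queue_size/otelcol_exporter_queue_capacity)*100 > 90",
    "processor_dropped": "rate(otelcol_processor_dropped_metric_points[5m]) > 0",
    "processor_refused": "rate(otelcol_processor_refused_metric_points[5m]) > 0",
    "receiver_refused": "rate(otelcol_receiver_refused_metric_points[5m]) > 0",
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL of a Prometheus instant query for the given expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def has_matches(response_body: str) -> bool:
    """Return True if the instant-query response contains at least one series."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"]) > 0


if __name__ == "__main__":
    # Assumed in-cluster address; adjust to the actual Prometheus service.
    for name, promql in QUERIES.items():
        print(name, instant_query_url("http://prometheus:9090", promql))
```

Fetching each URL (for example with `urllib.request.urlopen`) and passing the body to `has_matches` would tell whether the corresponding condition is currently met.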

Prometheus Query Execution Time

Testing individual Prometheus queries on k3d showed that the execution time is negligible, typically in the range of tens of milliseconds. In a real cluster environment, the execution time might be somewhat higher, but it would still be primarily influenced by network latency.
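
A measurement of this kind could be reproduced with a small timing helper such as the one below. It is only a sketch: the helper name `time_query`, the repeat count, and the reported statistics are assumptions for illustration and do not describe how the PoC actually measured execution time. The query itself is passed in as a callable, so the helper measures wall-clock time as seen by the client, network latency included.

```python
import time
from statistics import median
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run the callable several times and report wall-clock durations in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(durations_ms),
        "median_ms": median(durations_ms),
        "max_ms": max(durations_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real HTTP call to Prometheus, e.g.
    # lambda: urllib.request.urlopen(url, timeout=5).read()
    print(time_query(lambda: time.sleep(0.01), repeats=3))
```

Reporting the median next to the minimum and maximum helps separate the typical query cost from one-off outliers such as a cold connection.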

skhalash · 2024-01-12T11:58:55Z

To wrap it up, the consensus was to implement the integration of Prometheus and Telemetry Manager using the Alerting approach.

ADR
PoC

skhalash changed the title ~~PoC: Assess the Impact of direct Prometheus queries on reconciliation duration~~ PoC: Assess the impact of direct Prometheus queries on reconciliation duration Nov 8, 2023

a-thaler mentioned this issue Nov 17, 2023

Advanced pipeline status based on data flow #425

Closed

18 tasks

a-thaler added the area/metrics (MetricPipeline) label Jan 8, 2024

skhalash self-assigned this Jan 8, 2024

skhalash mentioned this issue Jan 11, 2024

docs: Add Integrate Prometheus with Telemetry Manager using Alerting ADR #703

Merged

8 tasks

skhalash added this to the 1.7.0 milestone Jan 11, 2024

skhalash added the area/traces (TracePipeline) and kind/decision (Marks a decision document) labels Jan 11, 2024

skhalash closed this as completed Jan 12, 2024
