feat: add opentelemetry counters for sent and acked messages #2532
Conversation
private Attributes telemetryAttributes;
private long incomingRequestCountBuffered;
nit, I would group all of them under telemetryMetrics.
Done
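For context, a minimal sketch of what grouping the telemetry state into a single holder might look like; the class and field names here are illustrative, not taken verbatim from the PR:

```java
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.LongHistogram;

// Illustrative grouping of per-writer telemetry state into one holder class,
// instead of scattering individual fields across the connection worker.
final class TelemetryMetrics {
  // Attributes (table id, writer id, ...) attached to every measurement.
  Attributes telemetryAttributes;
  // Counters for acknowledged requests, rows and bytes.
  LongCounter ackedRequestCount;
  LongCounter ackedRequestRows;
  LongCounter ackedRequestBytes;
  // Network response latency histogram, in milliseconds.
  LongHistogram networkResponseLatency;
}
```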
"Reports time taken in milliseconds for a response to arrive once a message has been sent over the network.") | ||
.setExplicitBucketBoundariesAdvice(METRICS_LATENCY_BUCKETS) | ||
.build(); | ||
instrumentConnectionEstablishCount = |
I believe this can be derived from network_response_latency, if you put connection_id as the metrics field.
I have added writer_id as an attribute. I still have this metric, however, as it directly provides information about establishing a connection.
measurement.record(length, getTelemetryAttributes());
});
writeMeter
.gaugeBuilder("inflight_queue_length")
It will need connection_id as a metric attribute; otherwise it doesn't make much sense?
I have added writer_id as an attribute.
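As a rough sketch of that shape (names are illustrative, not the PR's exact code), an asynchronous gauge that samples a queue length and tags each measurement with the writer id could look like this:

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.Meter;
import java.util.concurrent.ConcurrentLinkedQueue;

final class InflightQueueGauge {
  private final ConcurrentLinkedQueue<Object> inflightQueue = new ConcurrentLinkedQueue<>();

  InflightQueueGauge(OpenTelemetry openTelemetry, String writerId) {
    Meter writeMeter = openTelemetry.getMeter("bigquery-storage-writer");
    // Attributes distinguish series emitted by different writers.
    Attributes attributes = Attributes.builder().put("writer_id", writerId).build();
    // Asynchronous gauge: the callback samples the queue length at collection time.
    writeMeter
        .gaugeBuilder("inflight_queue_length")
        .ofLongs()
        .setDescription("Length of the in-flight request queue")
        .buildWithCallback(measurement -> measurement.record(inflightQueue.size(), attributes));
  }
}
```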
.build();
instrumentSentRequestRows =
writeMeter
.counterBuilder("append_rows_sent")
I think we should maintain fewer metrics. Can we just add a "result" field to append_requests/rows/bytes?
Done. I have removed the following metrics: append_requests, append_request_bytes, append_rows, waiting_queue_length, connection_retry_count, append_requests_error, append_request_bytes_error, append_rows_error.
I now use the "error_code" attribute on each of the following metrics: append_requests_acked, append_request_bytes_acked, append_rows_acked, connection_end_count.
private LongCounter instrumentErrorRequestCount;
private LongCounter instrumentErrorRequestSize;
private LongCounter instrumentErrorRequestRows;
private static final List<Long> METRICS_LATENCY_BUCKETS =
are these millis/micros/nanos?
Renamed this to METRICS_MILLISECONDS_LATENCY_BUCKETS.
@VisibleForTesting
Attributes getTelemetryAttributes() {
return telemetryAttributes;
}

private void periodicallyReportOpenTelemetryMetrics() {
Duration durationSinceLastRefresh = Duration.between(instantLastSentMetrics, Instant.now());
if (durationSinceLastRefresh.compareTo(METRICS_UPDATE_INTERVAL) > 0) {
Are metrics updates really that costly on the producer side that you don't just update metrics at the time of the event?
In OpenCensus, flushing/updates were mostly an exporter concern.
I am testing using an exporter to Google Cloud Monitoring. I encountered "exceeded max frequency" errors with this exporter. To resolve this issue, I have switched to updating the instruments only once every second, which I believe should be sufficient for our needs.
Upon further inspection, I narrowed down the issue I was seeing to the frequency of the exporter. I have restored all metrics to be instrumented in real time.
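For reference, a minimal sketch of tuning the export frequency on the exporter side instead of throttling the instruments; GoogleCloudMetricExporter comes from the opentelemetry-operations-java exporter, and the 60-second interval is just an example value:

```java
import com.google.cloud.opentelemetry.metric.GoogleCloudMetricExporter;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.MetricExporter;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
import java.time.Duration;

public final class MetricsSetup {
  public static SdkMeterProvider create() {
    MetricExporter exporter = GoogleCloudMetricExporter.createWithDefaultConfiguration();
    // Instruments can be updated in real time; the reader controls how often data
    // is pushed to Cloud Monitoring, which enforces a minimum write frequency.
    PeriodicMetricReader reader =
        PeriodicMetricReader.builder(exporter).setInterval(Duration.ofSeconds(60)).build();
    return SdkMeterProvider.builder().registerMetricReader(reader).build();
  }
}
```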
Force-pushed from 3caa290 to b6b30cf
Force-pushed from b6b30cf to 796ae3e
Force-pushed from 0b87d97 to ad164fb
private LongCounter instrumentIncomingRequestSize;
private LongCounter instrumentIncomingRequestRows;
private static final List<Long> METRICS_MILLISECONDS_LATENCY_BUCKETS =
ImmutableList.of(0L, 50L, 100L, 500L, 1000L, 5000L, 10000L, 20000L, 30000L, 60000L, 120000L);
@GaoleMeng Do these buckets look good to you? Do we need a bucket at 50L? Maybe add a 2000L?
This is too sparse. In the backend we are using power-of-1.5 buckets, i.e. 1, 1.5, 1.5^2, 1.5^3, ... milliseconds.
We were once using powers of 4, but found that was too sparse, so we reduced it to powers of 1.5.
Could we do similar bucketing here?
The power of 1.5 sequence looks like this:
1, 2, 3, 5, 8, 11, 17, 26, 38, 58, 86, 130, 195, 292, 438, 657, 985, 1478, 2217, 3325, 4988, 7482, 11223, 16834, 25251, 37877, 56815, 85223, 127834, 191751, 287627, 431440, 647160, 970740, 1456110
Would it be useful to provide all of these buckets? Alternatively, we could just provide every other bucket, so the list looks like this:
1, 3, 8, 17, 38, 86, 195, 438, 985, 2217, 4988, 11223, 25251, 56815, 127834, 287627, 647160, 1456110
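If the list should not be hard-coded, a small sketch of generating the power-of-1.5 boundaries programmatically; the cap and the rounding behaviour here are illustrative:

```java
import com.google.common.collect.ImmutableList;
import java.util.List;

final class LatencyBuckets {
  // Generates 0, 1, then successive powers of 1.5 (rounded down to whole milliseconds,
  // deduplicated) up to the given cap, e.g. ~10 minutes = 600000 ms.
  static List<Long> powerOf15Buckets(long maxMillis) {
    ImmutableList.Builder<Long> buckets = ImmutableList.builder();
    buckets.add(0L);
    long previous = 0L;
    double value = 1.0;
    while ((long) value <= maxMillis) {
      long rounded = (long) value;
      if (rounded != previous) {
        buckets.add(rounded);
        previous = rounded;
      }
      value *= 1.5;
    }
    return buckets.build();
  }
}
```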
@@ -333,6 +351,7 @@ private Attributes buildOpenTelemetryAttributes() {
if (!tableName.isEmpty()) {
builder.put(telemetryKeyTableId, tableName);
}
builder.put(telemetryKeyWriterId, writerId);
Add a comment to buildOpenTelemetryAttributes: what kind of attributes is it building, and do they apply to all metrics?
Added
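A rough sketch of what the documented attribute builder might look like; the key names and the surrounding class are illustrative:

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.common.AttributesBuilder;

final class AttributesExample {
  private static final AttributeKey<String> TABLE_ID_KEY = AttributeKey.stringKey("table_id");
  private static final AttributeKey<String> WRITER_ID_KEY = AttributeKey.stringKey("writer_id");

  private final String tableName;
  private final String writerId;

  AttributesExample(String tableName, String writerId) {
    this.tableName = tableName;
    this.writerId = writerId;
  }

  /**
   * Builds the attribute set (destination table and writer id) that is attached to every
   * metric emitted by this writer, so series from different writers can be told apart.
   */
  Attributes buildOpenTelemetryAttributes() {
    AttributesBuilder builder = Attributes.builder();
    if (!tableName.isEmpty()) {
      builder.put(TABLE_ID_KEY, tableName);
    }
    builder.put(WRITER_ID_KEY, writerId);
    return builder.build();
  }
}
```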
ImmutableList.of(0L, 50L, 100L, 500L, 1000L, 5000L, 10000L, 20000L, 30000L, 60000L, 120000L);

private static final class OpenTelemetryMetrics {
private LongCounter instrumentSentRequestCount;
Discussed with Gaole; we think that recording both Sent and Ack may not make a significant difference. Let's just record Ack for now, for simplicity.
Done
writeMeter
.counterBuilder("append_requests")
.setDescription("Counts number of incoming requests")
.counterBuilder("append_requests_acked")
This can be a TODO: I am wondering if it is possible to add a Retry attribute to the metric.
Done
Force-pushed from ad164fb to 73c9193
LGTM, please address the bucket length issue.
Force-pushed from 73c9193 to 0572a47
// Buckets are based on a list of 1.5 ^ n
private static final List<Long> METRICS_MILLISECONDS_LATENCY_BUCKETS =
ImmutableList.of(
1L, 3L, 8L, 17L, 38L, 86L, 195L, 438L, 985L, 2217L, 4988L, 11223L, 25251L, 56815L,
Do we need to have 1L, 3L and 8L?
I have removed these. I now start at 0 (as the lowest bucket boundary) and end at 647160, which represents about 10 minutes.
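Putting the two threads together, a minimal sketch of wiring such boundaries into the latency histogram via the bucket-boundaries advice API; the exact values and meter name below are illustrative, following the every-other-power-of-1.5 list discussed above:

```java
import com.google.common.collect.ImmutableList;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.api.metrics.Meter;
import java.util.List;

final class LatencyHistogram {
  // Power-of-1.5 boundaries, starting at 0 and capped near 10 minutes (~647160 ms).
  private static final List<Long> METRICS_MILLISECONDS_LATENCY_BUCKETS =
      ImmutableList.of(
          0L, 17L, 38L, 86L, 195L, 438L, 985L, 2217L, 4988L, 11223L, 25251L, 56815L, 127834L,
          287627L, 647160L);

  static LongHistogram create(Meter writeMeter) {
    return writeMeter
        .histogramBuilder("network_response_latency")
        .ofLongs()
        .setUnit("ms")
        .setDescription(
            "Reports time taken in milliseconds for a response to arrive once a message has"
                + " been sent over the network.")
        .setExplicitBucketBoundariesAdvice(METRICS_MILLISECONDS_LATENCY_BUCKETS)
        .build();
  }
}
```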
Also add network latency, queue length and error counts. The metrics (other than error counts) are now reported periodically, every second.
Force-pushed from 0572a47 to 8ac0ed2
…pis#2532) Also add network latency, queue length and error counts. The metrics (other than error counts) are now reported periodically, every second.