Store level latency metric #7149

sandeepsukhani · 2022-09-14T06:58:00Z

What this PR does / why we need it:
In the Reads/Writes dashboards, we have panels for plotting index QPS and latency, but they are specific to boltdb-shipper and bigtable. Each index store exposes its own metric, but we can't cover all the supported store types in those dashboards.
We have added experimental support for tsdb, which we will recommend once it becomes production ready, so it is yet another store that a lot of people will be interested in monitoring.

To avoid having latency and QPS panels for each index type, I added the loki_index_request_duration_seconds metric in PR #6880, but it only measured index latencies. In this PR I am renaming it to loki_store_request_duration_seconds so that it measures overall store request latencies, including the PUT chunks call.

I have added this metric in a separate panel to the Reads and Writes dashboards. We should drop the store-specific panels in the next major release. Renaming this metric should not be a problem since we have not done a release since I added it.
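To illustrate the idea of one store-level latency metric instead of one per index type, here is a minimal sketch of a timing wrapper. It is not the code from this PR: timeStoreCall, observeFunc, and recorder are hypothetical names, the operation/status labels are assumed, the GetSeries operation name is illustrative, and the observe callback only stands in for a Prometheus histogram observation so that the example runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// observeFunc is a hypothetical stand-in for observing a value on a
// labelled Prometheus histogram (operation and status labels are assumed).
type observeFunc func(operation, status string, seconds float64)

// timeStoreCall runs fn, measures how long it took, and reports the
// duration together with the operation name and a success/error status.
func timeStoreCall(operation string, observe observeFunc, fn func() error) error {
	start := time.Now()
	err := fn()
	status := "success"
	if err != nil {
		status = "error"
	}
	observe(operation, status, time.Since(start).Seconds())
	return err
}

// sample is one recorded observation, kept in memory for the demo.
type sample struct {
	operation string
	status    string
	seconds   float64
}

// recorder collects observations so they can be inspected afterwards.
type recorder struct {
	samples []sample
}

func (r *recorder) observe(operation, status string, seconds float64) {
	r.samples = append(r.samples, sample{operation, status, seconds})
}

func main() {
	rec := &recorder{}

	// A read that succeeds (illustrative operation name).
	_ = timeStoreCall("GetSeries", rec.observe, func() error { return nil })

	// A write that fails.
	_ = timeStoreCall("Put", rec.observe, func() error { return errors.New("upload failed") })

	for _, s := range rec.samples {
		fmt.Printf("operation=%s status=%s seconds=%f\n", s.operation, s.status, s.seconds)
	}
}
```

Because every store call goes through the same wrapper, a dashboard panel could group by the operation label rather than needing a separate panel per store type.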

grafanabot · 2022-09-14T07:02:41Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

grafanabot · 2022-09-14T07:11:30Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

periklis

Apart from the missing make loki-mixin run, this all looks clear and safe to me.

grafanabot · 2022-09-14T07:26:23Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

dannykopping

LGTM, but if this metric is just for index stores then I'd suggest we change the name to not confuse users (like myself initially) that may think this metric measures all store requests, including chunk stores.

pkg/storage/stores/composite_store.go

dannykopping · 2022-09-14T07:33:20Z

pkg/storage/stores/metrics.go

 			Namespace: "loki",
-			Name:      "index_request_duration_seconds",
-			Help:      "Time (in seconds) spent in serving index query requests",
+			Name:      "store_request_duration_seconds",


The changes from #6880 never made it into a release, correct?

Is this metric only for index stores?

> The changes from #6880 never made it into a release, correct?

Yes, the latest release, 2.6.1, doesn't have it.

> Is this metric only for index stores?

All the read requests here are purely index queries, while the write requests, i.e. Put and PutOne, upload the chunk and index it as well. It would be hard to track just the index write requests without refactoring the code.

Yeah I think having a metric named loki_store_request_duration_seconds that doesn't include chunk store reads but only includes writes will be quite confusing.

How hard do you anticipate the refactoring would be? I want to try to optimise for clarity as much as possible, given that our system is already very complex.

I looked at the code, and it seems quite complex and would take time. The other option would be to keep the old metric loki_index_request_duration_seconds which tracks just the index requests for now, and update just the reads dashboards until we refactor the code. I will keep the refactoring work on my to-do list until then.

> The other option would be to keep the old metric loki_index_request_duration_seconds which tracks just the index requests for now, and update just the reads dashboards until we refactor the code. I will keep the refactoring work on my to-do list until then.

Thanks Sandeep, I think that might work out better for our users - if you don't mind?

@dannykopping I gave it a try. Here is the PR: #7154
Please let me know what you think if you get a chance to have a look.

production/loki-mixin-compiled/dashboards/loki-reads.json

production/loki-mixin-compiled/dashboards/loki-writes.json

sandeepsukhani · 2022-09-14T07:51:59Z

> LGTM, but if this metric is just for index stores then I'd suggest we change the name to not confuse users (like myself initially) that may think this metric measures all store requests, including chunk stores.

I can't think of a better name. We would have to refactor the code to measure just the index latency on the chunk flush operations, which would let us rename the metric back to loki_index_request_duration_seconds. I think that might be tricky, so I can give it a try in a separate PR.

sandeepsukhani · 2022-09-15T03:50:37Z

Closing this in favour of #7163

sandeepsukhani added 2 commits September 14, 2022 11:58

Rename loki_index_request_duration_seconds to loki_store_request_duration_seconds metric and measure put chunks store call latency

7091993

Add loki_store_request_duration_seconds metric to the reads and writes dashboards

e57d4c9

sandeepsukhani requested a review from a team as a code owner September 14, 2022 06:58

pull-request-size bot added the size/M label Sep 14, 2022

lint jsonnet

477d6e3

periklis approved these changes Sep 14, 2022

make loki-mixin

1cffed0

pull-request-size bot added size/L and removed size/M labels Sep 14, 2022

dannykopping reviewed Sep 14, 2022

sandeepsukhani closed this Sep 15, 2022
