-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store level latency metric #7149
Store level latency metric #7149
Conversation
…tion_seconds metric and measure put chunks store call latency
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides missing make loki-mixin
all clear and safe to me.
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but if this metric is just for index stores then I'd suggest we change the name to not confuse users (like myself initially) that may think this metric measures all store requests, including chunk stores.
Namespace: "loki", | ||
Name: "index_request_duration_seconds", | ||
Help: "Time (in seconds) spent in serving index query requests", | ||
Name: "store_request_duration_seconds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes from #6880 never made it into a release, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this metric only for index stores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes from #6880 never made it into a release, correct?
yes, latest release 2.6.1
doesn't have it.
Is this metric only for index stores?
All the read requests here are purely index queries, while the write request i.e Put
and PutOne
, uploads the chunk and indexes it as well. It would be hard to purely track just the index write request without refactoring the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think having a metric named loki_store_request_duration_seconds
that doesn't include chunk store reads but only includes writes will be quite confusing.
How hard do you anticipate the refactoring would be? I want to try optimise for clarity as much as possible given that our system is already very complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the code, and it seems quite complex and would take time. The other option would be to keep the old metric loki_index_request_duration_seconds
which tracks just the index requests for now, and update just the reads dashboards until we refactor the code. I will keep the refactoring work on my to-do list until then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other option would be to keep the old metric loki_index_request_duration_seconds which tracks just the index requests for now, and update just the reads dashboards until we refactor the code. I will keep the refactoring work on my to-do list until then.
Thanks Sandeep, I think that might work out better for our users - if you don't mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dannykopping I gave it a try. Here is the PR #7154
Please let me know what do you think if you get a chance to have a look.
I can't think of a better name. We will have to refactor the code to measure just the index latency on the chunk flush operations, which would let us rename the metric to |
Closing this in favour of #7163 |
What this PR does / why we need it:
In Reads/Writes dashboard, we have panels for plotting index QPS and latency, but they are specific to
boltdb-shipper
andbigtable
. Each index store exposes its metric, but we can't have all the supported store types in those dashboards.We have added experimental support for
tsdb
which is going to be the one we will recommend when it becomes production ready so it is going to be yet another store which a lot of people would be interested in monitoring.To avoid having latency and qps panel for each index type, I added
loki_index_request_duration_seconds
metric in PR #6880 but it was only measuring index latencies. In this PR I am renaming it toloki_store_request_duration_seconds
to measure overall store request latencies includingPUT
chunks call.I have added this metric in a separate panel to Reads / Writes dashboards. We should drop store-specific panels in the next major release. Renaming this metric should not be a problem since we have not done a release since I added it.