-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace hdr_hist
with log_hist
#12460
Conversation
7234833
to
0934d1b
Compare
0934d1b
to
0d5e0ec
Compare
PERF_TEST(log_hist, record) { | ||
log_hist_internal h; | ||
perf_tests::start_measuring_time(); | ||
#pragma nounroll |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw I think nounroll doesn't actually work on GCC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bummer... is there anything that works on clang and gcc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use gcc, so this isn't a concern? does it work on clang? can you wrap the loop in do_not_optimize
?
@@ -185,7 +185,7 @@ class log_hist : public log_hist_base { | |||
*/ | |||
void record(uint64_t val) { | |||
_sample_sum += val; | |||
const int i = std::clamp( | |||
const unsigned i = std::clamp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, how did you arrive at this being the cause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instruction was sign extending the register holding i
into the register used to index the vector. After I switched it to unsigned clang used the register storing i
to index into the vector without moving it to a different register. I guess the unsigned
was enough to let clang know that i
isn't going to be negative and won't need any sign extension when converting to a 64 bit index into the vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
l33t
src/v/utils/log_hist.h
Outdated
seastar::metrics::histogram internal_histogram_logform() const; | ||
|
||
protected: | ||
int _number_of_buckets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right you are, will remove.
src/v/utils/log_hist.h
Outdated
|
||
protected: | ||
int _number_of_buckets; | ||
uint64_t _first_bucket_upper_bound; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: This could probably be a template parameter, with explicit template instantiation for the couple of used values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will switch this over to explicit template instantiation.
src/v/utils/log_hist.h
Outdated
protected: | ||
int _number_of_buckets; | ||
uint64_t _first_bucket_upper_bound; | ||
std::vector<uint64_t> _counts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: With some more templating, this could be a std::array
, which takes the runtime of the benchmark from ~128ms to ~110ms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Will make a couple changes to the commit above and add it to the PR with you marked as a co-author.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to switch it around a bit:
pick caf95eb834 utils: remove unneeded movslq from record function
pick eab354af54 utils: refactor seastar_histogram_logform to allow for conversions from log_hist_internal to a public seastar hist
f 0c848330c9 utils/log_hist: Use std::array
f ee75bdb32f utils: remove log_hist_base
pick af9b60e904 utils: allow for pausing auto latency measurements in log_hist
pick 68b4bbf212 utils: add histogram benchmark
pick 0d5e0ec328 treewide: replace hdr_hist with log_hist
@ballard26 - i assume this comes with a vtools patch to remove hdrhist - i don't see the cmake changes. |
There is still one outstanding use of |
ee75bdb
to
5b842fa
Compare
…om log_hist_internal to a public seastar hist Co-authored-by: Ben Pope <[email protected]>
This reduces the memory required to store a histogram from kilobytes to 208 bytes. It also speeds up recording to the histogram by 2x.
5b842fa
to
c2959f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
PERF_TEST(log_hist, record) { | ||
log_hist_internal h; | ||
perf_tests::start_measuring_time(); | ||
#pragma nounroll |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use gcc, so this isn't a concern? does it work on clang? can you wrap the loop in do_not_optimize
?
the linter error seems like a false positive i ran format locally on this pr and it seems fine. |
This PR replaces all but one use-case of
hdr_hist
withlog_hist
.src/v/cluster/self_test
can't be converted tolog_hist
yet aslog_hist
doesn't support generating quantiles.This replacement reduces per-histogram memory usage from potentially hundreds kilobytes to 208 bytes. It also reduces the time to record a value to the histogram by about half.
Public metric histograms generated with;
ssx::metrics::report_default_histogram(hdr_hist);
are now replaced with;
which both produce the same Prometheus histogram given that the same values were recorded to each. Hence it is a drop in replacement with no noticeable effects in the metrics.
The replacement for internal histograms, however, does not produce identical Prometheus histograms as what it replaces. In this case this;
is now replaced with;
There are a couple differences between the two Prometheus histograms. The first is that the upper bound for the first bucket will be 8 instead of 10 now. The second is that buckets will now have upper bounds of
2^n -1
, hence comparing[replacement, current]
the buckets are now[8, 10], [16, 20], [32, 41], [64, 83], [128, 167], [256, 335]...
.This shouldn't be a huge problem as Prometheus supports in-frequent bucket bound changes. And since a upgrade necessitates a restart histogram values will be unstable anyway.
Beyond this the PR also refactors
log_hist
to add a couple of things;log_hist_internal
this allows us to share a single internal histogram for both public and internal metricshdr_hist::record
tolog_hist:record
Backports Required
Release Notes