We currently emit BoltDB timers like `TxStats.WriteTime` as samples, as if we were sampling each transaction's elapsed commit time.
However, bbolt internally just keeps a running total of all the time spent writing transactions. Recording that total as a summary gives us something fairly meaningless: each sample is the cumulative write time since the database was opened, not the duration of a single commit. The best we can do is take the max of the summary and then treat it like a counter with something like `irate` to get the time per second spent writing data to disk. But that's confusing and wasteful. We should just emit it as a counter and update the docs to match.
We document it in the Vault and Consul docs as "time spent writing in milliseconds", which is likely to confuse anyone who tries to interpret this data!
Slightly less wrong: we document and record several metrics, like `raft.boltdb.txstats.pageAlloc`, as gauges even though they have the same semantics as counters. We do at least correctly note here that they are a count of all allocations since process start, but it's confusing to record that as a gauge when the only useful thing to do with it is treat it like a counter and look at the rate at which it increases over time! Compare the doc for number of spills (a counter) with this one.
banks changed the title from "BoltDB timing metrics are counters not timers" to "BoltDB metrics are mostly counters not timers or gauges" on Jul 10, 2023.