catalog: impossible to read metrics names for latencies #64373
Labels
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Comments
tbg added the C-bug label on Apr 29, 2021
tbg added a commit to tbg/cockroach that referenced this issue on Apr 29, 2021
One of our observability Achilles' heels has always been that time series are not included in `debug.zip`. But let's aim lower: they're also not included in roachtest artifacts. This PR tries to address at least the latter, and opens up a potential interim solution for the former.

We have actually had the ability to export time series from a cluster for quite some time, and recently improved it in cockroachdb#57481. However, the real problem is visualizing the data. We have something that allows you to [explore] a dump, but it's unusable until we also do cockroachdb#54178, which isn't going to happen anytime soon.

For better or worse, in the short term, the most attractive way to get the data visualized is to dump raw KV pairs into a local cockroachdb instance and open the DB console. That's what this commit achieves. Here's a "demo":

```
./cockroach debug tsdump --format raw --insecure \
  --host $(roachprod ip --external tobias-ui:1) > ts.gob
```

```
COCKROACH_DEBUG_TS_IMPORT_FILE=ts.gob ./cockroach start-single-node --insecure
```

Open the UI and browse:

![image](https://user-images.githubusercontent.com/5076964/116429966-ae470880-a846-11eb-8129-13885bcbab6e.png)

We anticipate trying this out in `roachtest`. It's unclear if it will ever be included in `debug.zip`, but it seems worthwhile given that our real solution - the observability server - is a ways out.

There are gotchas.

- the code behind COCKROACH_DEBUG_TS_IMPORT_FILE has to make all kinds of assumptions about which node each store belongs to. In practice this means that unless each node has exactly one store, and the IDs match, the operation will either fail or silently produce an incorrect mapping. What's worse, if this conflicts with the actual running node, the running node's opinion will win (so things might look OK at first, then flip around, not sure).
- some metrics are missing, see cockroachdb#64373. I think everything is there except the latency metrics, such as `sql.txn.latency-*`.
- to get the console to even realize these metrics are there, we have to write fake NodeStatuses. Definitely don't use this on any cluster you care about.

Given those, the only situation in which I will personally use this is that of a vanilla roachtest, where NodeIDs and StoreIDs line up nicely.

[explore]: https://github.com/cockroachdb/cockroach/blob/master/scripts/localmetrics/README.md

Release note: None
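The first gotcha above can be illustrated with a small Go sketch. It is not the code behind COCKROACH_DEBUG_TS_IMPORT_FILE; the function names (`inferNodeForStores`, `mismatches`) and the integer IDs are hypothetical, chosen only to show why a "store ID equals node ID" guess breaks as soon as a node has more than one store.

```go
package main

import (
	"fmt"
	"sort"
)

// inferNodeForStores guesses the owning node of each store under the
// assumption named in the gotcha: one store per node, with matching IDs.
// (Hypothetical helper, not taken from the CockroachDB code base.)
func inferNodeForStores(storeIDs []int) map[int]int {
	inferred := make(map[int]int, len(storeIDs))
	for _, s := range storeIDs {
		inferred[s] = s // assumption: store ID == node ID
	}
	return inferred
}

// mismatches returns, in ascending order, the store IDs whose inferred
// node differs from the node recorded in the actual topology.
func mismatches(inferred, actual map[int]int) []int {
	var bad []int
	for store, node := range inferred {
		if actualNode, ok := actual[store]; ok && actualNode != node {
			bad = append(bad, store)
		}
	}
	sort.Ints(bad)
	return bad
}

func main() {
	inferred := inferNodeForStores([]int{1, 2, 3})
	// Node 1 owns stores 1 and 2, node 2 owns store 3.
	actual := map[int]int{1: 1, 2: 1, 3: 2}
	fmt.Println("stores mapped to the wrong node:", mismatches(inferred, actual))
}
```

In a vanilla roachtest cluster, where every node has a single store with the same ID, `mismatches` would return nothing, which is the situation the commit message limits itself to.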
tbg added a commit to tbg/cockroach that referenced this issue on May 7, 2021
One of our observability Achilles' heels has always been that time series are not included in `debug.zip`. But let's aim lower: they're also not included in roachtest artifacts. This PR tries to address at least the latter, and opens up a potential interim solution for the former.

We have actually had the ability to export time series from a cluster for quite some time, and recently improved it in cockroachdb#57481. However, the real problem is visualizing the data. We have something that allows you to [explore] a dump, but it's unusable until we also do cockroachdb#54178, which isn't going to happen anytime soon.

For better or worse, in the short term, the most attractive way to get the data visualized is to dump raw KV pairs into a local cockroachdb instance and open the DB console. That's what this commit achieves. Here's a "demo":

```
./cockroach debug tsdump --format raw --insecure \
  --host $(roachprod ip --external tobias-ui:1) > ts.gob
```

```
COCKROACH_DEBUG_TS_IMPORT_FILE=ts.gob ./cockroach start-single-node --insecure
```

Open the UI and browse:

![image](https://user-images.githubusercontent.com/5076964/116429966-ae470880-a846-11eb-8129-13885bcbab6e.png)

We anticipate trying this out in `roachtest`. It's unclear if it will ever be included in `debug.zip`, but it seems worthwhile given that our real solution - the observability server - is a ways out.

There are gotchas.

- the code behind COCKROACH_DEBUG_TS_IMPORT_FILE has to make all kinds of assumptions about which node each store belongs to. In practice this means that unless each node has exactly one store, and the IDs match, the operation will either fail or silently produce an incorrect mapping. What's worse, if this conflicts with the actual running node, the running node's opinion will win (so things might look OK at first, then flip around, not sure). We simply limit to `start-single-node` on the first start, with a single store, to minimize confusion. Any other configuration will either ignore the var or error out outright.
- some metrics are missing, see cockroachdb#64373. I think everything is there except the latency metrics, such as `sql.txn.latency-*`.
- to get the console to even realize these metrics are there, we have to write fake NodeStatuses. Definitely don't use this on any cluster you care about.

Given those, the only situation in which I will personally use this is that of a vanilla roachtest, where NodeIDs and StoreIDs line up nicely.

[explore]: https://github.com/cockroachdb/cockroach/blob/master/scripts/localmetrics/README.md

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue on May 7, 2021
tbg added a commit to tbg/cockroach that referenced this issue on May 8, 2021
craig bot pushed a commit that referenced this issue on May 8, 2021
64329: cli,ts: allow (hacky) visualization of timeseries dumps r=knz a=tbg

Co-authored-by: Tobias Grieger
tbg added a commit to tbg/cockroach that referenced this issue on Jun 23, 2021
I added a hack in #69469 a while back (cockroach/pkg/ts/catalog/metrics.go, lines 35 to 96 in 1da1968). It's ugly, but gets the job done.
Describe the problem
The timeseries catalog at `./pkg/ts/catalog` is supposed to let you read off all of the timeseries names that we maintain. However, in practice this isn't possible because histograms have an additional level of indirection. For example, `sql.txn.latency` is a histogram, but the timeseries it exports have names of the form `sql.txn.latency-p99`. As a result, the `(*ts.Server).Dump{Raw}` endpoints (hit by `./cockroach debug tsdump`) fail to export latency data (as they don't learn the correct names to look for).
One solution to this could be to leave the catalog alone, and fix
To Reproduce
Run `./cockroach debug tsdump` against a local CRDB node and notice that `sql.txn.latency` (or any other histogram) is missing. Personally I noticed this while working on #64329 as data was missing for the latency graphs.
Expected behavior
The complete timeseries names should be discoverable from the catalog.
The actual names of the metrics are in 1 and are included in the return value of 2 and in 3.
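To make the indirection concrete, here is a minimal Go sketch of what "learning the correct names" could look like. It is not the catalog code and not the hack from metrics.go; `expandNames` and `quantileSuffixes` are hypothetical names, and only the `-p99` suffix is confirmed by this issue. The other suffixes, and the use of `sql.conns` as a non-histogram example, are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// quantileSuffixes lists the suffixes appended to a histogram's name.
// Only "-p99" appears in the issue text; the rest are assumed here.
var quantileSuffixes = []string{"-max", "-p99", "-p90", "-p50"}

// expandNames turns catalog names into the timeseries names a dump would
// have to look up. Names flagged as histograms are replaced by one name
// per quantile suffix; all other names pass through unchanged.
func expandNames(catalogNames []string, isHistogram map[string]bool) []string {
	var out []string
	for _, name := range catalogNames {
		if !isHistogram[name] {
			out = append(out, name)
			continue
		}
		for _, suffix := range quantileSuffixes {
			out = append(out, name+suffix)
		}
	}
	sort.Strings(out) // deterministic order for display and comparison
	return out
}

func main() {
	catalogNames := []string{"sql.txn.latency", "sql.conns"}
	isHistogram := map[string]bool{"sql.txn.latency": true}
	for _, name := range expandNames(catalogNames, isHistogram) {
		fmt.Println(name)
	}
}
```

A dump that iterates over the expanded list rather than the raw catalog names would query `sql.txn.latency-p99` instead of the non-existent `sql.txn.latency` series, which is the behavior this issue asks for.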
Environment:
Additional context
The likely impact of this bug is that when we use #64329 for roachtest metrics visualization purposes, we won't be able to look at any latencies unless we add more hacks a la cockroach/pkg/ts/catalog/metrics.go, lines 40 to 47 in eced6fa.
Jira issue: CRDB-7022