roachtest: output and investigate intent stats after TPCC tests #65193
Comments
Current state of things: What to do with this number? |
If we've had a look at some TPCC tests and found intent counts to be 0 in all expected cases then I don't think there's much more we need to do here. Most of the intent leaks we've found have been along sad paths (typically client disconnects and errors), and I think it's more fruitful to actively provoke intent leaks with targeted integration tests such as |
I tried to see what we could get out of it. What I propose is that we collect the "health" metrics we want to observe and push them together with the stats.json that we already push for performance histograms. We could add that to any test, which would give us a historical perspective on these non-functional metrics and perhaps early signs of failures. |
Right. We're actually considering what to do about roachperf and roachtest data in the test-eng team these days, since the current system isn't really covering our needs. Not sure where we'll end up, but we're moving in the direction of exposing Prometheus metrics about workloads and tests, which we can pull into some data platform once we figure out what system we want to use for processing and visualizing it (see e.g. #66313 which added this to For these intent counts, I think it'd make sense to simply use the Prometheus intent metrics we already expose since that's the direction we're heading in, rather than adding another data channel for this. But would be curious to get @cockroachdb/test-eng's thoughts. |
If we're going to build something dedicated for that, it makes no sense to add this. Printing it to the logs is pretty much useless. Should we park this until there's clarity on metrics? |
Yeah, let's wait and see. |
As part of the intent buildup investigation in #60585 we should output the intent counts as reported by MVCCStats after TPCC workloads complete, and make sure they are generally 0 in all expected cases (e.g. barring node restarts).
Epic: CRDB-2554
Jira issue: CRDB-7476
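For illustration only, here is a minimal Go sketch of the kind of check discussed in this thread: sum an intent-count metric from a Prometheus text-format payload and flag a non-zero total. The metric name `intentcount`, the `store` label, and the sample payload are assumptions made for the example, not output from a real cluster, and `sumMetric` is a hypothetical helper rather than anything in roachtest.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up every sample of the named metric in a Prometheus
// text-format payload, across all label sets (e.g. one sample per store).
// Hypothetical helper for illustration; not part of roachtest.
func sumMetric(payload, name string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The metric name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		rest := line[end:]
		if rest[0] == '{' {
			closing := strings.LastIndex(rest, "}")
			if closing < 0 {
				return 0, fmt.Errorf("malformed sample: %q", line)
			}
			rest = rest[closing+1:]
		}
		// The value is the first field after the optional label set.
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("sample without value: %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, fmt.Errorf("bad value in %q: %w", line, err)
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	// Made-up payload; the metric and label names are assumptions.
	sample := `# TYPE intentcount gauge
intentcount{store="1"} 0
intentcount{store="2"} 3
intentbytes{store="1"} 120
`
	total, err := sumMetric(sample, "intentcount")
	if err != nil {
		panic(err)
	}
	if total != 0 {
		fmt.Printf("unexpected intents after workload: %v\n", total)
	}
}
```

A test could call something like this once the TPCC workload finishes and treat a non-zero total as a signal to investigate, except in cases where leftover intents are expected (e.g. node restarts).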