roachtest: assert that tokens available returns to near full in perturbation tests #133410
Labels:
A-replication-admission-control-v2: Related to introduction of replication AC v2
branch-release-24.3: Used to mark GA and release blockers, technical advisories, and bugs for 24.3
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
GA-blocker
T-kv: KV Team
kvoli added the C-enhancement, GA-blocker, T-kv, A-replication-admission-control-v2, and branch-release-24.3 labels on Oct 24, 2024.
kvoli changed the title from "roachtest: assert that tokens available returns to near full in pertubation tests" to "roachtest: assert that tokens available returns to near full in perturbation tests" on Oct 29, 2024.
craig bot pushed a commit that referenced this issue on Oct 29, 2024:
133234: workload: tpcc consistency check added flag as-of. r=shailendra-patel a=shailendra-patel
While running the consistency checker on the tpcc database with an active tpcc workload, the consistency check fails with a retryable error, such as restart transaction: `TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError:`. To fix this, added a new flag `as-of` which allows running the consistency check using `AS OF SYSTEM TIME`.
Epic: none
Release note: None

133616: roachtest: validate token return in perturbation/* tests r=kvoli a=andrewbaptist
This commit adds validation that all RAC tokens are returned on all stable nodes at the end of the test.
Fixes: #133410
Release note: None

133683: license: don't hit EnvOrDefaultInt64 in hot path r=fqazi a=tbg
Saves 0.3% CPU on sysbench.
Fixes #133088.
Release note: None
Epic: None

Co-authored-by: Shailendra Patel <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
craig bot pushed a commit that referenced this issue on Oct 29, 2024:
…133690 #133693

133234: workload: tpcc consistency check added flag as-of. r=srosenberg,nameisbhaskar,vidit-bhat a=shailendra-patel
While running the consistency checker on the tpcc database with an active tpcc workload, the consistency check fails with a retryable error, such as restart transaction: `TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError:`. To fix this, added a new flag `as-of` which allows running the consistency check using `AS OF SYSTEM TIME`.
Epic: none
Release note: None

133347: crosscluster/logical: add settings/stats to ldr ingest chunking r=dt a=dt

133607: sql: check object type when revoking privilege r=rafiss a=rafiss
fixes #131157
Release note (bug fix): Fix an unhandled error that could occur when using `REVOKE ... ON SEQUENCE FROM ... user` on an object that is not a sequence.

133608: schemachanger: force prod values in expensive test r=rafiss a=rafiss
fixes #133437
Release note: None

133616: roachtest: validate token return in perturbation/* tests r=kvoli a=andrewbaptist
This commit adds validation that all RAC tokens are returned on all stable nodes at the end of the test.
Fixes: #133410
Release note: None

133681: roachtest: minor fixes in rebalance/by-load test r=arulajmani a=kvoli
`%` was not escaped, causing it to be substituted with values which were meant to go later. For example, from:
```
node 0 has core count normalized CPU utilization ts datapoint not in [0%!,(float64=1.4920845083839689)100{[{{%!](string=cr.node.sys.cpu.combined.percent-normalized) %!] ...
```
To:
```
node idx 0 has core count normalized CPU utilization ts datapoint not in [0%,100%] ...
```
---
The `rebalance/by-load/*` roachtests compare the CPU of nodes and assert that the distribution of node CPU is bounded +-20%. The previous metric:
```
sys.cpu.combined.percent_normalized
```
would occasionally over-report the CPU as greater than 100% (>1.0), which is impossible. Use the host CPU instead, which looks at the machine's CPU utilization rather than any cockroach processes:
```
sys.cpu.host.combined.percent_normalized
```
Part of: #133004
Part of: #133054
Part of: #132019
Part of: #133223
Part of: #132633
Release note: None

133683: license: don't hit EnvOrDefaultInt64 in hot path r=fqazi,mgartner a=tbg
Saves 0.3% CPU on sysbench.
Fixes #133088.
Release note: None
Epic: None

133686: rac2: order testingRCRange.mu before RaftMu in tests r=sumeerbhola a=kvoli
`testingRCRange.mu` was being acquired and held before acquiring `RaftMu` in `testingRCRange.admit()`, which conflicted with a different (reversed) ordering. This was a test-only issue with `TestRangeController`. Order `testingRCRange.mu` before `RaftMu` in `admit()`.
Fixes: #133650
Release note: None

133690: roachtest: always pass a Context to queries r=kvoli a=andrewbaptist
Queries can hang if there is no context passed to them. In roachtests, a context can be cancelled if there is a VM preemption. It is always better to use the test context and avoid this risk. This change updates the perturbation/* tests to always pass a context.
Fixes: #133625
Release note: None

133693: kvserver: deflake TestSnapshotsToDrainingNodes r=kvoli a=arulajmani
This test was making tight assertions about the size of the snapshot that was sent. To do so, it was trying to reimplement the actual snapshot sending logic in `kvBatchSnapshotStrategy.Send()`. So these tight assertions weren't of much use; they were asserting that we were correctly re-implementing `kvBatchSnapshotStrategy.Send()` in `getExpectedSnapshotSizeBytes`. We weren't, as evidenced by some rare flakes. This patch loosens assertions to deflake the test.
Closes #133517
Release note: None

Co-authored-by: Shailendra Patel <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
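The formatting bug fixed in 133681 comes from Go's `fmt` package: a `%` followed by `,` is parsed as a verb, so `fmt` consumes an operand and renders it in its bad-verb form (the `%!,(float64=...)` text in the broken message), shifting every later operand. Writing `%%` emits a literal percent sign instead. A minimal sketch; the `cpuRangeErr` helper is hypothetical and is not the actual roachtest code.

```go
package main

import "fmt"

// cpuRangeErr (hypothetical helper) builds the assertion message.
// "%%" renders a literal "%", so no operand is consumed by "[0%,".
func cpuRangeErr(nodeIdx int) string {
	return fmt.Sprintf("node idx %d has core count normalized CPU utilization ts datapoint not in [0%%,100%%]", nodeIdx)
}

func main() {
	// Prints: node idx 0 has core count normalized CPU utilization ts datapoint not in [0%,100%]
	fmt.Println(cpuRangeErr(0))
}
```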
craig bot pushed a commit that referenced this issue on Oct 29, 2024:
133234: workload: tpcc consistency check added flag as-of. r=srosenberg,nameisbhaskar,vidit-bhat a=shailendra-patel
While running the consistency checker on the tpcc database with an active tpcc workload, the consistency check fails with a retryable error, such as restart transaction: `TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError:`. To fix this, added a new flag `as-of` which allows running the consistency check using `AS OF SYSTEM TIME`.
Epic: none
Release note: None

133607: sql: check object type when revoking privilege r=rafiss a=rafiss
fixes #131157
Release note (bug fix): Fix an unhandled error that could occur when using `REVOKE ... ON SEQUENCE FROM ... user` on an object that is not a sequence.

133616: roachtest: validate token return in perturbation/* tests r=kvoli a=andrewbaptist
This commit adds validation that all RAC tokens are returned on all stable nodes at the end of the test.
Fixes: #133410
Release note: None

133686: rac2: order testingRCRange.mu before RaftMu in tests r=sumeerbhola a=kvoli
`testingRCRange.mu` was being acquired and held before acquiring `RaftMu` in `testingRCRange.admit()`, which conflicted with a different (reversed) ordering. This was a test-only issue with `TestRangeController`. Order `testingRCRange.mu` before `RaftMu` in `admit()`.
Fixes: #133650
Release note: None

133690: roachtest: always pass a Context to queries r=kvoli a=andrewbaptist
Queries can hang if there is no context passed to them. In roachtests, a context can be cancelled if there is a VM preemption. It is always better to use the test context and avoid this risk. This change updates the perturbation/* tests to always pass a context.
Fixes: #133625
Release note: None

Co-authored-by: Shailendra Patel <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
We should assert that the available tokens return to near full in the `perturbation/*` roachtests after the test has completed its recovery phase. This will provide some coverage for token leaks and liveness bugs.
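A minimal Go sketch of such an assertion, assuming the test can obtain a snapshot of available and limit token counts per node and stream. The `streamTokens` type, the `checkTokensNearFull` helper, the stream names, and the 0.95 tolerance are all hypothetical; none of them come from the roachtest framework or the RAC v2 code.

```go
package main

import "fmt"

// streamTokens is a hypothetical snapshot of one replication stream's flow
// tokens on a node: tokens currently available and the configured limit
// (the "full" value). Names and fields are assumptions for this sketch.
type streamTokens struct {
	Node      int
	Stream    string
	Available int64
	Limit     int64
}

// checkTokensNearFull returns an error listing every stream whose available
// tokens are below frac*Limit, e.g. frac = 0.95 means "within 5% of full".
func checkTokensNearFull(snapshot []streamTokens, frac float64) error {
	var bad []string
	for _, s := range snapshot {
		threshold := int64(frac * float64(s.Limit))
		if s.Available < threshold {
			bad = append(bad, fmt.Sprintf("n%d/%s: %d of %d tokens available",
				s.Node, s.Stream, s.Available, s.Limit))
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("tokens not returned to near full: %v", bad)
	}
	return nil
}

func main() {
	snap := []streamTokens{
		{Node: 1, Stream: "regular", Available: 16 << 20, Limit: 16 << 20},
		{Node: 2, Stream: "elastic", Available: 7 << 20, Limit: 8 << 20},
	}
	// Prints: tokens not returned to near full: [n2/elastic: 7340032 of 8388608 tokens available]
	fmt.Println(checkTokensNearFull(snap, 0.95))
}
```

Making the tolerance a parameter keeps "near full" explicit. If tokens are returned asynchronously after the recovery phase (an assumption, not stated in this issue), a real check would likely poll for a while before failing.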
Jira issue: CRDB-43589
Epic CRDB-37515