roachtest: assert that tokens available returns to near full in perturbation tests #133410

Closed
kvoli opened this issue Oct 24, 2024 · 0 comments · Fixed by #133616
Labels
A-replication-admission-control-v2 Related to introduction of replication AC v2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team

Comments

@kvoli
Collaborator

kvoli commented Oct 24, 2024

We should assert that the available tokens return to near full in the perturbation/* roachtests after the test has finished running the recovery phase.

This will provide some degree of coverage for token leakage and liveness bugs.
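
A minimal sketch of what such an assertion could look like, assuming each node exposes an available and a maximum token count. The `tokenSnapshot` type, the function names, and the 5% tolerance are hypothetical and are not taken from the roachtest code:

```go
package main

import "fmt"

// tokenSnapshot is a hypothetical per-node reading of flow tokens.
type tokenSnapshot struct {
	node      int
	available int64 // tokens currently available
	limit     int64 // tokens available when nothing is in flight
}

// nearFull reports whether the outstanding tokens are at most the given
// fraction (tolerance) of the limit.
func nearFull(s tokenSnapshot, tolerance float64) bool {
	if s.limit <= 0 {
		return false
	}
	deficit := float64(s.limit-s.available) / float64(s.limit)
	return deficit <= tolerance
}

// checkTokensReturned returns an error naming the first node whose tokens
// have not returned to near full.
func checkTokensReturned(snaps []tokenSnapshot, tolerance float64) error {
	for _, s := range snaps {
		if !nearFull(s, tolerance) {
			return fmt.Errorf("n%d: only %d/%d tokens available after recovery",
				s.node, s.available, s.limit)
		}
	}
	return nil
}

func main() {
	snaps := []tokenSnapshot{
		{node: 1, available: 16 << 20, limit: 16 << 20},
		{node: 2, available: 12 << 20, limit: 16 << 20}, // 25% still outstanding
	}
	fmt.Println(checkTokensReturned(snaps, 0.05))
}
```

A tolerance is used instead of strict equality because the issue asks for "near full", which leaves room for a small amount of in-flight work at the time of the check.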

Jira issue: CRDB-43589

Epic CRDB-37515

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team A-replication-admission-control-v2 Related to introduction of replication AC v2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 24, 2024
@kvoli kvoli changed the title roachtest: assert that tokens available returns to near full in pertubation tests roachtest: assert that tokens available returns to near full in perturbation tests Oct 29, 2024
craig bot pushed a commit that referenced this issue Oct 29, 2024
133234: workload: tpcc consistency check added flag as-of. r=shailendra-patel a=shailendra-patel

When the consistency checker runs against the tpcc database while a tpcc workload is active, the check can fail with a retryable error such as `restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError:`.
To fix this, add a new `as-of` flag, which allows the consistency check to run using `AS OF SYSTEM TIME`.
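
As an illustration only, a check query could be built with an optional `AS OF SYSTEM TIME` clause roughly as follows. `checkQuery` and its parameters are hypothetical and do not reflect how the workload implements the flag; quoting and validation of the inputs are omitted:

```go
package main

import "fmt"

// checkQuery builds a read-only check query. When asOf is non-empty, the
// query reads at that historical timestamp via AS OF SYSTEM TIME.
func checkQuery(table, asOf string) string {
	q := fmt.Sprintf("SELECT count(*) FROM %s", table)
	if asOf != "" {
		q += fmt.Sprintf(" AS OF SYSTEM TIME %s", asOf)
	}
	return q
}

func main() {
	fmt.Println(checkQuery("warehouse", ""))
	fmt.Println(checkQuery("warehouse", "'-10s'"))
}
```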

Epic: none
Release note: None

133616: roachtest: validate token return in perturbation/* tests r=kvoli a=andrewbaptist

This commit adds validation that all RAC tokens are returned on all stable nodes at the end of the test.

Fixes: #133410

Release note: None

133683: license: don't hit EnvOrDefaultInt64 in hot path r=fqazi a=tbg

Saves 0.3% CPU on sysbench.

Fixes #133088.

Release note: None
Epic: None


Co-authored-by: Shailendra Patel <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
craig bot pushed a commit that referenced this issue Oct 29, 2024
…133690 #133693

133234: workload: tpcc consistency check added flag as-of. r=srosenberg,nameisbhaskar,vidit-bhat a=shailendra-patel

When the consistency checker runs against the tpcc database while a tpcc workload is active, the check can fail with a retryable error such as `restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError:`.
To fix this, add a new `as-of` flag, which allows the consistency check to run using `AS OF SYSTEM TIME`.

Epic: none
Release note: None

133347: crosscluster/logical: add settings/stats to ldr ingest chunking r=dt a=dt



133607: sql: check object type when revoking privilege r=rafiss a=rafiss

fixes #131157
Release note (bug fix): Fix an unhandled error that could occur when using `REVOKE ... ON SEQUENCE FROM ... user` on an object that is not a sequence.

133608: schemachanger: force prod values in expensive test r=rafiss a=rafiss

fixes #133437
Release note: None

133616: roachtest: validate token return in perturbation/* tests r=kvoli a=andrewbaptist

This commit adds validation that all RAC tokens are returned on all stable nodes at the end of the test.

Fixes: #133410

Release note: None

133681: roachtest: minor fixes in rebalance/by-load test r=arulajmani a=kvoli

`%` was not escaped, so it was treated as a format verb and consumed
values that were meant for later verbs.

e.g., from:

```
node 0 has core count normalized CPU utilization ts datapoint not in [0%!,(float64=1.4920845083839689)100{[{{%!](string=cr.node.sys.cpu.combined.percent-normalized) %!]
...
```

To

```
node idx 0 has core count normalized CPU utilization ts datapoint not in [0%,100%]
...
```
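
A minimal Go sketch of the escaping, assuming the message is built with `fmt.Sprintf`. The function name and argument list are hypothetical; only the message text comes from the example above:

```go
package main

import "fmt"

// cpuRangeError formats the out-of-range message. Literal percent signs
// are written as %% so that fmt does not interpret them as verbs and
// consume the arguments intended for %f and %s.
func cpuRangeError(nodeIdx int, value float64, metric string) string {
	return fmt.Sprintf(
		"node idx %d has core count normalized CPU utilization ts datapoint not in [0%%,100%%]: %f (%s)",
		nodeIdx, value, metric,
	)
}

func main() {
	fmt.Println(cpuRangeError(0, 1.4920845083839689,
		"cr.node.sys.cpu.combined.percent-normalized"))
}
```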

---

The `rebalance/by-load/*` roachtests compare the CPU of nodes and assert
that the distribution of node CPU is bounded within ±20%. The previous metric:

```
sys.cpu.combined.percent_normalized
```

would occasionally over-report the CPU as greater than 100% (>1.0),
which is impossible. Use the host CPU instead, which looks at the
machine's CPU utilization rather than that of any cockroach processes.

```
sys.cpu.host.combined.percent_normalized
```
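
A rough sketch of the kind of check described, under the assumption that "bounded within ±20%" means every node's normalized CPU lies within 0.20 of the mean across nodes. `checkCPUBalance` is a hypothetical helper, not the test's code:

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// checkCPUBalance validates normalized CPU readings (expected in [0, 1])
// and reports an error if any node deviates from the mean by more than
// maxDelta. The interpretation of the bound is an assumption.
func checkCPUBalance(cpus []float64, maxDelta float64) error {
	if len(cpus) == 0 {
		return errors.New("no CPU datapoints")
	}
	var sum float64
	for i, c := range cpus {
		if c < 0 || c > 1 {
			return fmt.Errorf("node idx %d CPU %.2f not in [0%%,100%%]", i, c)
		}
		sum += c
	}
	mean := sum / float64(len(cpus))
	for i, c := range cpus {
		if math.Abs(c-mean) > maxDelta {
			return fmt.Errorf("node idx %d CPU %.2f deviates from mean %.2f by more than %.2f",
				i, c, mean, maxDelta)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkCPUBalance([]float64{0.50, 0.60, 0.55}, 0.20))
	fmt.Println(checkCPUBalance([]float64{0.50, 1.49}, 0.20)) // over-reported datapoint
}
```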

Part of: #133004
Part of: #133054
Part of: #132019
Part of: #133223
Part of: #132633
Release note: None

133683: license: don't hit EnvOrDefaultInt64 in hot path r=fqazi,mgartner a=tbg

Saves 0.3% CPU on sysbench.

Fixes #133088.

Release note: None
Epic: None


133686: rac2: order testingRCRange.mu before RaftMu in tests r=sumeerbhola a=kvoli

`testingRCRange.mu` was being acquired and held before acquiring `RaftMu` in `testingRCRange.admit()`, which conflicted with the reversed ordering used elsewhere. This was a test-only issue with `TestRangeController`.

Order `testingRCRange.mu` before `RaftMu` in `admit()`.

Fixes: #133650
Release note: None

133690: roachtest: always pass a Context to queries r=kvoli a=andrewbaptist

Queries can hang if there is no context passed to them. In roachtests, a context can be cancelled if there is a VM preemption. It is always better to use the test context and avoid this risk. This change updates the perturbation/* tests to always pass a context.

Fixes: #133625

Release note: None

133693: kvserver: deflake TestSnapshotsToDrainingNodes r=kvoli a=arulajmani

This test was making tight assertions about the size of the snapshot that was sent. To do so, it was trying to reimplement the actual snapshot sending logic in `kvBatchSnapshotStrategy.Send()`. So these tight assertions weren't of much use -- they were asserting that we were correctly re-implementing `kvBatchSnapshotStrategy.Send()` in `getExpectedSnapshotSizeBytes`. We weren't, as evidenced by some rare flakes.

This patch loosens assertions to deflake the test.

Closes #133517
Release note: None

Co-authored-by: Shailendra Patel <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
@craig craig bot closed this as completed in 865919a Oct 29, 2024
2 participants