Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: Allow roachtests to opt out of failing in post validation … #102135

Closed
wants to merge 12 commits into from

Conversation

smg260
Copy link
Contributor

@smg260 smg260 commented Apr 24, 2023

…when

dead nodes are detected.

Epic: none
Fixes: #102131

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@smg260 smg260 marked this pull request as ready for review April 24, 2023 16:08
@smg260 smg260 requested a review from a team as a code owner April 24, 2023 16:08
@smg260 smg260 requested review from herkolategan and srosenberg and removed request for a team April 24, 2023 16:08
ecwall and others added 12 commits April 24, 2023 17:28
… added

Fixes cockroachdb#102028

Previously adding a new version would break this test
```
Failed
=== RUN   TestTenantUpgradeFailure/upgrade_tenant_have_it_crash_then_resume
    tenant_upgrade_test.go:352: query 'SHOW CLUSTER SETTING version': expected:
        1000023.1

        got:
        1000023.1-2

    --- FAIL: TestTenantUpgradeFailure/upgrade_tenant_have_it_crash_then_resume (60.42s)
```

Now the `-*` part of the version is ignored.

Release note: None
This commit modifies the costfuzz and unoptimized query oracle tests to
always log each "main" statement. The stmt is commented out so that the
log is still replayable. Additionally, a timestamp is included to help
with debugging later.

Epic: None

Release note: None
This code change shards the test cases under `TestDockerCLI`
into separate tests and adds helpers for generating those
tests. This allows us to benefit from sharding `acceptance_test`
so we also shard the test target into 16 shards.

The changes cut the time it takes to run acceptance tests in CI
by easily more than 50%.

Release note: None
Epic: none
The hash table has multiple slices that are limited by the maximum
length of the probing batch. Previously, we hard-coded that length as
`coldata.BatchSize() = 1024`, but that is incorrect for the hash
aggregator.  The hash aggregator (for performance reasons) buffers up
to `coldata.MaxBatchSize = 4096` tuples before using the hash table.
As a result, whenever we performed the hash aggregation on more than
1024 tuples, we wouldn't account for those slices correctly, and in the
worst case the under accounting would exceed 100KiB which could be
non-trivial if hash aggregation queries are run with some concurrency.
This commit fixes that oversight.

Epic: None

Release note: None
This change teaches DistSender to populate a BatchRequest's
header with the pprof labels set on the sender's context.
If the node processing the request on the server side has CPU
profiling with labels enabled, then the labels stored in the BatchRequest
will be applied to the root context of the goroutine executing the
request on the server. Doing so will ensure that all server-side
CPU samples of the root goroutine and all its spawned goroutine
will be labeled correctly.

Propagating these labels across RPC boundaries is useful to correlate
server side samples with the sender. For example, in a CPU profile
generated with this change, we will be able to identify which backup
job sent an ExportRequest that is causing a CPU hotspot on a remote node.

Fixes: cockroachdb#100166
Release note: None
Informs cockroachdb#100131.

This commit revives and modernizes two tests that were removed/stripped
down when snapshot isolation was previously deleted. It then expands the
tests to exercise all isolation levels.

Release note: None
This adds tracking for isolation level information of the associated
transaction in lock modes. This is only done for non-locking reads and
exclusive locks. We then use this to modify the conflict resolution
rules between non-locking reads and exclusive locks. The new conflict
resolution rules are as follows:

- If either the non-locking read or the exclusive lock belongs to a
a transaction that can tolerate write-skew (read committed or
snapshot), the non-locking read does not block on the exclusive lock.
- If both the non-locking read and exclusive lock belong to
serializable transactions and the non-locking read is doing so at a
timestamp below the exclusive lock, it does not block.
- Otherwise, the blocking behavior is dictated by the
`ExclusiveLocksBlockNonLockingReads` cluster setting.

This patch also improves how testing for the Conflicts function is
organized. This was motivated by the increased surface area that needed
testing now that we're pushing isolation levels into some lock modes.

Note that lock modes are yet to be hooked up to the concurrency control
package, and as such, the patch itself doesn't resolve the linked issue.

Informs cockroachdb#94729

Release note: None
Informs cockroachdb#100131.

This commit revives the isolation level exploration in kv correctness
tests. These were removed in 39ba88b. Very little of the code here is
new, beyond extending this code for Read Committed.

I have confirmed that `TestTxnDBWriteSkewAnomaly` fails if it does not
expect Snapshot isolation to permit write skew and I update
`IsEndTxnTriggeringRetryError` to ignore skew under Snapshot isolation.

Release note: None
One of the previous commits was misusing log.Warning where log.VEventf
was expected.

Release note: None
This patch fixes an issue when structlogging.StartHotRangesLoggingScheduler
is guarded by `DiagnosticsReportingEnabled` cluster setting and doesn't
properly handles changes of this setting.
Now, when diagnostics reporting is enabled via cluster setting, Hot range
logging scheduler starts (if not started yet), and cancels previously
started scheduler if setting is disabled.

Release note: None
…when

dead nodes are detected.

Epic: none
Fixes: cockroachdb#102131

Release note: None
@smg260 smg260 force-pushed the 102131_dead_node_opt_out branch from 6b608b3 to b4bdafc Compare April 24, 2023 17:29
@smg260 smg260 requested review from a team as code owners April 24, 2023 17:29
@smg260 smg260 requested a review from a team April 24, 2023 17:29
@smg260 smg260 requested a review from a team as a code owner April 24, 2023 17:29
@smg260 smg260 requested a review from a team April 24, 2023 17:29
@smg260 smg260 requested review from a team as code owners April 24, 2023 17:29
@smg260 smg260 requested review from mgartner and removed request for a team April 24, 2023 17:29
@smg260 smg260 removed request for a team and mgartner April 24, 2023 17:29
@smg260 smg260 marked this pull request as draft April 24, 2023 17:30
@smg260 smg260 closed this Apr 24, 2023
@smg260 smg260 deleted the 102131_dead_node_opt_out branch April 24, 2023 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: allow tests to opt out of dead node validation
9 participants