ccl/changefeedccl: TestPTSRecordProtectsTargetsAndSystemTables failed #138066

Closed
cockroach-teamcity opened this issue Dec 29, 2024 · 1 comment · Fixed by #138243 or mohini-crl/cockroach#174
Assignees
Labels
A-cdc: Change Data Capture
branch-release-24.1: Used to mark GA and release blockers, technical advisories, and bugs for 24.1
branch-release-24.2: Used to mark GA and release blockers, technical advisories, and bugs for 24.2
branch-release-24.3: Used to mark GA and release blockers, technical advisories, and bugs for 24.3
branch-release-24.3.3-rc
C-test-failure: Broken test (automatically or manually discovered).
O-robot: Originated from a bot.
P-2: Issues/test failures with a fix SLA of 3 months
T-cdc

Comments

@cockroach-teamcity

cockroach-teamcity commented Dec 29, 2024

ccl/changefeedccl.TestPTSRecordProtectsTargetsAndSystemTables failed on release-24.3.3-rc @ 7dab2bb943152658ca94aeae83c450742ec361a5:

=== RUN   TestPTSRecordProtectsTargetsAndSystemTables
    test_log_scope.go:165: test logs captured to: outputs.zip/logTestPTSRecordProtectsTargetsAndSystemTables1695693227
    test_log_scope.go:76: use -show-logs to present logs inline
    protected_timestamps_test.go:500: updating PTS reader cache to 1735465165.707160503,0
    protected_timestamps_test.go:512: enqueuing range 7 for mvccGC
    protected_timestamps_test.go:500: updating PTS reader cache to 1735465168.909630025,0
    protected_timestamps_test.go:512: enqueuing range 9 for mvccGC
    protected_timestamps_test.go:500: updating PTS reader cache to 1735465172.099638197,0
    protected_timestamps_test.go:512: enqueuing range 27 for mvccGC
    protected_timestamps_test.go:500: updating PTS reader cache to 1735465175.304703659,0
    protected_timestamps_test.go:512: enqueuing range 26 for mvccGC
    protected_timestamps_test.go:500: updating PTS reader cache to 1735465177.532670544,0
    protected_timestamps_test.go:512: enqueuing range 8 for mvccGC
    protected_timestamps_test.go:544: 
        	Error Trace:	pkg/ccl/changefeedccl/protected_timestamps_test.go:544
        	Error:      	Received unexpected error:
        	            	batch timestamp 1735465159.945363138,0 must be after replica GC threshold 1735465169.194012228,0 (r9: /Table/{5-6})
        	Test:       	TestPTSRecordProtectsTargetsAndSystemTables
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestPTSRecordProtectsTargetsAndSystemTables1695693227
--- FAIL: TestPTSRecordProtectsTargetsAndSystemTables (27.55s)

Parameters:

  • attempt=1
  • deadlock=true
  • run=3
  • shard=16
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-45879

@cockroach-teamcity cockroach-teamcity added branch-release-24.3.3-rc C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-cdc labels Dec 29, 2024
@blathers-crl blathers-crl bot added the A-cdc Change Data Capture label Dec 29, 2024
@asg0451 asg0451 self-assigned this Jan 2, 2025
@asg0451 asg0451 removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Jan 2, 2025
@exalate-issue-sync exalate-issue-sync bot added the P-2 Issues/test failures with a fix SLA of 3 months label Jan 2, 2025
asg0451 added a commit to asg0451/cockroach that referenced this issue Jan 6, 2025
asg0451 added a commit to asg0451/cockroach that referenced this issue Jan 6, 2025
Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: cockroachdb#135639
Fixes: cockroachdb#138066
Fixes: cockroachdb#137885
Fixes: cockroachdb#137505
Fixes: cockroachdb#136396
Fixes: cockroachdb#135805
Fixes: cockroachdb#135639

Release note: None
craig bot pushed a commit that referenced this issue Jan 14, 2025
137805: sql/row: use Put instead of CPut when updating value of secondary index r=yuzefovich,stevendanna a=michae2

**sql/row: use Put instead of CPut when updating value of secondary index**

When an UPDATE statement changes the value but not the key of a secondary index (e.g. an update to the stored columns of a secondary index) we need to write a new version of the secondary index KV with the new value.

We were using a CPutAllowingIfNotExists to do this, which verified that if the KV existed, the expected value was the value before update. But there's no need for this verification. We have other mechanisms to detect a write-write conflict with any other transaction that could have changed the value concurrently. We can simply use a Put to overwrite the previous value.

This also matches what we do for the primary index when the PK doesn't change.

Epic: None

Release note: None

---

**sql: change CPutAllowingIfNotExists with nil expValue to CPut**

CPutAllowingIfNotExists with empty expValue is equivalent to CPut with empty expValue, so change a few spots to use regular CPut. This almost gets rid of CPutAllowingIfNotExists entirely, but there is at least one spot in backfill (introduced in #138707) that needs to allow for both a non-empty expValue and non-existence of the KV.
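
The claimed equivalence can be checked on a toy model. This is a hedged sketch: the two functions below are simplified stand-ins whose semantics are assumed from the description above (an empty expected value means the key must be absent, and the "allowing" variant also accepts an absent key), not the actual KV implementation.

```go
package main

import "fmt"

// kv is a toy map standing in for the key-value store.
type kv map[string]string

// cput writes val only when the current state matches exp.
// exp == nil means the key must be absent.
func cput(m kv, key, val string, exp *string) bool {
	cur, exists := m[key]
	if exp == nil {
		if exists {
			return false
		}
	} else if !exists || cur != *exp {
		return false
	}
	m[key] = val
	return true
}

// cputAllowingIfNotExists behaves like cput but also succeeds when the
// key is absent, regardless of exp.
func cputAllowingIfNotExists(m kv, key, val string, exp *string) bool {
	if _, exists := m[key]; !exists {
		m[key] = val
		return true
	}
	return cput(m, key, val, exp)
}

func main() {
	old := "old"
	for _, exists := range []bool{false, true} {
		a, b := kv{}, kv{}
		if exists {
			a["k"], b["k"] = "old", "old"
		}
		fmt.Printf("exists=%t, nil exp: cput=%t allowing=%t\n",
			exists, cput(a, "k", "new", nil), cputAllowingIfNotExists(b, "k", "new", nil))
	}
	// The two only differ for a non-nil exp on an absent key, which is the
	// backfill case mentioned above.
	fmt.Println("absent key, non-nil exp:",
		cput(kv{}, "k", "new", &old), cputAllowingIfNotExists(kv{}, "k", "new", &old))
}
```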

Also drop "(expecting does not exist)" from CPut tracing, as CPut with empty expValue is now overwhelmingly the most common use of CPut. And this matches the tracing in #138707.

Epic: None

Release note: None

138243: changefeedccl: fix PTS test r=stevendanna a=asg0451

Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: #135639
Fixes: #138066
Fixes: #137885
Fixes: #137505
Fixes: #136396
Fixes: #135805
Fixes: #135639

Release note: None

138696: sql: add CHECK EXTERNAL CONNECTION command r=kev-cao a=kev-cao

This patch adds the `CHECK EXTERNAL CONNECTION` command and replaces the old `SHOW BACKUP CONNECTION` syntax.

Epic: None

Release note: None

138740: opt/bench: fix benchmark regression r=mgartner a=mgartner

#### opt/bench: fix benchmark regression

PR #138641 caused extra allocations for plan gist factories in optimizer
benchmarks. These allocations should not be included in benchmark
results, so they have been eliminated.

Release note: None

#### util/base64: use `bytes.Buffer` instead of `strings.Builder` in `Encoder`

For our purposes in base64-encoding plan gists, using `bytes.Buffer` in
`Encoder` causes fewer allocations, presumably because of a more
aggressive growth algorithm.
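
One way to compare the two writers is `testing.AllocsPerRun`. The sketch below is an assumption-laden micro-measurement, not the benchmark from the PR: the chunk size and count are arbitrary, and the counts it prints depend on the Go version and on the write pattern.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"testing"
)

// buildWithBuilder appends n four-byte chunks using strings.Builder.
func buildWithBuilder(n int) string {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		sb.WriteString("abcd")
	}
	return sb.String()
}

// buildWithBuffer appends n four-byte chunks using bytes.Buffer.
func buildWithBuffer(n int) string {
	var buf bytes.Buffer
	for i := 0; i < n; i++ {
		buf.WriteString("abcd")
	}
	return buf.String()
}

func main() {
	const chunks = 64 // arbitrary size chosen for illustration
	builder := testing.AllocsPerRun(100, func() { _ = buildWithBuilder(chunks) })
	buffer := testing.AllocsPerRun(100, func() { _ = buildWithBuffer(chunks) })
	fmt.Printf("strings.Builder: %.0f allocs, bytes.Buffer: %.0f allocs\n", builder, buffer)
}
```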

Epic: None

Release note: None


138877: opt: reduce allocations when filtering histogram buckets r=mgartner a=mgartner

`cat.HistogramBuckets` are now returned and passed by value in
`getFilteredBucket` and `(*Histogram).addBucket`, respectively,
eliminating some heap allocations.

Also, two allocations when building spans from buckets via the
`spanBuilder` have been combined into one. The new `(*spanBuilder).init`
method simplifies the API by no longer requiring that prefix datums are
passed to every invocation of `makeSpanFromBucket`. This also reduces
redundant copying of the prefix.
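
Why returning by value can avoid a heap allocation is easiest to see on a small struct. This is a minimal sketch with a hypothetical `bucket` type and field names; it is not the `cat.HistogramBucket` definition, and the `//go:noinline` directives are there only so that escape analysis is not hidden by inlining.

```go
package main

import (
	"fmt"
	"testing"
)

// bucket is a hypothetical stand-in for a histogram bucket.
type bucket struct {
	numEq, numRange, distinctRange float64
}

// scaleByPointer returns a pointer to a new bucket. The result outlives
// the call, so it is expected to be heap allocated.
//
//go:noinline
func scaleByPointer(b *bucket, frac float64) *bucket {
	return &bucket{numEq: b.numEq, numRange: b.numRange * frac, distinctRange: b.distinctRange * frac}
}

// scaleByValue returns the bucket by value, so the result can be copied
// into the caller's frame without a heap allocation.
//
//go:noinline
func scaleByValue(b bucket, frac float64) bucket {
	return bucket{numEq: b.numEq, numRange: b.numRange * frac, distinctRange: b.distinctRange * frac}
}

func main() {
	in := bucket{numEq: 10, numRange: 100, distinctRange: 20}
	var total float64
	ptr := testing.AllocsPerRun(1000, func() { total += scaleByPointer(&in, 0.5).numRange })
	val := testing.AllocsPerRun(1000, func() { total += scaleByValue(in, 0.5).numRange })
	fmt.Printf("by pointer: %.0f allocs/op, by value: %.0f allocs/op (total %.0f)\n", ptr, val, total)
}
```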

Epic: None

Release note: None


139029: sql/logictest: disable column family mutations in some cases r=mgartner a=mgartner

Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See #65929 and #107398.

Epic: None

Release note: None


139039: ccl/serverccl: revise `TestTenantVars` cpu time checks r=xinhaoz a=xinhaoz

Previously, this test verified that a portion of the test's user cpu time would be less than or equal to the entire tenant user cpu time up to that point. This check is flaky because there's no guarantee that the inactive tenant's cpu time will surpass the test cpu time. We now simply verify that the test cpu times are greater than or equal to the tenant metrics.

The test was likely passing before 331596c because the reported tenant cpu time was accounting for the sql server prestart. A tenant's user cpu metrics are tracked from the time the `_status/load` endpoint is registered, and the commit above moved the router setup to occur just after the prestart.

Epic: none
Fixes: #119329

Release note: None

Co-authored-by: Michael Erickson
Co-authored-by: Miles Frankel
Co-authored-by: Kevin Cao
Co-authored-by: Marcus Gartner
Co-authored-by: Xin Hao Zhang
craig bot pushed a commit that referenced this issue Jan 14, 2025
137947: ccl/changefeedccl: Add changefeed options into nemesis tests r=wenyihu6 a=aerfrei

This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: #134119

Epic: [CRDB-42866](https://cockroachlabs.atlassian.net/browse/CRDB-42866)

Release note: None

138243: changefeedccl: fix PTS test r=stevendanna a=asg0451

Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: #135639
Fixes: #138066
Fixes: #137885
Fixes: #137505
Fixes: #136396
Fixes: #135805
Fixes: #135639

Release note: None

138697: crosscluster: add crdb_route parameter for LDR and PCR r=jeffswenson a=jeffswenson

The `crdb_route` query parameter determines how the destination
cluster's stream processor connects to the source cluster. There are two
options for the query parameter: "node" and "gateway". Here is an
example of using the route parameter to create an external connection
that is usable for LDR or PCR.

```SQL
-- A connection that routes all replication traffic via the configured
-- connection URI.
CREATE EXTERNAL CONNECTION 'external://source-db' AS
'postgresql://user:[email protected]:26257/?sslmode=verify-full&crdb_route=gateway';

-- A connection that enumerates nodes in the source cluster and connects
-- directly to nodes.
CREATE EXTERNAL CONNECTION 'external://source-db' AS
'postgresql://user:[email protected]:26257/?sslmode=verify-full&crdb_route=node';
```

The "node" option is the original and default behavior. It requires the
source and destination clusters to be in the same IP network. The
connection string supplied to LDR and PCR is used to connect to the
source cluster and generate a physical SQL plan for the replication. The
physical plan includes the `--advertise-sql-addr` of each node in the
source cluster, and processors in the destination cluster connect
directly to those nodes. "node" routing is ideal because there are no
extra network hops and the source cluster can control how load is
distributed across its nodes.

The "gateway" option is a new option that is introduced in order to
support routing PCR and LDR over a load balancer. When specified, the
destination cluster ignores the node addresses returned by the physical
plan and instead opens a connection for each processor to the URI
supplied by the user. This introduces an extra network hop and does not
distribute load as evenly, but it works in deployments where the source
cluster is only reachable over a load balancer.
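
The routing decision described above can be sketched as follows. This is an illustration only: `routeMode` and `dialTargets` are hypothetical names, the URI in `main` is a made-up example, and treating a missing parameter as "node" is inferred from the statement that "node" is the default.

```go
package main

import (
	"fmt"
	"net/url"
)

// routeMode extracts crdb_route from a connection URI, defaulting to
// "node" when the parameter is absent.
func routeMode(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	switch mode := u.Query().Get("crdb_route"); mode {
	case "", "node":
		return "node", nil
	case "gateway":
		return "gateway", nil
	default:
		return "", fmt.Errorf("unknown crdb_route %q", mode)
	}
}

// dialTargets returns the address each destination processor would dial.
// planAddrs holds one source node address per processor, as returned by
// the physical plan. In "gateway" mode those addresses are ignored and
// every processor dials the user-supplied URI instead.
func dialTargets(uri string, planAddrs []string) ([]string, error) {
	mode, err := routeMode(uri)
	if err != nil {
		return nil, err
	}
	if mode == "gateway" {
		targets := make([]string, len(planAddrs))
		for i := range targets {
			targets[i] = uri
		}
		return targets, nil
	}
	return planAddrs, nil
}

func main() {
	// Hypothetical load balancer URI used only for this example.
	uri := "postgresql://user@lb.example.com:26257/?sslmode=verify-full&crdb_route=gateway"
	targets, err := dialTargets(uri, []string{"n1.internal:26257", "n2.internal:26257"})
	if err != nil {
		panic(err)
	}
	fmt.Println(targets)
}
```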

Routing over a load balancer only requires changing the destination
cluster's behavior. Nodes in the source cluster were always implemented to
act as a gateway and serve rangefeeds that are backed by data stored on
different nodes. This support exists so that the cross cluster
replication does not need to re-plan every time a range moves to a
different node.

Release note (sql change): LDR and PCR may use the `crdb_route=gateway`
query option to route the replication streams over a load balancer.

Epic: [CRDB-40896](https://cockroachlabs.atlassian.net/browse/CRDB-40896)

138877: opt: reduce allocations when filtering histogram buckets r=mgartner a=mgartner

`cat.HistogramBuckets` are now returned and passed by value in
`getFilteredBucket` and `(*Histogram).addBucket`, respectively,
eliminating some heap allocations.

Also, two allocations when building spans from buckets via the
`spanBuilder` have been combined into one. The new `(*spanBuilder).init`
method simplifies the API by no longer requiring that prefix datums are
passed to every invocation of `makeSpanFromBucket`. This also reduces
redundant copying of the prefix.

Epic: None

Release note: None


139029: sql/logictest: disable column family mutations in some cases r=mgartner a=mgartner

Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See #65929 and #107398.

Epic: None

Release note: None


139036: testutils,kvserver: add StartExecTrace and adopt in TestPromoteNonVoterInAddVoter r=tbg a=tbg

Every now and then we end up with tests that fail every once in a blue moon, and we can't reproduce at will.
#138864 was one of them, and execution traces helped a great deal.

This PR introduces a helper for unit tests that captures an execution trace of the test and keeps the trace on failure, and adopts it for one of these pesky unit tests.

The trace contains the goroutine ID in the filename. Additionally, the test's main goroutine is marked via a trace region. Sample below:

<img width="1226" alt="image" src="https://github.com/user-attachments/assets/3f641c28-64f7-4fba-9267-ddd48d8dda03" />

Closes #134383.

Epic: None
Release note: None


Co-authored-by: Aerin Freilich
Co-authored-by: Miles Frankel
Co-authored-by: Jeff Swenson
Co-authored-by: Marcus Gartner
Co-authored-by: Tobias Grieger
@craig craig bot closed this as completed in 812ae98 Jan 14, 2025
blathers-crl bot commented Jan 14, 2025

Based on the specified backports for linked PR #138243, I applied the following new label(s) to this issue: branch-release-24.1, branch-release-24.2, branch-release-24.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Jan 14, 2025
blathers-crl bot pushed a commit that referenced this issue Jan 14, 2025
Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: #135639
Fixes: #138066
Fixes: #137885
Fixes: #137505
Fixes: #136396
Fixes: #135805
Fixes: #135639

Release note: None
InManuBytes pushed a commit to InManuBytes/cockroach that referenced this issue Jan 15, 2025
Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: cockroachdb#135639
Fixes: cockroachdb#138066
Fixes: cockroachdb#137885
Fixes: cockroachdb#137505
Fixes: cockroachdb#136396
Fixes: cockroachdb#135805
Fixes: cockroachdb#135639

Release note: None
kvoli pushed a commit to kvoli/cockroach that referenced this issue Jan 16, 2025
Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: cockroachdb#135639
Fixes: cockroachdb#138066
Fixes: cockroachdb#137885
Fixes: cockroachdb#137505
Fixes: cockroachdb#136396
Fixes: cockroachdb#135805
Fixes: cockroachdb#135639

Release note: None
mohini-crl pushed a commit to mohini-crl/cockroach that referenced this issue Jan 17, 2025
Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: cockroachdb#135639
Fixes: cockroachdb#138066
Fixes: cockroachdb#137885
Fixes: cockroachdb#137505
Fixes: cockroachdb#136396
Fixes: cockroachdb#135805
Fixes: cockroachdb#135639

Release note: None