Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cdc: support and validate options in randomized testing #134119

Open
8 tasks
rharding6373 opened this issue Nov 1, 2024 · 1 comment
Open
8 tasks

cdc: support and validate options in randomized testing #134119

rharding6373 opened this issue Nov 1, 2024 · 1 comment
Assignees
Labels
A-cdc Change Data Capture C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-cdc

Comments

@rharding6373
Copy link
Collaborator

rharding6373 commented Nov 1, 2024

CDC should add support for and validate documented options in randomized testing. Options that make sense to test include:

  • diff
  • envelope
  • format
  • full_table_name
  • initial_scan
  • kafka_sink_config / pubsub_sink_config / webhook_sink_config (no extra validation required, since the randomized test is for correctness testing)
  • key_in_value
  • mvcc_timestamp

Jira issue: CRDB-43921

@rharding6373 rharding6373 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-cdc Change Data Capture T-cdc labels Nov 1, 2024
Copy link

blathers-crl bot commented Nov 1, 2024

cc @cockroachdb/cdc

aerfrei added a commit to aerfrei/cockroach that referenced this issue Dec 23, 2024
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 7, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 7, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 7, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 9, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 14, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
craig bot pushed a commit that referenced this issue Jan 14, 2025
137947: ccl/changeedccl: Add changefeed options into nemesis tests r=wenyihu6 a=aerfrei

This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: #134119

Epic: [CRDB-42866](https://cockroachlabs.atlassian.net/browse/CRDB-42866)

Release note: None

138243: changefeedccl: fix PTS test r=stevendanna a=asg0451

Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: #135639
Fixes: #138066
Fixes: #137885
Fixes: #137505
Fixes: #136396
Fixes: #135805
Fixes: #135639

Release note: None

138697: crosscluster: add crdb_route parameter for LDR and PCR r=jeffswenson a=jeffswenson

The `crdb_route` query parameter determines how the destination
cluster's stream processor connects to the source cluster. There are two
options for the query parameter: "node" and "gateway". Here is an
example of using the route paraemeter to create an external connection
that is usable for LDR or PCR.

```SQL
-- A connection that routes all replication traffic via the configured
-- connection URI.
CREATE EXTERNAL CONNECTION 'external://source-db' AS
'postgresql://user:[email protected]:26257/sslmode=verify-full&crdb_route=gateway'

-- A connection that enumerates nodes in the source cluster and connects
-- directly to nodes.
CREATE EXTERNAL CONNECTION 'external://source-db' AS
'postgresql://user:[email protected]:26257/sslmode=verify-full&crdb_route=node'
```

The "node" option is the original and default behavior. The "node"
option requires the source and destination clusters to be in the same IP
network. The way it works is the connection string supplied to LDR and
PCR is used to connect to the source cluster and generate a physical sql
plan for the replication. The physical plan includes the
`--sql-addvertise-addr` for nodes in the source cluster and processors
in the destination cluster connect directly to the nodes. Using the
"node" routing is ideal because there are no extra network hops and the
source cluster can control how load is distributed across its nodes.

The "gateway" option is a new option that is introduced in order to
support routing PCR and LDR over a load balancer. When specified, the
destination cluster ignores the node addresses returned by the physical
plan and instead opens a connection for each processor to the URI
supplied by the user. This introduces an extra network hop and does not
distribute load as evenly, but it works in deployments where the source
cluster is only reachable over a load balancer.

Routing over a load balancer only requires changing the destination
clusters behavior. Nodes in the source cluster were always implemented to
act as a gateway and serve rangefeeds that are backed by data stored on
different nodes. This support exists so that the cross cluster
replication does not need to re-plan every time a range moves to a
different node.

Release note (sql change): LDR and PCR may use the `crdb_route=gateway`
query option to route the replication streams over a load balancer.

Epic: [CRDB-40896](https://cockroachlabs.atlassian.net/browse/CRDB-40896)

138877: opt: reduce allocations when filtering histogram buckets r=mgartner a=mgartner

`cat.HistogramBuckets` are now returned and passed by value in
`getFilteredBucket` and `(*Histogram).addBucket`, respectively,
eliminating some heap allocations.

Also, two allocations when building spans from buckets via the
`spanBuilder` have been combined into one. The new `(*spanBuilder).init`
method simplifies the API by no longer requiring that prefix datums are
passed to every invocation of `makeSpanFromBucket`. This also reduces
redundant copying of the prefix.

Epic: None

Release note: None


139029: sql/logictest: disable column family mutations in some cases r=mgartner a=mgartner

Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See #65929 and #107398.

Epic: None

Release note: None


139036: testutils,kvserver: add StartExecTrace and adopt in TestPromoteNonVoterInAddVoter r=tbg a=tbg

Every now and then we end up with tests that fail every once in a blue moon, and we can't reproduce at will.
#138864 was one of them, and execution traces helped a great deal.

This PR introduces a helper for unit tests that execution traces the test and keeps the trace on failure, and adopts it for one of these pesky unit tests.

The trace contains the goroutine ID in the filename. Additionally, the test's main goroutine is marked via a trace region. Sample below:

<img width="1226" alt="image" src="https://github.com/user-attachments/assets/3f641c28-64f7-4fba-9267-ddd48d8dda03" />

Closes #134383.

Epic: None
Release note: None


Co-authored-by: Aerin Freilich <[email protected]>
Co-authored-by: Miles Frankel <[email protected]>
Co-authored-by: Jeff Swenson <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
InManuBytes pushed a commit to InManuBytes/cockroach that referenced this issue Jan 15, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
kvoli pushed a commit to kvoli/cockroach that referenced this issue Jan 16, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
mohini-crl pushed a commit to mohini-crl/cockroach that referenced this issue Jan 17, 2025
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 17, 2025
…ting

This work adds to random changefeed settings we pass into changefeeds
in nemesis testing. This commit adds support for: changefeeds without
the diff option specified, the mvcc_timestamp option and the kafka and
pubsub sink configs.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 21, 2025
…ting

This work adds to random changefeed settings we pass into changefeeds
in nemesis testing. This commit adds support for: changefeeds without
the diff option specified, the mvcc_timestamp option and the kafka and
pubsub sink configs.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 21, 2025
…ting

This work adds to random changefeed settings we pass into changefeeds
in nemesis testing. This commit adds support for: changefeeds without
the diff option specified, the mvcc_timestamp option and the kafka and
pubsub sink configs.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 22, 2025
…ting

This work adds to random changefeed settings we pass into changefeeds
in nemesis testing. This commit adds support for: changefeeds without
the diff option specified, the mvcc_timestamp option and the kafka and
pubsub sink configs.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
aerfrei added a commit to aerfrei/cockroach that referenced this issue Jan 22, 2025
…ting

This work adds to random changefeed settings we pass into changefeeds
in nemesis testing. This commit adds support for: changefeeds without
the diff option specified, the mvcc_timestamp option and the kafka and
pubsub sink configs.

See also: cockroachdb#134119

Epic: CRDB-42866

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cdc Change Data Capture C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-cdc
Projects
None yet
Development

No branches or pull requests

2 participants