ccl/changeedccl: Add changefeed options into nemesis tests #137947

aerfrei · 2024-12-23T20:27:18Z

This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.

See also: #134119

Epic: CRDB-42866

Release note: None

cockroach-teamcity · 2024-12-23T20:27:26Z

This change is

aerfrei

Adding some notes about some of the decisions I made here:

We choose to generate only valid changefeed options as opposed to randomly generating each option independently and retrying if they're not compatible with each other. This helps us a) make sure we have sufficient likelihood of getting a valid changefeed and b) we don't have to distinguish retryable errors from concerning errors
We changed the NoteRow signature since otherwise none of the NoteRow arguments contain the table name. Seems preferable to moving validation logic into noteFeedMessage. I don't see another viable option here.
Use testName in RunNemesis in order to determine the sink type. This was already being done to compute isSinkless, isCloudstorage in nemesis_test file (to disable `eventPause` in sinkless feeds and to randomly test parquet format for cloudstorage respectively) before this change. Now, that logic is moved into the RunNemesis function itself and to ensure we don’t set key_in_value for webhook and cloudstorage (as mentioned above). This seems messy but also seems to be the only way the sink type is available to us inside the test. The sink type will also be needed for other options that depend on the sink type: kafka_sink_config / pubsub_sink_config / webhook_sink_config, format (certain formats only compatible with certain sinks)

For this project more generally I'm keeping track of some known limitations here (these are mentioned in the initial comment).

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @wenyihu6)

wenyihu6 · 2025-01-08T20:45:01Z

pkg/ccl/changefeedccl/cdctest/validator.go

+			))
+		}
+	} else {
+		if topic != v.table {


follow up: we can add more test coverage here for topic_name and when topic names are not table names

When does that happen? Are there certain settings or options that trigger that?

They are query parameters - https://www.cockroachlabs.com/docs/stable/create-changefeed#query-parameters.

wenyihu6

Nice work!

wenyihu6 · 2025-01-08T20:45:49Z

pkg/ccl/changefeedccl/cdctest/nemeses.go

+		// the key in the value is extracted and removed from the test feed
+		// messages (see extractKeyFromJSONValue function).
+		// TODO: enable testing key_in_value for cloudstorage and webhook sinks
+		KeyInValue: !isCloudstorage && !isWebhook && rand.Intn(2) < 1,


nit: can we link the limitation github issue you created here in side todo

Adding that now

wenyihu6 · 2025-01-08T20:47:29Z

pkg/ccl/changefeedccl/cdctest/nemeses.go

@@ -33,10 +79,15 @@ var NemesesOptions = []NemesesOption{
 		EnableFpValidator: false,
 		EnableSQLSmith:    true,
 	},
+	{
+		EnableFpValidator: false,


Is this needed here already? I thought we wanted this just for initial_scan = no / only?

I'm removing this from the PR

wenyihu6 · 2025-01-08T20:48:54Z

pkg/crosscluster/physical/replication_random_client_test.go

@@ -95,7 +95,7 @@ func (sv *streamClientValidator) noteRow(
 ) error {
 	sv.mu.Lock()
 	defer sv.mu.Unlock()
-	return sv.NoteRow(partition, key, value, updated)
+	return sv.NoteRow(partition, key, value, updated, "")


q: why is their topic empty?

Maybe there is a better value to put here but this is what I was thinking. 1) This test is doing something pretty different with the validators. I don't think we're in the context of a changefeed on a normal table as far as I can tell. I would like to understand what's happening here better. 2) This validator is not a before/after validator so it will not be using the topic value anyway.

I see, it seems fine leaving as it is but perhaps we can add another item to our follow up issue to add a new validator for its sole purposes being validating topic names and other things and choose to opt out that validator here. wdyt?

aerfrei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @wenyihu6)

aerfrei · 2025-01-09T16:17:23Z

pkg/ccl/changefeedccl/cdctest/nemeses.go

+		// the key in the value is extracted and removed from the test feed
+		// messages (see extractKeyFromJSONValue function).
+		// TODO: enable testing key_in_value for cloudstorage and webhook sinks
+		KeyInValue: !isCloudstorage && !isWebhook && rand.Intn(2) < 1,


Adding that now

aerfrei · 2025-01-09T16:17:23Z

pkg/ccl/changefeedccl/cdctest/nemeses.go

@@ -33,10 +79,15 @@ var NemesesOptions = []NemesesOption{
 		EnableFpValidator: false,
 		EnableSQLSmith:    true,
 	},
+	{
+		EnableFpValidator: false,


I'm removing this from the PR

aerfrei · 2025-01-09T16:17:23Z

pkg/ccl/changefeedccl/cdctest/validator.go

+			))
+		}
+	} else {
+		if topic != v.table {


When does that happen? Are there certain settings or options that trigger that?

aerfrei · 2025-01-09T16:17:24Z

pkg/crosscluster/physical/replication_random_client_test.go

@@ -95,7 +95,7 @@ func (sv *streamClientValidator) noteRow(
 ) error {
 	sv.mu.Lock()
 	defer sv.mu.Unlock()
-	return sv.NoteRow(partition, key, value, updated)
+	return sv.NoteRow(partition, key, value, updated, "")


Maybe there is a better value to put here but this is what I was thinking. 1) This test is doing something pretty different with the validators. I don't think we're in the context of a changefeed on a normal table as far as I can tell. I would like to understand what's happening here better. 2) This validator is not a before/after validator so it will not be using the topic value anyway.

wenyihu6 · 2025-01-13T15:25:01Z

pkg/ccl/changefeedccl/cdctest/validator.go

+			))
+		}
+	} else {
+		if topic != v.table {


They are query parameters - https://www.cockroachlabs.com/docs/stable/create-changefeed#query-parameters.

wenyihu6 · 2025-01-13T15:26:11Z

pkg/crosscluster/physical/replication_random_client_test.go

@@ -95,7 +95,7 @@ func (sv *streamClientValidator) noteRow(
 ) error {
 	sv.mu.Lock()
 	defer sv.mu.Unlock()
-	return sv.NoteRow(partition, key, value, updated)
+	return sv.NoteRow(partition, key, value, updated, "")


I see, it seems fine leaving as it is but perhaps we can add another item to our follow up issue to add a new validator for its sole purposes being validating topic names and other things and choose to opt out that validator here. wdyt?

rharding6373

I have some drive-by minor nits. Nice work!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aerfrei, @dt, and @wenyihu6)

pkg/ccl/changefeedccl/cdctest/nemeses.go line 120 at r2 (raw file):

	eventPauseCount := 10

	// TODO(dan): Ugly hack to disable `eventPause` in sinkless feeds.

nit: Does this need the TODO? If so, is there an issue we can reference here instead?

pkg/ccl/changefeedccl/cdctest/nemeses.go line 253 at r2 (raw file):

	cfo := newChangefeedOption(testName)
	log.Infof(ctx, "Using changefeed options: %s", cfo.String())

optional nit: put the entire CREATE CHANGEFEED statement in a var, log it here, then use it below.

pkg/crosscluster/physical/replication_random_client_test.go line 98 at r2 (raw file):

	sv.mu.Lock()
	defer sv.mu.Unlock()
	return sv.NoteRow(partition, key, value, updated, "")

nit: add the parameter name in a comment following empty parameters, i.e. "" /* topic */

pkg/ccl/changefeedccl/cdctest/validator.go line 226 at r2 (raw file):

) error {
	if v.fullTableName {
		// TODO: fetch the actual database and schema name for the full table name

For TODOs, please either add an issue # for future follow-up or, if expected to fast follow, add your name. We discourage plain TODOs since it's harder to find the context or follow up on them.

This work makes sure our nemesis tests for changefeeds randomize over the options we use upon changefeed creation. This randomly adds the key_in_value option (see below) and full_table_name option half of the time and checks that the changefeed messages respect them in the beforeAfter validator. Note the following limitations: the full_table_name option, when on, asserts that the topic in the output will be d.public.{table_name} instead of checking for the actual name of the database/schema. This change also does not add the key_in_value option when for the webhook and cloudstorage sinks. Even before this change, since key_in_value is on by default for those sinks, we remove the key from the value in those testfeed messages for ease of testing. Unfortunately, this makes these cases hard to test, so we leave them out for now. See also: cockroachdb#134119 Epic: CRDB-42866 Release note: None

aerfrei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @rharding6373, and @wenyihu6)

pkg/ccl/changefeedccl/cdctest/nemeses.go line 120 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

nit: Does this need the TODO? If so, is there an issue we can reference here instead?

I'm removing the TODO in favor of just having this issue. #139035. I didn't want to overload us with issues but it seems like that's a good way to keep track of some of these loose ends. Does that sound right?

pkg/ccl/changefeedccl/cdctest/validator.go line 226 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

For TODOs, please either add an issue # for future follow-up or, if expected to fast follow, add your name. We discourage plain TODOs since it's harder to find the context or follow up on them.

Removing this TODO in favor of keeping track of this with an issue as well. #139033

pkg/crosscluster/physical/replication_random_client_test.go line 98 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

nit: add the parameter name in a comment following empty parameters, i.e. "" /* topic */

Done, thanks!

aerfrei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @rharding6373, and @wenyihu6)

pkg/ccl/changefeedccl/cdctest/nemeses.go line 253 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

optional nit: put the entire CREATE CHANGEFEED statement in a var, log it here, then use it below.

Done.

rharding6373

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aerfrei, @dt, and @wenyihu6)

pkg/ccl/changefeedccl/cdctest/nemeses.go line 120 at r2 (raw file):

Previously, aerfrei (Aerin Freilich) wrote…

I'm removing the TODO in favor of just having this issue. #139035. I didn't want to overload us with issues but it seems like that's a good way to keep track of some of these loose ends. Does that sound right?

You could add TODO(#139035): <short text here about what needs to change> so this piece of code is searchable using the issue. Same with the TODO you added below.

If we ever feel like we're overloading ourselves with issues, we can find a way to adjust this policy.

aerfrei · 2025-01-14T20:46:34Z

bors r=wenyihu6

craig · 2025-01-14T21:42:42Z

Build succeeded:

This patch skips the cdc/bank test, as it has been broken since cockroachdb#137947. This patch disables the test while we work on the fix. Release note: none Epic: none

137743: stop: grow stacks for async tasks r=tbg a=tbg Surprisingly, this seems to have helped with alloc/op, but not with CPU time, at least in the microbenchmark. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-24 11.6ms ± 1% 11.6ms ± 2% ~ (p=0.201 n=14+15) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-24 2.20MB ± 3% 2.16MB ± 1% -1.88% (p=0.047 n=15+12) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-24 11.0k ± 2% 10.8k ± 0% ~ (p=0.069 n=15+12) ``` End-to-end testing via the `sysbench-settings` roachtest (10 runs) shows a significant improvement in qps[^1]: | max | mean | median | % change max | % change mean | % change median -- | -- | -- | -- | -- | -- | -- old | 23673.17 | 22775.40 | 22684.17 | | | new | 23955.21 | 23154.03 | 23256.25 | 1.19% | 1.66% | 2.52% [^1]: https://docs.google.com/spreadsheets/d/1QUBZHllhk5CtDfcsJqc5eOIqUO4jVdhoYXMWj4CRQaQ/edit?gid=0#gid=0 Closes #130663. Epic: CRDB-43584 139327: roachtest/cdc: skip cdc/bank r=rharding6373 a=wenyihu6 This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Informs: #139109 Release note: none Epic: none Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>

138372: roachtest: task error groups r=DarrylWong a=herkolategan Previously, the wait call on a task group would not return anything and any error, that occurred in a task, would be reported to the test framework and fail the test. This change adds functionality to allow test authors more control and create an error group if it is necessary to wait on errors, and not have the test framework handle the errors directly. A context can also now be specified to override the default context passed by the test framework for the task manager. Informs: #118214 Epic: None Release note: None 139327: roachtest/cdc: skip cdc/bank r=rharding6373 a=wenyihu6 This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Informs: #139109 Release note: none Epic: none Co-authored-by: Herko Lategan <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>

139026: licences: update THIRD-PARTY-NOTICES.txt r=celiala a=rail Fixes: REL-1744 Release note: None 139192: roachtest: disable 4K block size intent resolution test r=pav-kv a=andrewbaptist With a 4K block size, the intent resolution test will cause unacceptable slowdowns. This commit changes the perturbation/*/intents to only test up to 1024 block sizes. This also reduces the max perturbation duration to 10 minutes. Informs: #139187 Informs: #139188 Informs: #137590 Informs: #135934 Fixes: #137590 Epic: none Release note: None 139200: roachprod: Create/Destroy should avoid listing _all_ providers r=golgeek,RaduBerinde a=srosenberg Previously, both `Create` and `Destroy` would attempt to list VMs across _all_ active cloud providers. Not only is it inefficient, but `Create` may also fail if the user isn't re-authenticated to an unrelated provider. E.g., `create --clouds gce` may fail if AWS SSO token expired. This change derives the set of required providers from either the user-specified `--clouds` CLI option or the local cluster cache. The set is then used with `ListCloud` to skip listing unrelated providers. The use of the local cache is sound at this time; cluster's providers are immutable. Epic: none Release note: None 139246: ui: show job messages on the job detail page r=dt a=dt Release note (ui change): Jobs can now choose to emit messages that are shown on the job detail page in 25.1+. Epic: none. <img width="1513" alt="Screenshot 2025-01-16 at 14 46 12" src="https://github.com/user-attachments/assets/1718ccdd-e8a1-4ec2-a032-a646f410f918" /> 139327: roachtest/cdc: skip cdc/bank r=rharding6373 a=wenyihu6 This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Informs: #139109 Release note: none Epic: none Co-authored-by: Rail Aliiev <[email protected]> Co-authored-by: Andrew Baptist <[email protected]> Co-authored-by: Stan Rosenberg <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>

This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Release note: none Epic: none

aerfrei force-pushed the aerin/nemeses-ftn_and_kiv branch from 570a234 to 08d12f6 Compare December 23, 2024 22:42

aerfrei force-pushed the aerin/nemeses-ftn_and_kiv branch from 08d12f6 to abdb56f Compare January 7, 2025 15:51

aerfrei marked this pull request as ready for review January 7, 2025 16:26

aerfrei requested review from a team as code owners January 7, 2025 16:26

aerfrei requested review from dt and wenyihu6 and removed request for a team January 7, 2025 16:26

aerfrei commented Jan 7, 2025

View reviewed changes

aerfrei force-pushed the aerin/nemeses-ftn_and_kiv branch 2 times, most recently from 3128c32 to c70abab Compare January 7, 2025 21:13

wenyihu6 reviewed Jan 8, 2025

View reviewed changes

wenyihu6 approved these changes Jan 8, 2025

View reviewed changes

aerfrei commented Jan 9, 2025

View reviewed changes

aerfrei force-pushed the aerin/nemeses-ftn_and_kiv branch from c70abab to c9cafc0 Compare January 9, 2025 16:17

aerfrei requested a review from wenyihu6 January 13, 2025 15:23

wenyihu6 approved these changes Jan 13, 2025

View reviewed changes

rharding6373 reviewed Jan 13, 2025

View reviewed changes

aerfrei commented Jan 14, 2025

View reviewed changes

aerfrei force-pushed the aerin/nemeses-ftn_and_kiv branch from c9cafc0 to 7d9b214 Compare January 14, 2025 15:55

rharding6373 reviewed Jan 14, 2025

View reviewed changes

craig bot merged commit 9d7b5cb into cockroachdb:master Jan 14, 2025
22 checks passed

wenyihu6 mentioned this pull request Jan 15, 2025

roachtest: cdc/bank failed #139109

Closed

mohini-crl mentioned this pull request Jan 17, 2025

[Replicated] ccl/changeedccl: Add changefeed options into nemesis tests mohini-crl/cockroach#173

Merged

wenyihu6 mentioned this pull request Jan 17, 2025

roachtest/cdc: skip cdc/bank #139327

Merged

blathers-crl bot pushed a commit that referenced this pull request Jan 17, 2025

roachtest/cdc: skip cdc/bank

09ba7ad

This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Release note: none Epic: none

blathers-crl bot mentioned this pull request Jan 17, 2025

release-25.1: roachtest/cdc: skip cdc/bank #139357

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ccl/changeedccl: Add changefeed options into nemesis tests #137947

ccl/changeedccl: Add changefeed options into nemesis tests #137947

aerfrei commented Dec 23, 2024 •

edited

Loading

cockroach-teamcity commented Dec 23, 2024

aerfrei left a comment

wenyihu6 Jan 8, 2025

aerfrei Jan 9, 2025

wenyihu6 Jan 13, 2025

wenyihu6 left a comment

wenyihu6 Jan 8, 2025

aerfrei Jan 9, 2025

wenyihu6 Jan 8, 2025

aerfrei Jan 9, 2025

wenyihu6 Jan 8, 2025

aerfrei Jan 9, 2025

wenyihu6 Jan 13, 2025

aerfrei left a comment

aerfrei Jan 9, 2025

aerfrei Jan 9, 2025

aerfrei Jan 9, 2025

aerfrei Jan 9, 2025

wenyihu6 Jan 13, 2025

wenyihu6 Jan 13, 2025

rharding6373 left a comment

aerfrei left a comment

aerfrei left a comment

rharding6373 left a comment

aerfrei commented Jan 14, 2025

craig bot commented Jan 14, 2025

ccl/changeedccl: Add changefeed options into nemesis tests #137947

ccl/changeedccl: Add changefeed options into nemesis tests #137947

Conversation

aerfrei commented Dec 23, 2024 • edited Loading

cockroach-teamcity commented Dec 23, 2024

aerfrei left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wenyihu6 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aerfrei left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rharding6373 left a comment

Choose a reason for hiding this comment

aerfrei left a comment

Choose a reason for hiding this comment

aerfrei left a comment

Choose a reason for hiding this comment

rharding6373 left a comment

Choose a reason for hiding this comment

aerfrei commented Jan 14, 2025

craig bot commented Jan 14, 2025

aerfrei commented Dec 23, 2024 •

edited

Loading