-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/changeedccl: Add changefeed options into nemesis tests #137947
Conversation
570a234
to
08d12f6
Compare
08d12f6
to
abdb56f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some notes about some of the decisions I made here:
-
We choose to generate only valid changefeed options as opposed to randomly generating each option independently and retrying if they're not compatible with each other. This helps us a) make sure we have sufficient likelihood of getting a valid changefeed and b) we don't have to distinguish retryable errors from concerning errors
-
We changed the NoteRow signature since otherwise none of the NoteRow arguments contain the table name. Seems preferable to moving validation logic into noteFeedMessage. I don't see another viable option here.
-
Use testName in RunNemesis in order to determine the sink type. This was already being done to compute isSinkless, isCloudstorage in nemesis_test file (to disable `eventPause` in sinkless feeds and to randomly test parquet format for cloudstorage respectively) before this change. Now, that logic is moved into the RunNemesis function itself and to ensure we don’t set key_in_value for webhook and cloudstorage (as mentioned above). This seems messy but also seems to be the only way the sink type is available to us inside the test. The sink type will also be needed for other options that depend on the sink type: kafka_sink_config / pubsub_sink_config / webhook_sink_config, format (certain formats only compatible with certain sinks)
For this project more generally I'm keeping track of some known limitations here (these are mentioned in the initial comment).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @wenyihu6)
3128c32
to
c70abab
Compare
)) | ||
} | ||
} else { | ||
if topic != v.table { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow up: we can add more test coverage here for topic_name
and when topic names are not table names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does that happen? Are there certain settings or options that trigger that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are query parameters - https://www.cockroachlabs.com/docs/stable/create-changefeed#query-parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
// the key in the value is extracted and removed from the test feed | ||
// messages (see extractKeyFromJSONValue function). | ||
// TODO: enable testing key_in_value for cloudstorage and webhook sinks | ||
KeyInValue: !isCloudstorage && !isWebhook && rand.Intn(2) < 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we link the limitation github issue you created here in side todo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding that now
@@ -33,10 +79,15 @@ var NemesesOptions = []NemesesOption{ | |||
EnableFpValidator: false, | |||
EnableSQLSmith: true, | |||
}, | |||
{ | |||
EnableFpValidator: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed here already? I thought we wanted this just for initial_scan = no / only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm removing this from the PR
@@ -95,7 +95,7 @@ func (sv *streamClientValidator) noteRow( | |||
) error { | |||
sv.mu.Lock() | |||
defer sv.mu.Unlock() | |||
return sv.NoteRow(partition, key, value, updated) | |||
return sv.NoteRow(partition, key, value, updated, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: why is their topic empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is a better value to put here but this is what I was thinking. 1) This test is doing something pretty different with the validators. I don't think we're in the context of a changefeed on a normal table as far as I can tell. I would like to understand what's happening here better. 2) This validator is not a before/after validator so it will not be using the topic value anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it seems fine leaving as it is but perhaps we can add another item to our follow up issue to add a new validator for its sole purposes being validating topic names and other things and choose to opt out that validator here. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @wenyihu6)
// the key in the value is extracted and removed from the test feed | ||
// messages (see extractKeyFromJSONValue function). | ||
// TODO: enable testing key_in_value for cloudstorage and webhook sinks | ||
KeyInValue: !isCloudstorage && !isWebhook && rand.Intn(2) < 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding that now
@@ -33,10 +79,15 @@ var NemesesOptions = []NemesesOption{ | |||
EnableFpValidator: false, | |||
EnableSQLSmith: true, | |||
}, | |||
{ | |||
EnableFpValidator: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm removing this from the PR
)) | ||
} | ||
} else { | ||
if topic != v.table { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does that happen? Are there certain settings or options that trigger that?
@@ -95,7 +95,7 @@ func (sv *streamClientValidator) noteRow( | |||
) error { | |||
sv.mu.Lock() | |||
defer sv.mu.Unlock() | |||
return sv.NoteRow(partition, key, value, updated) | |||
return sv.NoteRow(partition, key, value, updated, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is a better value to put here but this is what I was thinking. 1) This test is doing something pretty different with the validators. I don't think we're in the context of a changefeed on a normal table as far as I can tell. I would like to understand what's happening here better. 2) This validator is not a before/after validator so it will not be using the topic value anyway.
c70abab
to
c9cafc0
Compare
)) | ||
} | ||
} else { | ||
if topic != v.table { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are query parameters - https://www.cockroachlabs.com/docs/stable/create-changefeed#query-parameters.
@@ -95,7 +95,7 @@ func (sv *streamClientValidator) noteRow( | |||
) error { | |||
sv.mu.Lock() | |||
defer sv.mu.Unlock() | |||
return sv.NoteRow(partition, key, value, updated) | |||
return sv.NoteRow(partition, key, value, updated, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it seems fine leaving as it is but perhaps we can add another item to our follow up issue to add a new validator for its sole purposes being validating topic names and other things and choose to opt out that validator here. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some drive-by minor nits. Nice work!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aerfrei, @dt, and @wenyihu6)
pkg/ccl/changefeedccl/cdctest/nemeses.go
line 120 at r2 (raw file):
eventPauseCount := 10 // TODO(dan): Ugly hack to disable `eventPause` in sinkless feeds.
nit: Does this need the TODO? If so, is there an issue we can reference here instead?
pkg/ccl/changefeedccl/cdctest/nemeses.go
line 253 at r2 (raw file):
cfo := newChangefeedOption(testName) log.Infof(ctx, "Using changefeed options: %s", cfo.String())
optional nit: put the entire CREATE CHANGEFEED
statement in a var, log it here, then use it below.
pkg/crosscluster/physical/replication_random_client_test.go
line 98 at r2 (raw file):
sv.mu.Lock() defer sv.mu.Unlock() return sv.NoteRow(partition, key, value, updated, "")
nit: add the parameter name in a comment following empty parameters, i.e. "" /* topic */
pkg/ccl/changefeedccl/cdctest/validator.go
line 226 at r2 (raw file):
) error { if v.fullTableName { // TODO: fetch the actual database and schema name for the full table name
For TODOs, please either add an issue # for future follow-up or, if expected to fast follow, add your name. We discourage plain TODOs since it's harder to find the context or follow up on them.
This work makes sure our nemesis tests for changefeeds randomize over the options we use upon changefeed creation. This randomly adds the key_in_value option (see below) and full_table_name option half of the time and checks that the changefeed messages respect them in the beforeAfter validator. Note the following limitations: the full_table_name option, when on, asserts that the topic in the output will be d.public.{table_name} instead of checking for the actual name of the database/schema. This change also does not add the key_in_value option when for the webhook and cloudstorage sinks. Even before this change, since key_in_value is on by default for those sinks, we remove the key from the value in those testfeed messages for ease of testing. Unfortunately, this makes these cases hard to test, so we leave them out for now. See also: cockroachdb#134119 Epic: CRDB-42866 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @rharding6373, and @wenyihu6)
pkg/ccl/changefeedccl/cdctest/nemeses.go
line 120 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
nit: Does this need the TODO? If so, is there an issue we can reference here instead?
I'm removing the TODO in favor of just having this issue. #139035. I didn't want to overload us with issues but it seems like that's a good way to keep track of some of these loose ends. Does that sound right?
pkg/ccl/changefeedccl/cdctest/validator.go
line 226 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
For TODOs, please either add an issue # for future follow-up or, if expected to fast follow, add your name. We discourage plain TODOs since it's harder to find the context or follow up on them.
Removing this TODO in favor of keeping track of this with an issue as well. #139033
pkg/crosscluster/physical/replication_random_client_test.go
line 98 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
nit: add the parameter name in a comment following empty parameters, i.e. "" /* topic */
Done, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @rharding6373, and @wenyihu6)
pkg/ccl/changefeedccl/cdctest/nemeses.go
line 253 at r2 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
optional nit: put the entire
CREATE CHANGEFEED
statement in a var, log it here, then use it below.
Done.
c9cafc0
to
7d9b214
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aerfrei, @dt, and @wenyihu6)
pkg/ccl/changefeedccl/cdctest/nemeses.go
line 120 at r2 (raw file):
Previously, aerfrei (Aerin Freilich) wrote…
I'm removing the TODO in favor of just having this issue. #139035. I didn't want to overload us with issues but it seems like that's a good way to keep track of some of these loose ends. Does that sound right?
You could add TODO(#139035): <short text here about what needs to change>
so this piece of code is searchable using the issue. Same with the TODO you added below.
If we ever feel like we're overloading ourselves with issues, we can find a way to adjust this policy.
bors r=wenyihu6 |
This patch skips the cdc/bank test, as it has been broken since cockroachdb#137947. This patch disables the test while we work on the fix. Release note: none Epic: none
This patch skips the cdc/bank test, as it has been broken since cockroachdb#137947. This patch disables the test while we work on the fix. Release note: none Epic: none
137743: stop: grow stacks for async tasks r=tbg a=tbg Surprisingly, this seems to have helped with alloc/op, but not with CPU time, at least in the microbenchmark. ``` name old time/op new time/op delta Sysbench/SQL/3node/oltp_read_write-24 11.6ms ± 1% 11.6ms ± 2% ~ (p=0.201 n=14+15) name old alloc/op new alloc/op delta Sysbench/SQL/3node/oltp_read_write-24 2.20MB ± 3% 2.16MB ± 1% -1.88% (p=0.047 n=15+12) name old allocs/op new allocs/op delta Sysbench/SQL/3node/oltp_read_write-24 11.0k ± 2% 10.8k ± 0% ~ (p=0.069 n=15+12) ``` End-to-end testing via the `sysbench-settings` roachtest (10 runs) shows a significant improvement in qps[^1]: | max | mean | median | % change max | % change mean | % change median -- | -- | -- | -- | -- | -- | -- old | 23673.17 | 22775.40 | 22684.17 | | | new | 23955.21 | 23154.03 | 23256.25 | 1.19% | 1.66% | 2.52% [^1]: https://docs.google.com/spreadsheets/d/1QUBZHllhk5CtDfcsJqc5eOIqUO4jVdhoYXMWj4CRQaQ/edit?gid=0#gid=0 Closes #130663. Epic: CRDB-43584 139327: roachtest/cdc: skip cdc/bank r=rharding6373 a=wenyihu6 This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Informs: #139109 Release note: none Epic: none Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>
138372: roachtest: task error groups r=DarrylWong a=herkolategan Previously, the wait call on a task group would not return anything and any error, that occurred in a task, would be reported to the test framework and fail the test. This change adds functionality to allow test authors more control and create an error group if it is necessary to wait on errors, and not have the test framework handle the errors directly. A context can also now be specified to override the default context passed by the test framework for the task manager. Informs: #118214 Epic: None Release note: None 139327: roachtest/cdc: skip cdc/bank r=rharding6373 a=wenyihu6 This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Informs: #139109 Release note: none Epic: none Co-authored-by: Herko Lategan <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>
139026: licences: update THIRD-PARTY-NOTICES.txt r=celiala a=rail Fixes: REL-1744 Release note: None 139192: roachtest: disable 4K block size intent resolution test r=pav-kv a=andrewbaptist With a 4K block size, the intent resolution test will cause unacceptable slowdowns. This commit changes the perturbation/*/intents to only test up to 1024 block sizes. This also reduces the max perturbation duration to 10 minutes. Informs: #139187 Informs: #139188 Informs: #137590 Informs: #135934 Fixes: #137590 Epic: none Release note: None 139200: roachprod: Create/Destroy should avoid listing _all_ providers r=golgeek,RaduBerinde a=srosenberg Previously, both `Create` and `Destroy` would attempt to list VMs across _all_ active cloud providers. Not only is it inefficient, but `Create` may also fail if the user isn't re-authenticated to an unrelated provider. E.g., `create --clouds gce` may fail if AWS SSO token expired. This change derives the set of required providers from either the user-specified `--clouds` CLI option or the local cluster cache. The set is then used with `ListCloud` to skip listing unrelated providers. The use of the local cache is sound at this time; cluster's providers are immutable. Epic: none Release note: None 139246: ui: show job messages on the job detail page r=dt a=dt Release note (ui change): Jobs can now choose to emit messages that are shown on the job detail page in 25.1+. Epic: none. <img width="1513" alt="Screenshot 2025-01-16 at 14 46 12" src="https://github.com/user-attachments/assets/1718ccdd-e8a1-4ec2-a032-a646f410f918" /> 139327: roachtest/cdc: skip cdc/bank r=rharding6373 a=wenyihu6 This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Informs: #139109 Release note: none Epic: none Co-authored-by: Rail Aliiev <[email protected]> Co-authored-by: Andrew Baptist <[email protected]> Co-authored-by: Stan Rosenberg <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>
This patch skips the cdc/bank test, as it has been broken since #137947. This patch disables the test while we work on the fix. Release note: none Epic: none
This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.
Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.
This change also does not add the key_in_value option when for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.
See also: #134119
Epic: CRDB-42866
Release note: None