cdc: Unskip TestChangefeedNodeShutdown #32232
Comments
I've been heads-down on this for a few days, here's what I've been working towards:
A third angle of verification would be to check the replication history: that replica counts make sense, that replicas are spread evenly, and that there is no excess replication. If we are unconcerned with specifics, this might be possible through time series. However, I am leaving this out of my V1 product.
@danhhz can you clarify whether this test provides anything beyond the cdc-chaos roachtests? This sounds like we're mostly shaving yaks.
My intention was to replace the cdc/crdb-chaos roachtest. Roachtests are necessary for testing large and long-running things, but when something can be a TestCluster-based unit test, I always prefer that. Unit tests are certainly easier while developing something, but in this case it's mostly because roachtests flake at a dramatically higher rate, and when that ongoing load can be avoided, I'd like to avoid it.
Also just remembered that I'd like to solve some of these anyway. I'm about to start working on a Jepsen-style test for changefeeds. The overall correctness of changefeeds depends on them maintaining our user-guaranteed invariants across anything we throw at them: crdb chaos, sink chaos, schema changes, job pause/resume, etc. There are a huge number of moving pieces that all have to work together exactly right, and this is currently under-tested. Being able to do this in a Go test with TestCluster would be huge.
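To make the "user-guaranteed invariants" point concrete, here is a minimal sketch of the kind of check a Jepsen-style changefeed test might run over the emitted stream. Everything in it is an assumption for illustration: the `event` type, the integer timestamps, and the two invariants chosen (no previously unseen row version below the latest timestamp already seen for its key, and none below a resolved timestamp). It is not the validator used in the CockroachDB repository.

```go
package main

import "fmt"

// event is a hypothetical, simplified changefeed message: either a row
// update or, when resolved is true, a resolved-timestamp marker.
type event struct {
	key      string
	ts       int64 // stand-in for an HLC timestamp
	resolved bool
}

// orderValidator records violations of two assumed invariants. Duplicate
// deliveries of an already-seen (key, ts) pair are tolerated, since
// delivery is assumed to be at-least-once.
type orderValidator struct {
	seen        map[string]map[int64]bool // versions already emitted per key
	latest      map[string]int64          // highest timestamp emitted per key
	maxResolved int64
	failures    []string
}

func newOrderValidator() *orderValidator {
	return &orderValidator{
		seen:   make(map[string]map[int64]bool),
		latest: make(map[string]int64),
	}
}

// observe feeds one event into the validator.
func (v *orderValidator) observe(e event) {
	if e.resolved {
		if e.ts > v.maxResolved {
			v.maxResolved = e.ts
		}
		return
	}
	if v.seen[e.key] == nil {
		v.seen[e.key] = make(map[int64]bool)
	}
	if !v.seen[e.key][e.ts] {
		// A new version must not sort before anything already emitted
		// for this key, nor before the resolved timestamp.
		if last, ok := v.latest[e.key]; ok && e.ts < last {
			v.failures = append(v.failures,
				fmt.Sprintf("%s: new version @%d after @%d was seen", e.key, e.ts, last))
		}
		if e.ts < v.maxResolved {
			v.failures = append(v.failures,
				fmt.Sprintf("%s: new version @%d after @%d was resolved", e.key, e.ts, v.maxResolved))
		}
	}
	v.seen[e.key][e.ts] = true
	if e.ts > v.latest[e.key] {
		v.latest[e.key] = e.ts
	}
}

func main() {
	v := newOrderValidator()
	for _, e := range []event{
		{key: "a", ts: 1},
		{key: "a", ts: 3},
		{key: "a", ts: 1}, // duplicate delivery, tolerated
		{ts: 3, resolved: true},
		{key: "a", ts: 2}, // previously unseen and too old
	} {
		v.observe(e)
	}
	for _, f := range v.failures {
		fmt.Println(f)
	}
}
```

A chaos test would run the validator over everything the sink receives while nodes are stopped, jobs are paused, and so on, and fail if `failures` is non-empty at the end.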
I agree that it's worthwhile but my main concern is that getting these tests to run fast enough to run on every CI is not worth the hacks. They seem to live at a stage between roachtests and unit tests, where we want something that's really easy to run but we'd rather not run it in CI (but, say, in a nightly stress test). I think if we skipped the test if cockroach/pkg/testutils/stress.go Lines 19 to 25 in 2558dcc
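As a rough illustration of the "easy to run, but only in a nightly stress run" idea, the sketch below gates an expensive test on an environment variable. The referenced lines of `stress.go` are not reproduced in this thread, so this is not that code; the variable name `EXAMPLE_NIGHTLY_STRESS` and the helper `shouldRunExpensiveTest` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// nightlyStressEnv is a hypothetical variable name used only in this sketch.
const nightlyStressEnv = "EXAMPLE_NIGHTLY_STRESS"

// shouldRunExpensiveTest reports whether slow tests are enabled. lookup has
// the same signature as os.LookupEnv so the decision can be tested without
// touching the real environment.
func shouldRunExpensiveTest(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(nightlyStressEnv)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(raw)
	return err == nil && enabled
}

func main() {
	// In a real test this would be:
	//   if !shouldRunExpensiveTest(os.LookupEnv) { t.Skip("nightly stress only") }
	if !shouldRunExpensiveTest(os.LookupEnv) {
		fmt.Println("skipping: only runs under nightly stress")
		return
	}
	fmt.Println("running expensive test body")
}
```

With a gate like this the test stays one `go test` invocation away for developers, while regular CI skips it by default.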
I'm certainly amenable to the argument, but I don't see how it applies to this case? What about this test makes you worried that it will take too long to run on every CI?
Looking at this test now. Right now it fails pretty much on every run. Will look a bit as to why that is.
Doesn't fail for any CDC-related reasons. Basically, after shutting down the first node, things fall apart. I can reduce this to:

    func TestChangefeedNodeShutdown(t *testing.T) {
        defer leaktest.AfterTest(t)()
        args := base.TestServerArgs{
            UseDatabase: "d",
        }
        tc := serverutils.StartTestCluster(t, 3, base.TestClusterArgs{
            ServerArgs: args,
        })
        defer tc.Stopper().Stop(context.Background())
        db := tc.ServerConn(1)
        sqlDB := sqlutils.MakeSQLRunner(db)
        sqlDB.Exec(t, `CREATE DATABASE d`)
        sqlDB.Exec(t, `CREATE TABLE foo (a INT PRIMARY KEY, b STRING)`)
        sqlDB.Exec(t, `INSERT INTO foo VALUES (0, 'initial')`)
        time.Sleep(10 * time.Second)
        tc.StopServer(0)
        sqlDB.Exec(t, `UPSERT INTO foo VALUES(0, 'updated')`)
        sqlDB.Exec(t, `INSERT INTO foo VALUES (3, 'third')`)
    }

Plenty of failed heartbeats before. I'm not quite sure why; the ranges should be upreplicated at this point. We'll see.
@shermanCRL why GA-blocker on an issue that's 5 years old? |
99934: changefeedccl: Remove skipped tests that decayed over time r=miretskiy a=miretskiy

Fixes #32232
Remove TestChangefeedNodeShutdown. This test has been disabled since 2018; other tests exist (e.g. `TestChangefeedHandlesDrainingNodes`) that verify restart behavior.

Fixes #51842
Remove BenchmarkChangefeedTicks benchmark. This benchmark has been skipped since 2019. Attempts could be made to revive it; however, this benchmark had a lot of code, which accomplished questionable goals. The benchmark itself was unrepresentative (by using dependency injection), too small to be meaningful (1000 rows), and most likely would be too noisy and inconclusive. We have added other micro benchmarks over time, and we conduct large scale testing, including with roachtests.

Release note: None

100342: upgrades: remove migration that waits for schema changes r=rafiss a=rafiss

We can also remove some skipped tests, since they no longer apply.

informs: #96751
Release note: None

100345: upgrades: unskip TestIsAtLeastVersionBuiltin r=rafiss a=rafiss

informs: #96751
Release note: None

100484: server,testutils: add some extra logging for TestStatusEngineStatsJson r=abarganier a=knz

Informs #99261
Release note: None
Epic: None

Co-authored-by: Yevgeniy Miretskiy
Co-authored-by: Rafi Shamim
Co-authored-by: Raphael 'kena' Poss
TestChangefeedNodeShutdown is a new test which is partially complete, but has been checked in Skip()ed due to unfortunate flakiness issues, of which there are several:
This unit test was intended as a faster-running replacement for the cdc/crdbChaos roachtest, so this functionality is not going untested. We are logging this follow-up issue and skipping the test so that we get this needed (and verified) functionality checked in.
Jira issue: CRDB-4756