-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: re-add GC job on schema element deletion #45962
Conversation
0062228
to
6829b7b
Compare
9dd2464
to
88308f5
Compare
8374c51
to
9bcee63
Compare
Ready for a first look. Looks like the races happening on CI are the same ones addressed by #45919. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 19 of 22 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @lucy-zhang, and @pbardea)
pkg/sql/drop_test.go, line 298 at r1 (raw file):
defer func(i time.Duration) { gcjob.MaxSQLGCInterval = i }(gcjob.MaxSQLGCInterval) gcjob.MaxSQLGCInterval = 500 * time.Millisecond
nit: make yourself a helper that takes a timestamp and returns a closure so you can go:
defer setMaxGCInterval(500 * time.Millisecond)()
pkg/sql/drop_test.go, line 857 at r1 (raw file):
// Maybe this test API should use an offset starting from the most recent job // instead. schemaChangeJobOffset := 4
nit: const
pkg/sql/schema_changer.go, line 386 at r1 (raw file):
} if err := db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
note to self, @pbardea please don't do anything about this: This pattern is annoying. I'm going to make a library function for it. It also has a problem where if we retry we're going to pollute the job registry map with cancel functions.
Probably
var f func(ctx context.Context, txn *kv.Txn) error
jobRegistry.CreateStartableJob(ctx, jobRecord, resultCh, f)
That func doesn't matter here but it'll be handy with protected timestamps elsewhere. This provides a place to fix the leak and deal with other issues like failure to commit the transaction which is clumsily handled with CleanupOnRollback
.
pkg/sql/gcjob/descriptor_utils.go, line 81 at r1 (raw file):
func deleteDatabaseZoneConfig(ctx context.Context, db *kv.DB, databaseID sqlbase.ID) error { return db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error { if err := txn.SetSystemConfigTrigger(); err != nil {
huh, how did this work before? Is this just the database cache being sort of ridiculous and broken? Is there some new dependency on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @lucy-zhang)
pkg/sql/drop_test.go, line 298 at r1 (raw file):
Previously, ajwerner wrote…
nit: make yourself a helper that takes a timestamp and returns a closure so you can go:
defer setMaxGCInterval(500 * time.Millisecond)()
Nice, since all my tests used the same time I just made a SetSmallMaxGCIntervalForTests
.
pkg/sql/drop_test.go, line 857 at r1 (raw file):
Previously, ajwerner wrote…
nit:
const
Done.
pkg/sql/gcjob/descriptor_utils.go, line 81 at r1 (raw file):
Previously, ajwerner wrote…
huh, how did this work before? Is this just the database cache being sort of ridiculous and broken? Is there some new dependency on this?
Ya, I think this can be removed. I think I added it while experimenting with some approaches to get rid of a test flake, not thinking much about which keys this txn was touching.
a4fa0ee
to
aa95f5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @lucy-zhang)
pkg/sql/gcjob/descriptor_utils.go, line 81 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Ya, I think this can be removed. I think I added it while experimenting with some approaches to get rid of a test flake, not thinking much about which keys this txn was touching.
Nevermind, this is very much needed and it just never worked. In the old world this was done in the same txn that removed the table descriptors.
58539e7
to
496df02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 16 of 22 files at r1, 7 of 7 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @pbardea)
pkg/sql/schema_changer.go, line 472 at r2 (raw file):
if err := sc.startGCJob(ctx, gcDetails, false /* isRollback */); err != nil { return err }
Does it make sense to just return immediately from exec()
after we create/write a GC job for a dropped table? (I've never really understood why we didn't return immediately in maybeDropTable()
in the previous implementation, when we did end up dropping the table.)
I'm fine with leaving this as is, though, since we probably don't want to deal with any unexpected interactions at this point.
21f4a31
to
ab2a338
Compare
This change updates the minimum allowed duration for a test run under race in CI to be configured on a per-test basis, rather than a per-package basis. When there are a lot of tests in a given package that need to be stressed by testrace, the entire package may not be given sufficient time to complete. For example, cockroachdb#45962 was not passing CI since many tests were changed within one package. Release justification: Non-production code changes. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @lucy-zhang)
pkg/sql/schema_changer.go, line 472 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
Does it make sense to just return immediately from
exec()
after we create/write a GC job for a dropped table? (I've never really understood why we didn't return immediately inmaybeDropTable()
in the previous implementation, when we did end up dropping the table.)I'm fine with leaving this as is, though, since we probably don't want to deal with any unexpected interactions at this point.
Ya, I also can't think of a reason it wouldn't be safe to return immediately here. As you mentioned, I'd prefer to do this in a follow-up PR.
After fighting a bit with CI, I think that the timeout for the racetests needs to be increased a bit since there are quite a few tests run under stress from the I successfully got CI to pass with that change, and extracted that commit into #46138. This branch will likely need to be rebased off of that one when it gets merged for CI to turn green here. |
46138: github-pull-request-make: set minimum testrace duration per test r=tbg a=pbardea This change updates the minimum allowed duration for a test run under race in CI to be configured on a per-test basis, rather than a per-package basis. When there are a lot of tests in a given package that need to be stressed by testrace, the entire package may not be given sufficient time to complete. For example, #45962 was not passing CI since many tests were changed within one package. Release justification: Non-production code changes. Release note: None Co-authored-by: Paul Bardea <[email protected]>
This commit creates GC jobs upon the deletion of an index, table or database. Similarly to the previous implementation, it considers the walltime at which the schema change completed to be the drop time of the schema element. Release note (sql change): Previously, after deleting an index, table, or database the relevant schema change job would change its running status to waiting for GC TTL. The schema change and the GC process are now decoupled into 2 jobs. Release justification: This is a follow up to the migration of turning schema changes into actual jobs. This commit re-adds the ability to properly GC indexes and tables.
TFTRs! |
Build failed (retrying...) |
Build succeeded |
This commit creates GC jobs upon the deletion of an index, table or
database. Similarly to the previous implementation, it considers the
walltime at which the schema change completed to be the drop time of the
schema element.
Release note (sql change): Previously, after deleting an index, table,
or database the relevant schema change job would change its running
status to waiting for GC TTL. The schema change and the GC process are
now decoupled into 2 jobs.
Release justification: This is a follow up to the migration of turning
schema changes into actual jobs. This commit re-adds the ability to
properly GC indexes and tables.