Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: re-add GC job on schema element deletion #45962

Merged
merged 1 commit into from
Mar 16, 2020

Conversation

pbardea
Copy link
Contributor

@pbardea pbardea commented Mar 10, 2020

This commit creates GC jobs upon the deletion of an index, table or
database. Similarly to the previous implementation, it considers the
walltime at which the schema change completed to be the drop time of the
schema element.

Release note (sql change): Previously, after deleting an index, table,
or database the relevant schema change job would change its running
status to waiting for GC TTL. The schema change and the GC process are
now decoupled into 2 jobs.

Release justification: This is a follow up to the migration of turning
schema changes into actual jobs. This commit re-adds the ability to
properly GC indexes and tables.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pbardea pbardea added the do-not-merge bors won't merge a PR with this label. label Mar 10, 2020
@pbardea pbardea force-pushed the integration branch 7 times, most recently from 0062228 to 6829b7b Compare March 11, 2020 20:14
@nathanstilwell nathanstilwell mentioned this pull request Mar 11, 2020
24 tasks
@pbardea pbardea changed the title [very-wip] use gc jobs sql: re-add GC job on schema element deletion Mar 11, 2020
@pbardea pbardea removed the do-not-merge bors won't merge a PR with this label. label Mar 11, 2020
@pbardea pbardea force-pushed the integration branch 10 times, most recently from 9dd2464 to 88308f5 Compare March 12, 2020 06:21
@pbardea pbardea requested review from thoszhang and ajwerner March 12, 2020 06:21
@pbardea pbardea force-pushed the integration branch 3 times, most recently from 8374c51 to 9bcee63 Compare March 12, 2020 16:18
@pbardea
Copy link
Contributor Author

pbardea commented Mar 12, 2020

Ready for a first look. Looks like the races happening on CI are the same ones addressed by #45919.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 19 of 22 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @lucy-zhang, and @pbardea)


pkg/sql/drop_test.go, line 298 at r1 (raw file):

	defer func(i time.Duration) { gcjob.MaxSQLGCInterval = i }(gcjob.MaxSQLGCInterval)
	gcjob.MaxSQLGCInterval = 500 * time.Millisecond

nit: make yourself a helper that takes a timestamp and returns a closure so you can go:

defer setMaxGCInterval(500 * time.Millisecond)()

pkg/sql/drop_test.go, line 857 at r1 (raw file):

	// Maybe this test API should use an offset starting from the most recent job
	// instead.
	schemaChangeJobOffset := 4

nit: const


pkg/sql/schema_changer.go, line 386 at r1 (raw file):

	}

	if err := db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {

note to self, @pbardea please don't do anything about this: This pattern is annoying. I'm going to make a library function for it. It also has a problem where if we retry we're going to pollute the job registry map with cancel functions.

Probably

var f func(ctx context.Context, txn *kv.Txn) error
jobRegistry.CreateStartableJob(ctx, jobRecord, resultCh, f)

That func doesn't matter here but it'll be handy with protected timestamps elsewhere. This provides a place to fix the leak and deal with other issues like failure to commit the transaction which is clumsily handled with CleanupOnRollback.


pkg/sql/gcjob/descriptor_utils.go, line 81 at r1 (raw file):

func deleteDatabaseZoneConfig(ctx context.Context, db *kv.DB, databaseID sqlbase.ID) error {
	return db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
		if err := txn.SetSystemConfigTrigger(); err != nil {

huh, how did this work before? Is this just the database cache being sort of ridiculous and broken? Is there some new dependency on this?

Copy link
Contributor Author

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @lucy-zhang)


pkg/sql/drop_test.go, line 298 at r1 (raw file):

Previously, ajwerner wrote…

nit: make yourself a helper that takes a timestamp and returns a closure so you can go:

defer setMaxGCInterval(500 * time.Millisecond)()

Nice, since all my tests used the same time I just made a SetSmallMaxGCIntervalForTests.


pkg/sql/drop_test.go, line 857 at r1 (raw file):

Previously, ajwerner wrote…

nit: const

Done.


pkg/sql/gcjob/descriptor_utils.go, line 81 at r1 (raw file):

Previously, ajwerner wrote…

huh, how did this work before? Is this just the database cache being sort of ridiculous and broken? Is there some new dependency on this?

Ya, I think this can be removed. I think I added it while experimenting with some approaches to get rid of a test flake, not thinking much about which keys this txn was touching.

@pbardea pbardea force-pushed the integration branch 2 times, most recently from a4fa0ee to aa95f5d Compare March 13, 2020 05:28
Copy link
Contributor Author

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @lucy-zhang)


pkg/sql/gcjob/descriptor_utils.go, line 81 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Ya, I think this can be removed. I think I added it while experimenting with some approaches to get rid of a test flake, not thinking much about which keys this txn was touching.

Nevermind, this is very much needed and it just never worked. In the old world this was done in the same txn that removed the table descriptors.

@pbardea pbardea force-pushed the integration branch 2 times, most recently from 58539e7 to 496df02 Compare March 13, 2020 21:18
Copy link
Contributor

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 22 files at r1, 7 of 7 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @pbardea)


pkg/sql/schema_changer.go, line 472 at r2 (raw file):

		if err := sc.startGCJob(ctx, gcDetails, false /* isRollback */); err != nil {
			return err
		}

Does it make sense to just return immediately from exec() after we create/write a GC job for a dropped table? (I've never really understood why we didn't return immediately in maybeDropTable() in the previous implementation, when we did end up dropping the table.)

I'm fine with leaving this as is, though, since we probably don't want to deal with any unexpected interactions at this point.

@pbardea pbardea force-pushed the integration branch 2 times, most recently from 21f4a31 to ab2a338 Compare March 15, 2020 20:56
pbardea added a commit to pbardea/cockroach that referenced this pull request Mar 16, 2020
This change updates the minimum allowed duration for a test run under
race in CI to be configured on a per-test basis, rather than a
per-package basis.

When there are a lot of tests in a given package that need to be
stressed by testrace, the entire package may not be given sufficient
time to complete. For example, cockroachdb#45962 was not passing CI since many
tests were changed within one package.

Release justification: Non-production code changes.
Release note: None
Copy link
Contributor Author

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @lucy-zhang)


pkg/sql/schema_changer.go, line 472 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

Does it make sense to just return immediately from exec() after we create/write a GC job for a dropped table? (I've never really understood why we didn't return immediately in maybeDropTable() in the previous implementation, when we did end up dropping the table.)

I'm fine with leaving this as is, though, since we probably don't want to deal with any unexpected interactions at this point.

Ya, I also can't think of a reason it wouldn't be safe to return immediately here. As you mentioned, I'd prefer to do this in a follow-up PR.

@pbardea
Copy link
Contributor Author

pbardea commented Mar 16, 2020

After fighting a bit with CI, I think that the timeout for the racetests needs to be increased a bit since there are quite a few tests run under stress from the pkg/sql package.

I successfully got CI to pass with that change, and extracted that commit into #46138. This branch will likely need to be rebased off of that one when it gets merged for CI to turn green here.

craig bot pushed a commit that referenced this pull request Mar 16, 2020
46138: github-pull-request-make: set minimum testrace duration per test r=tbg a=pbardea

This change updates the minimum allowed duration for a test run under
race in CI to be configured on a per-test basis, rather than a
per-package basis.

When there are a lot of tests in a given package that need to be
stressed by testrace, the entire package may not be given sufficient
time to complete. For example, #45962 was not passing CI since many
tests were changed within one package.

Release justification: Non-production code changes.
Release note: None

Co-authored-by: Paul Bardea <[email protected]>
This commit creates GC jobs upon the deletion of an index, table or
database. Similarly to the previous implementation, it considers the
walltime at which the schema change completed to be the drop time of the
schema element.

Release note (sql change): Previously, after deleting an index, table,
or database the relevant schema change job would change its running
status to waiting for GC TTL. The schema change and the GC process are
now decoupled into 2 jobs.

Release justification: This is a follow up to the migration of turning
schema changes into actual jobs. This commit re-adds the ability to
properly GC indexes and tables.
@pbardea
Copy link
Contributor Author

pbardea commented Mar 16, 2020

TFTRs!
bors r+

@jordanlewis jordanlewis mentioned this pull request Mar 16, 2020
83 tasks
@craig
Copy link
Contributor

craig bot commented Mar 16, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Mar 16, 2020

Build succeeded

@craig craig bot merged commit 565c476 into cockroachdb:master Mar 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants