Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: incorrect behaviour setting NOT NULL on column and dropping it #62167

Merged
merged 1 commit into from
Mar 19, 2021

Conversation

fqazi
Copy link
Collaborator

@fqazi fqazi commented Mar 17, 2021

Fixes: #47719

Previously, if a column was mutated and then dropped in a single
transaction we would still try to validate the dropped column.
Additionally, in general if a drop operation occurred in a
different transaction we ran the danger of doing something
similar. From a behaviour perspective this was bad, since
it can lead to unexpected behaviour for users. To address, this
patch introduces logic to scan later mutations to check if
a drop column occurs. If one does occur then it will skip any
operations made irrelevant by the drop. Additionally, for a
correctness perspective it will wait for the drop to complete
before returning back.

Release note (bug fix): A constraint like not null or check on
a column made irrelevant by a drop in a later concurrent transaction
would lead to incorrect errors / behaviour.

@fqazi fqazi requested a review from a team March 17, 2021 21:02
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@fqazi fqazi force-pushed the columnNotNullBug branch from 554eba5 to 467ca48 Compare March 17, 2021 21:04
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This didn't end up being too bad.

Reviewed 2 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi)


pkg/sql/schema_changer.go, line 1049 at r1 (raw file):

	var committedMutations []descpb.DescriptorMutation
	var scTable *tabledesc.Mutable

How do you feel about just computing the set of jobs inside the txn and stashing that here as opposed to these constituent pieces?

I've become mildly allergic to descriptors retained outside of transactions. It's too easy to keep them around and then forget that properties it has or doesn't have.


pkg/sql/schema_changer.go, line 2567 at r1 (raw file):

// by a later operation. The nextMutationIdx provides the index at which to check for
// later mutation.
func isCurrentMutationDiscarded(

Are there other interactions which are interesting? What about indexes being built? FK constraints?


pkg/sql/schema_changer_test.go, line 7040 at r1 (raw file):

	defer sqltestutils.SetTestJobsAdoptInterval()()
	ctx := context.Background()
	hookOnJobResume := true

I have a feeling you're going to need some synchronization on this.


pkg/sql/schema_changer_test.go, line 7066 at r1 (raw file):

	var schemaChangeWaitGroup sync.WaitGroup

	// Sub test 1: with concurrent drop and mutations

nit: rework this into actual subtests where you take the top part and rework it into a setup function.


pkg/sql/schema_changer_test.go, line 7076 at r1 (raw file):

		defer schemaChangeWaitGroup.Done()
		nextStep <- 1
		conn2.Exec(t,

turns out you can't use the sqlrunner in another goroutine. You have to either call t.Error or pass the error back to the main goroutine on a channel.


pkg/sql/schema_changer_test.go, line 7081 at r1 (raw file):

ALTER TABLE t ALTER COLUMN j SET NOT NULL;
ALTER TABLE t ADD COLUMN k INT8 DEFAULT 42 NOT NULL UNIQUE;
COMMIT TRANSACTION;

TIL I don't know that I've seen COMMIT TRANSACTION spelled out before


pkg/sql/schema_changer_test.go, line 7106 at r1 (raw file):

	// Allow both backfill jobs to proceed
	proceedBeforeBackfill <- nil
	proceedBeforeBackfill <- nil

Just FYI I don't think the backfills will actually be concurrent. Did you see

// Check that we aren't queued behind another schema changer.
if err := sc.notFirstInLine(ctx, desc); err != nil {
return err
}
? This ends up being pretty important. Mutations are processed in order.


pkg/sql/schema_changer_test.go, line 7170 at r1 (raw file):

Quoted 5 lines of code…
	// Allow the first backfill to succeed and the
	// second one with the drop to fail, which should
	// cause both to return errors.
	proceedBeforeBackfill <- nil
	proceedBeforeBackfill <- errors.Newf("Bogus error for drop column transaction")

There's no ordering on when concurrent channel receives happen. You may also want to consider reworking when you receive notification so that you can know definitively which job you're unblocking. Alternatively you could plumb more info into the knob and then pass it along in a struct on the chan. You may want a schema where you do something like:

backfillEventChan := make(chan chan error)
//...
		RunBeforeBackfill: func() error {
                            errCh := make(chan error)
                            backfillEventChan <- errCh
                            return <-errCh
		},

Copy link
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/sql/schema_changer.go, line 1049 at r1 (raw file):

Previously, ajwerner wrote…
	var committedMutations []descpb.DescriptorMutation
	var scTable *tabledesc.Mutable

How do you feel about just computing the set of jobs inside the txn and stashing that here as opposed to these constituent pieces?

I've become mildly allergic to descriptors retained outside of transactions. It's too easy to keep them around and then forget that properties it has or doesn't have.

Done


pkg/sql/schema_changer_test.go, line 7040 at r1 (raw file):

Previously, ajwerner wrote…

I have a feeling you're going to need some synchronization on this.

Done.


pkg/sql/schema_changer_test.go, line 7076 at r1 (raw file):

Previously, ajwerner wrote…

turns out you can't use the sqlrunner in another goroutine. You have to either call t.Error or pass the error back to the main goroutine on a channel.

Done.


pkg/sql/schema_changer_test.go, line 7081 at r1 (raw file):

Previously, ajwerner wrote…

TIL I don't know that I've seen COMMIT TRANSACTION spelled out before

Came from sqlfumpt, cleaned it up :)


pkg/sql/schema_changer_test.go, line 7170 at r1 (raw file):

Previously, ajwerner wrote…
	// Allow the first backfill to succeed and the
	// second one with the drop to fail, which should
	// cause both to return errors.
	proceedBeforeBackfill <- nil
	proceedBeforeBackfill <- errors.Newf("Bogus error for drop column transaction")

There's no ordering on when concurrent channel receives happen. You may also want to consider reworking when you receive notification so that you can know definitively which job you're unblocking. Alternatively you could plumb more info into the knob and then pass it along in a struct on the chan. You may want a schema where you do something like:

backfillEventChan := make(chan chan error)
//...
		RunBeforeBackfill: func() error {
                            errCh := make(chan error)
                            backfillEventChan <- errCh
                            return <-errCh
		},

Done.

@fqazi fqazi force-pushed the columnNotNullBug branch from 467ca48 to b69d96c Compare March 18, 2021 16:46
@fqazi fqazi requested a review from ajwerner March 18, 2021 16:46
@fqazi
Copy link
Collaborator Author

fqazi commented Mar 18, 2021

Thanks, @ajwerner. I reworked the test internals to be saner.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've got some linter things to fix but :lgtm:

Reviewed 1 of 5 files at r1, 1 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @fqazi)


pkg/sql/schema_changer.go, line 1049 at r2 (raw file):

	var didUpdate bool
	var depMutationJobs []jobspb.JobID
	var scTable *tabledesc.Mutable

I don't think we need to declare scTable outside the txn anymore


pkg/sql/schema_changer.go, line 1547 at r2 (raw file):

		// the rollback completed the job was adopted.
		// the rollback completed the job was adopted.

nit: duplicated line

Fixes: cockroachdb#47719

Previously, if a column was mutated and then dropped in a single
transaction we would still try to validate the dropped column.
Additionally, in general if a drop operation occurred in a
different transaction we ran the danger of doing something
similar. From a behaviour perspective this was bad, since
it can lead to unexpected behaviour for users. To address, this
patch introduces logic to scan later mutations to check if
a drop column occurs. If one does occur then it will skip any
operations made irrelevant by the drop. Additionally, for a
correctness perspective it will wait for the drop to complete
before returning back.

Release note (bug fix): A constraint like not null or check on
a column made irrelevant by a drop in a later concurrent transaction
would lead to incorrect errors / behaviour.
@fqazi fqazi force-pushed the columnNotNullBug branch from b69d96c to a9aa92c Compare March 19, 2021 13:49
@fqazi
Copy link
Collaborator Author

fqazi commented Mar 19, 2021

bors r=ajwerner

@craig
Copy link
Contributor

craig bot commented Mar 19, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: bug when setting a column NOT NULL and then dropping it
3 participants