Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: test namespace entry for multi-region enum is drained properly #60466

Merged
merged 1 commit into from
Feb 18, 2021

Conversation

arulajmani
Copy link
Collaborator

Testing only patch. When the primary region is removed from a
multi-region database we queue up an async job to reclaim the namespace
entry for the multi-region type descriptor (and its array alias). This
patch adds a test to make sure that the namespace entry is drained
properly even if the job runs into an error.

This patch adds a new testing knob to the type schema changer,
RunAfterAllFailOrCancel, to synch on the OnFailOrCancel job
completing.

Release note: None

@arulajmani arulajmani requested review from otan, ajstorm and a team February 11, 2021 04:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@arulajmani
Copy link
Collaborator Author

As promised on #60437

@arulajmani arulajmani force-pushed the drop-region-async-fail branch from fb5a77e to c903c36 Compare February 11, 2021 04:34
@@ -162,6 +162,9 @@ type TypeSchemaChangerTestingKnobs struct {
// RunBeforeEnumMemberPromotion runs before enum members are promoted from
// readable to all permissions in the typeSchemaChanger.
RunBeforeEnumMemberPromotion func()
// RunAfterOnFailOrCancel runs after OnFailOrCancel completes, if
// OnFailOrCancel is triggered.
RunAfterAllFailOrCancel func()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean RunAfterOnFailOrCancel?

`)
require.NoError(t, err)

_, err = sqlDB.Exec(`ALTER DATABASE db DROP REGION "us-east-1" `)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space

defer mu.Unlock()
numCleanups++
if numCleanups == 2 {
close(cleanupFinished)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if OnFailOrCancel fails, does it retry, or does the entry get stuck?

Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/sql/multiregion_test.go, line 164 at r1 (raw file):

			mu.Lock()
			defer mu.Unlock()
			numCleanups++

Nit: why not make this a bool, and set it after the check?

haveWePerformedCleanup := false
...
if !haveWePerformedCleanup {
   haveWePerformedCleanup = true
} else {
   close(cleanupFinished)
}

@arulajmani arulajmani force-pushed the drop-region-async-fail branch from c903c36 to bf9a6a4 Compare February 11, 2021 17:05
@blathers-crl blathers-crl bot requested a review from otan February 11, 2021 17:05
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @otan)


pkg/sql/multiregion_test.go, line 164 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: why not make this a bool, and set it after the check?

haveWePerformedCleanup := false
...
if !haveWePerformedCleanup {
   haveWePerformedCleanup = true
} else {
   close(cleanupFinished)
}

I think thats slightly confusing because when this block is hit, cleanup has already been performed, as the knob runs after cleanup has completed.

I slightly switched this logic to be explicit about the expectation here, lemme know if that improves the readability.


pkg/sql/multiregion_test.go, line 166 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

if OnFailOrCancel fails, does it retry, or does the entry get stuck?

This is exactly the bug I was talking about when we spoke yesterday. #60489. This isn't to say things can never be in a bad state -- they can be, if the error is non-retryable, but you can't account for everything :)

I'll send out a patch once this thing goes in, should be easy enough to test once this testing knob is a thing. I can also update this test to account for that.


pkg/sql/type_change.go, line 167 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

do you mean RunAfterOnFailOrCancel?

I'm not even sure how I made that typo tbh

Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @otan)


pkg/sql/multiregion_test.go, line 164 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I think thats slightly confusing because when this block is hit, cleanup has already been performed, as the knob runs after cleanup has completed.

I slightly switched this logic to be explicit about the expectation here, lemme know if that improves the readability.

I like the changes. You could still go with the bool if you change name to haveWePerformedFirstRoundOfCleanup, but your call.

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @otan)


pkg/sql/type_change.go, line 167 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I'm not even sure how I made that typo tbh

you need sleep

@arulajmani arulajmani force-pushed the drop-region-async-fail branch from bf9a6a4 to 2347a01 Compare February 18, 2021 16:19
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @otan)


pkg/sql/multiregion_test.go, line 164 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

I like the changes. You could still go with the bool if you change name to haveWePerformedFirstRoundOfCleanup, but your call.

Done!

@craig
Copy link
Contributor

craig bot commented Feb 18, 2021

🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

Testing only patch. When the primary region is removed from a
multi-region database we queue up an async job to reclaim the namespace
entry for the multi-region type descriptor (and its array alias). This
patch adds a test to make sure that the namespace entry is drained
properly even if the job runs into an error.

Release note: None
@arulajmani arulajmani force-pushed the drop-region-async-fail branch from 2347a01 to df14643 Compare February 18, 2021 17:24
@arulajmani
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 18, 2021

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Copy link
Contributor

craig bot commented Feb 18, 2021

Build failed (retrying...):

@craig craig bot merged commit 40a35fe into cockroachdb:master Feb 18, 2021
@craig
Copy link
Contributor

craig bot commented Feb 18, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants