Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changefeedccl: error when a watched table backfills #29367

Merged
merged 2 commits into from
Sep 4, 2018

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Aug 30, 2018

When a table is currently being backfilled for a schema change (e.g.
adding a column with a default value), it's unclear what the expectation
is for any rows that are changed during the backfill. Our current
invariant is that rows are emitted with an updated timestamp and a later
SELECT ... AS OF SYSTEM TIME for that row would exactly match the
emitted data. During the backfill, there is nothing we can emit that
would definitely meet that invariant (because the backfill can be
aborted and rolled back).

In the meantime, this commit makes sure that we error whenever a
backfill happens, even if it's fast enough that we never get it from
leasing.

This also paves the way for switching to RangeFeed, which doesn't have
the convenient fetchSpansForTargets hook that the ExportRequest based
poller was (ab)using.

Closes #28643

Release note (bug fix): CHANGEFEEDs now error when a watched table
backfills (instead of undefined behavior)

@danhhz danhhz requested review from mrtracy, vivekmenezes and a team August 30, 2018 19:02
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@danhhz
Copy link
Contributor Author

danhhz commented Aug 30, 2018

@vivekmenezes can you review the second commit, please? It's got some assumptions about leasing and schema change backfills that I want to make sure I got right. Feel free to ignore the first commit, @mrtracy will review that one.

@danhhz
Copy link
Contributor Author

danhhz commented Sep 4, 2018

@mrtracy @vivekmenezes thoughts on this?

Copy link
Contributor

@vivekmenezes vivekmenezes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/ccl/changefeedccl/changefeed_processors.go, line 171 at r2 (raw file):

	}(ctx)
	ca.tableHistUpdaterDoneCh = make(chan struct{})
	go func(ctx context.Context) {

Any reason for not using poller.RunTask() ?


pkg/ccl/changefeedccl/changefeed_stmt.go, line 328 at r2 (raw file):

			return errors.Errorf(`CHANGEFEEDs cannot operate on tables being backfilled`)
		}
	}

it will be cool if you can move the above logic surrounding Mutations to sit in sqlbase/structured.go . I've consciously tried to keep it there so that future refactoring work will be easier.


pkg/ccl/changefeedccl/changefeed_test.go, line 434 at r2 (raw file):

			`unique: [2]->{"a": 2, "b": "2"}`,
		})
	})

Really nice job with the testing!

@danhhz danhhz requested a review from a team September 4, 2018 17:12
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/ccl/changefeedccl/changefeed_processors.go, line 171 at r2 (raw file):

Previously, vivekmenezes wrote…

Any reason for not using poller.RunTask() ?

Good idea! I hadn't thought of it. Tried it just now, but looks like it's really awkward for distsql to handle an error during Start, so I think I'll keep this the way it is


pkg/ccl/changefeedccl/changefeed_stmt.go, line 328 at r2 (raw file):

Previously, vivekmenezes wrote…

it will be cool if you can move the above logic surrounding Mutations to sit in sqlbase/structured.go . I've consciously tried to keep it there so that future refactoring work will be easier.

Absolutely! Done


pkg/ccl/changefeedccl/changefeed_test.go, line 434 at r2 (raw file):

Previously, vivekmenezes wrote…

Really nice job with the testing!

It caught TONS of bugs while I was writing the PR. Totally worth the effort

Copy link
Contributor

@mrtracy mrtracy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 1 is :lgtm: with a couple of nits, nice tests

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/ccl/changefeedccl/table_history.go, line 87 at r1 (raw file):

// WaitForTS blocks until the given timestamp is greater or equal to the
// high-water or error timestamp. In the latter case, the error is returned.

I think this is the other way around; it waits until the given timestamp is less than or equal to the high-water or error timestamp.


pkg/ccl/changefeedccl/table_history.go, line 168 at r1 (raw file):

	}

	m.mu.Lock()

I think you should move this above the validateErr check, and remove the Lock/Unlock inside of the validateErr check block. This whole area works inside of the mutex.

Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrtracy RFAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/ccl/changefeedccl/table_history.go, line 87 at r1 (raw file):

Previously, mrtracy (Matt Tracy) wrote…

I think this is the other way around; it waits until the given timestamp is less than or equal to the high-water or error timestamp.

Done.


pkg/ccl/changefeedccl/table_history.go, line 168 at r1 (raw file):

Previously, mrtracy (Matt Tracy) wrote…

I think you should move this above the validateErr check, and remove the Lock/Unlock inside of the validateErr check block. This whole area works inside of the mutex.

I was hoping this way would make it easier to reason about, but sounds like my intuition was wrong. I moved this whole thing into a new method, which I think is even better (one lock followed immediately by a defer at the very start of the new method)

Copy link
Contributor

@mrtracy mrtracy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM, thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/ccl/changefeedccl/table_history.go, line 168 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

I was hoping this way would make it easier to reason about, but sounds like my intuition was wrong. I moved this whole thing into a new method, which I think is even better (one lock followed immediately by a defer at the very start of the new method)

Yes, looks nicer.


pkg/ccl/changefeedccl/table_history.go, line 47 at r3 (raw file):

// The `ValidateThroughTS` method allows a user to block until some given
// timestamp is greater (or equal) to either the high-water or the error
// timestamp. In the latter case, it returns the error.

Sorry, missed this. Same comment problem that you just fixed below. This one is unexported, but now its inconsistent.

tableHistory tracks that a some invariants hold over a set of tables as
time advances.

Internally, two timestamps are tracked. The high-water is the highest
timestamp such that every version of a TableDescriptor has met a
provided invariant (via `validateFn`). An error timestamp is also kept,
which is the lowest timestamp where at least one table doesn't meet the
invariant.

The `ValidateThroughTS` method allows a user to block until some given
timestamp is greater (or equal) to either the high-water or the error
timestamp. In the latter case, it returns the error.

Release note: None
When a table is currently being backfilled for a schema change (e.g.
adding a column with a default value), it's unclear what the expectation
is for any rows that are changed during the backfill. Our current
invariant is that rows are emitted with an updated timestamp and a later
SELECT ... AS OF SYSTEM TIME for that row would exactly match the
emitted data. During the backfill, there is nothing we can emit that
would definitely meet that invariant (because the backfill can be
aborted and rolled back).

In the meantime, this commit makes sure that we error whenever a
backfill happens, even if it's fast enough that we never get it from
leasing.

This also paves the way for switching to RangeFeed, which doesn't have
the convenient `fetchSpansForTargets` hook that the ExportRequest based
poller was (ab)using.

Closes cockroachdb#28643

Release note (bug fix): CHANGEFEEDs now error when a watched table
backfills (instead of undefined behavior)
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews!

bors r=mrtracy,vivekmenezes

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/ccl/changefeedccl/table_history.go, line 47 at r3 (raw file):

Previously, mrtracy (Matt Tracy) wrote…

Sorry, missed this. Same comment problem that you just fixed below. This one is unexported, but now its inconsistent.

Done.

craig bot pushed a commit that referenced this pull request Sep 4, 2018
29367: changefeedccl: error when a watched table backfills r=mrtracy,vivekmenezes a=danhhz

When a table is currently being backfilled for a schema change (e.g.
adding a column with a default value), it's unclear what the expectation
is for any rows that are changed during the backfill. Our current
invariant is that rows are emitted with an updated timestamp and a later
SELECT ... AS OF SYSTEM TIME for that row would exactly match the
emitted data. During the backfill, there is nothing we can emit that
would definitely meet that invariant (because the backfill can be
aborted and rolled back).

In the meantime, this commit makes sure that we error whenever a
backfill happens, even if it's fast enough that we never get it from
leasing.

This also paves the way for switching to RangeFeed, which doesn't have
the convenient `fetchSpansForTargets` hook that the ExportRequest based
poller was (ab)using.

Closes #28643

Release note (bug fix): CHANGEFEEDs now error when a watched table
backfills (instead of undefined behavior)

29427: docs: Fix replace and link in table_ref diagram r=jseldess a=jseldess

Needed for cockroachdb/docs#3682.

Release note: None

29488: roachtest: add acceptance/decommission r=benesch,tschottdorf a=petermattis

Move the decommission acceptance test to a new acceptance/decommission
roachtest.

Fixes #29151

Release note: None

29538: stats: document stats-related commands as experimental r=RaduBerinde a=RaduBerinde

Update the documentation inside `sql.y` to designate the stats-related
statements as experimental.

Release note: None

29546: roachtest: skip (intentionally) failing Kafka chaos test r=petermattis a=tschottdorf

This test has no chance of passing until Kafka chaos is actually
supported (see #28636).

Touches #29196.

Release note: None

29550: testcluster: make manual replication mode disable the merge queue r=petermattis a=benesch

TestClusters have a manual replication mode for use in tests that need
to precisely control replication on a cluster. Teach that mode to
disable the merge queue in addition to the split and replicate queues.
This decreases the number of tests that need to directly disable the
merge queue.

Release note: None

29552: ui: add attributes to login form so LastPass will autofill it r=vilterp a=vilterp

LastPass wasn't confident enough to autofill and autologin without these
attributes.

Fixes #29529 (fixes for LastPass, but maybe not other PW managers)

Release note (admin ui change): Add attributes to the login form to allow LastPass to properly recognize it.

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Jesse Seldess <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
Co-authored-by: Pete Vilter <[email protected]>
@craig
Copy link
Contributor

craig bot commented Sep 4, 2018

Build succeeded

@craig craig bot merged commit fd1c235 into cockroachdb:master Sep 4, 2018
@danhhz danhhz deleted the cdc_schemachanges branch September 4, 2018 21:07
Amruta-Ranade pushed a commit that referenced this pull request Sep 4, 2018
29367: changefeedccl: error when a watched table backfills r=mrtracy,vivekmenezes a=danhhz

When a table is currently being backfilled for a schema change (e.g.
adding a column with a default value), it's unclear what the expectation
is for any rows that are changed during the backfill. Our current
invariant is that rows are emitted with an updated timestamp and a later
SELECT ... AS OF SYSTEM TIME for that row would exactly match the
emitted data. During the backfill, there is nothing we can emit that
would definitely meet that invariant (because the backfill can be
aborted and rolled back).

In the meantime, this commit makes sure that we error whenever a
backfill happens, even if it's fast enough that we never get it from
leasing.

This also paves the way for switching to RangeFeed, which doesn't have
the convenient `fetchSpansForTargets` hook that the ExportRequest based
poller was (ab)using.

Closes #28643

Release note (bug fix): CHANGEFEEDs now error when a watched table
backfills (instead of undefined behavior)

29427: docs: Fix replace and link in table_ref diagram r=jseldess a=jseldess

Needed for cockroachdb/docs#3682.

Release note: None

29488: roachtest: add acceptance/decommission r=benesch,tschottdorf a=petermattis

Move the decommission acceptance test to a new acceptance/decommission
roachtest.

Fixes #29151

Release note: None

29538: stats: document stats-related commands as experimental r=RaduBerinde a=RaduBerinde

Update the documentation inside `sql.y` to designate the stats-related
statements as experimental.

Release note: None

29546: roachtest: skip (intentionally) failing Kafka chaos test r=petermattis a=tschottdorf

This test has no chance of passing until Kafka chaos is actually
supported (see #28636).

Touches #29196.

Release note: None

29550: testcluster: make manual replication mode disable the merge queue r=petermattis a=benesch

TestClusters have a manual replication mode for use in tests that need
to precisely control replication on a cluster. Teach that mode to
disable the merge queue in addition to the split and replicate queues.
This decreases the number of tests that need to directly disable the
merge queue.

Release note: None

29552: ui: add attributes to login form so LastPass will autofill it r=vilterp a=vilterp

LastPass wasn't confident enough to autofill and autologin without these
attributes.

Fixes #29529 (fixes for LastPass, but maybe not other PW managers)

Release note (admin ui change): Add attributes to the login form to allow LastPass to properly recognize it.

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Jesse Seldess <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
Co-authored-by: Pete Vilter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

changefeedccl: test all types of schema changes
4 participants