Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rangefeedcache,settingswatcher,systemcfgwatcher: lose gossip deps #74612

Merged
merged 10 commits into from
Feb 2, 2022

Conversation

ajwerner
Copy link
Contributor

This is a pile of commits which supersedes #69269 and pretty much puts in place all of the pieces to move on #70560.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great work!

My only concern is around the frontier bumping delay - I walked away from a previous discussion with the idea that the frontier lags behind the present by something on the order of seconds. That makes the decision of "upgrading" from a RangeFeed to a Watcher non-trivial, no? In particular, does that mean that SET CLUSTER SETTING now takes seconds?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)


pkg/kv/kvclient/rangefeed/config.go, line 111 at r1 (raw file):

}

// WithDiff makes an option to enable an initial scan which defaults to

This comment is incorrect (and had me scratching my head for a while).


pkg/kv/kvclient/rangefeed/rangefeedbuffer/kvs.go, line 20 at r2 (raw file):

// RangeFeedValueEventToKV is a function to type assert an Event into a
// *roachpb.RangeFeedValue and then conver it to a roachpb.KeyValue.

[nit] convert


pkg/kv/kvclient/rangefeed/rangefeedbuffer/kvs.go, line 22 at r2 (raw file):

// *roachpb.RangeFeedValue and then conver it to a roachpb.KeyValue.
func RangeFeedValueEventToKV(event Event) roachpb.KeyValue {
	return func(rfv *roachpb.RangeFeedValue) roachpb.KeyValue {

I don't understand what is going on here. How is this different than

rfv := event.(*roachpb.RangeFeedValue)
return roachpb.KeyValue{..}

?


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 40 at r1 (raw file):

const (

[nit] blank line


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 66 at r1 (raw file):

}

// Watcher is used to implement a consistent cache over spans of KV data

[supernit] it would be nice if this was the first thing in the file, easier to discover


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 72 at r1 (raw file):

// loop. Rangefeeds used as is don't offer any ordering guarantees with
// respect to updates made over non-overlapping keys, which is something we care
// about. For that reason we inject logic to make use of a rangefeed buffer,

[nit] "we inject logic" can be misunderstood, I'd say "For that reason this facility implements logic to..."


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 85 at r1 (raw file):

// The expectation is that the caller will use a mutex to update an underlying
// data structure.
type Watcher struct {

Shouldn't we include here the information that the events are received with a significant lag because of how the frontier is advanced?


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 106 at r1 (raw file):

// Knobs allows tests to inject behavior into the Watcher.
type Knobs struct {

[nit] blank line


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 124 at r1 (raw file):

// NewWatcher instantiates a Watcher.
func NewWatcher(

withPrevValue deserves some documentation here (point to RangeFeedRequest.WithDiff)


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 157 at r1 (raw file):

		const aWhile = 5 * time.Minute // arbitrary but much longer than a retry
		for r := retry.StartWithCtx(ctx, base.DefaultRetryOptions()); r.Next(); {

[nit] blank line


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 180 at r1 (raw file):

// This is a blocking operation, returning only when the context is canceled,
// or when a retryable error occurs. For the latter, it's expected that callers
// will re-run the cache.

[nit] re-run the watcher?


pkg/server/settingswatcher/row_decoder.go, line 46 at r5 (raw file):

}

func (r RawValue) SafeFormat(s interfaces.SafePrinter, verb rune) {

[nit] // SafeFormat implements redact.SafeFormatter.


pkg/server/settingswatcher/settings_watcher.go, line 127 at r4 (raw file):

	}

	c := rangefeedcache.NewWatcher(

Isn't there a significant functional change here in that the updates are now received with a multi-second delay? At the very least I can image this would make many tests much slower (if each time we set a cluster setting, we have to wait a few seconds)


pkg/server/settingswatcher/settings_watcher.go, line 130 at r4 (raw file):

		"settings-watcher",
		s.clock, s.f,
		0, // bufferSize

I don't understand what bufferSize=0 does? Wouldn't it cause an error on the first onValue() callback from the rangefeed?


pkg/server/settingswatcher/settings_watcher.go, line 132 at r4 (raw file):

		0, // bufferSize
		[]roachpb.Span{settingsTableSpan},
		false,

[nit] add comment


pkg/spanconfig/spanconfigkvsubscriber/kvsubscriber.go, line 142 at r1 (raw file):

		ctx context.Context, v *roachpb.RangeFeedValue,
	) rangefeedbuffer.Event {
		return translateEvent(ctx, d, v)

[nit] should translateEvent be a method on d (in which case we would just use d.translateEvent)?


pkg/spanconfig/spanconfigkvsubscriber/kvsubscriber.go, line 289 at r1 (raw file):

}

func translateEvent(

[nit] could use a short comment


pkg/kv/kvclient/rangefeed/rangefeedcache/cache_impl_test.go, line 83 at r3 (raw file):

		updatedData = updateKVs
	case PartialUpdate:
		// Note that handleUpdate is not called synchronously, so we can use the

[nit] not called concurrently?


pkg/kv/kvclient/rangefeed/rangefeedcache/cache_impl_test.go, line 87 at r3 (raw file):

		// missing anything.
		prev, _, _ := c.GetSnapshot() // okay if prev is nil
		if len(updateKVs) == 0 {

[nit] this special case could be handled in MergeKVs.


pkg/server/settingswatcher/row_decoder_test.go, line 1 at r5 (raw file):

package settingswatcher

[supernit] should remove this file from this commit

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is around the frontier bumping delay - I walked away from a previous discussion with the idea that the frontier lags behind the present by something on the order of seconds. That makes the decision of "upgrading" from a RangeFeed to a Watcher non-trivial, no? In particular, does that mean that SET CLUSTER SETTING now takes seconds?

In the case of settings, it applies the settings change to the updater in the Translate function which is called before buffering the event. That means there is no delay for settings introduced by this change. It is a little cheeky to hijack the translate call like that. I’ll add commentary that this is a reasonable thing to do. In fact, if there’s no storage, it never even buffers the event, so it’s not really getting that much out of the watcher layer. The delay only affects the snapshot logic.

Similarly, I don’t care about the delay very much for the two other use cases.

One thing I’d like to do is deal with a hard restart of the watcher as will happen if the gc threshold is hit.

I’m waiting at an urgent care, so I’ll address the feedback later. Thanks for taking a look.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)

@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch 2 times, most recently from d5e5579 to 5fc6b01 Compare January 11, 2022 07:44
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)


pkg/kv/kvclient/rangefeed/config.go, line 111 at r1 (raw file):

Previously, RaduBerinde wrote…

This comment is incorrect (and had me scratching my head for a while).

Done.


pkg/kv/kvclient/rangefeed/rangefeedbuffer/kvs.go, line 20 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] convert

Done.


pkg/kv/kvclient/rangefeed/rangefeedbuffer/kvs.go, line 22 at r2 (raw file):

Previously, RaduBerinde wrote…

I don't understand what is going on here. How is this different than

rfv := event.(*roachpb.RangeFeedValue)
return roachpb.KeyValue{..}

?

Done.

Bad refactor


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 40 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] blank line

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 66 at r1 (raw file):

Previously, RaduBerinde wrote…

[supernit] it would be nice if this was the first thing in the file, easier to discover

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 72 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] "we inject logic" can be misunderstood, I'd say "For that reason this facility implements logic to..."

I reworded this comment.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 85 at r1 (raw file):

Previously, RaduBerinde wrote…

Shouldn't we include here the information that the events are received with a significant lag because of how the frontier is advanced?

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 106 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] blank line

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 124 at r1 (raw file):

Previously, RaduBerinde wrote…

withPrevValue deserves some documentation here (point to RangeFeedRequest.WithDiff)

Added a bunch of commentary


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 157 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] blank line

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 180 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] re-run the watcher?

Done.


pkg/server/settingswatcher/row_decoder.go, line 46 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] // SafeFormat implements redact.SafeFormatter.

Done.


pkg/server/settingswatcher/settings_watcher.go, line 127 at r4 (raw file):

Previously, RaduBerinde wrote…

Isn't there a significant functional change here in that the updates are now received with a multi-second delay? At the very least I can image this would make many tests much slower (if each time we set a cluster setting, we have to wait a few seconds)

Nope. Hopefully the new commentary makes that clearer.


pkg/server/settingswatcher/settings_watcher.go, line 130 at r4 (raw file):

Previously, RaduBerinde wrote…

I don't understand what bufferSize=0 does? Wouldn't it cause an error on the first onValue() callback from the rangefeed?

No, because it's leveraging translateEvent and then always returning nil so no events get buffered unless there's storage, at which point, the buffer exists. I added more commentary.


pkg/server/settingswatcher/settings_watcher.go, line 132 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] add comment

Done.


pkg/spanconfig/spanconfigkvsubscriber/kvsubscriber.go, line 142 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] should translateEvent be a method on d (in which case we would just use d.translateEvent)?

heh i wrote that initially and then I felt sad about that method not being tested with the rest of the decoder methods. Done.


pkg/spanconfig/spanconfigkvsubscriber/kvsubscriber.go, line 289 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] could use a short comment

done.


pkg/kv/kvclient/rangefeed/rangefeedcache/cache_impl_test.go, line 83 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] not called concurrently?

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/cache_impl_test.go, line 87 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] this special case could be handled in MergeKVs.

Done. Took some rewording to make it sane to avoid the copy.


pkg/server/settingswatcher/row_decoder_test.go, line 1 at r5 (raw file):

Previously, RaduBerinde wrote…

[supernit] should remove this file from this commit

Fixed the history.

@ajwerner ajwerner marked this pull request as ready for review January 11, 2022 14:28
@ajwerner ajwerner requested a review from a team as a code owner January 11, 2022 14:28
@irfansharif irfansharif self-requested a review January 12, 2022 14:58
@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch 9 times, most recently from e6d5238 to 484b405 Compare January 13, 2022 23:26
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incredibly cool. Read up to the settingswatcher change, have to come back to it and stare some more.


// PartialUpdate indicates that the events represent an incremental update
// since the previous update.
PartialUpdate UpdateType = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Partial/Incremental?

)

// TranslateEventFunc is used by the client to translate a low-level event
// into an event for buffering. if nil is returned, the event is skipped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] "If"

context.Context, *roachpb.RangeFeedValue,
) rangefeedbuffer.Event

// OnUpdateFunc is used by the client to receive a batch of updates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Batch of updates? Rephrase to the singular form or refer to "events" in the plural.

// This is a blocking operation, returning only when the context is canceled,
// or when a retryable error occurs. For the latter, it's expected that callers
// will re-run the watcher.
func (s *Watcher) Run(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exported?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, refers to "global store of span configs".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of like that it's exported, it makes testing other things useful, and it had clear semantics: run synchronously until there's a permanent error.

case CompleteUpdate:
updatedData = updateKVs
case PartialUpdate:
// Note that handleUpdate is synchronously within the underlying watcher,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"handlUpdate is synchronously"

@ajwerner
Copy link
Contributor Author

@RaduBerinde is there more that you want on this other than @irfansharif's nits?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: (sorry for the delay)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @irfansharif, and @RaduBerinde)


pkg/server/settingswatcher/settings_watcher.go, line 118 at r17 (raw file):

	}
	noteUpdate := func(update rangefeedcache.Update) {
		s.mu.Lock()

We can do the if outside the lock no?

@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch from 484b405 to 72e5717 Compare January 29, 2022 07:23
@blathers-crl blathers-crl bot requested a review from irfansharif January 29, 2022 07:23
@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch from 72e5717 to 18c1699 Compare January 31, 2022 02:42
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @RaduBerinde)


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 94 at r9 (raw file):

Previously, irfansharif (irfan sharif) wrote…

s/Partial/Incremental?

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 98 at r9 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] "If"

Done.


pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go, line 103 at r9 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Batch of updates? Rephrase to the singular form or refer to "events" in the plural.

Done.


pkg/server/settingswatcher/settings_watcher.go, line 118 at r17 (raw file):

Previously, RaduBerinde wrote…

We can do the if outside the lock no?

No, the mutex here was implicitly protecting u. I made that fact explicit with a little bit of refactoring. I'm glad I did it. I recall wavering on moving the updater into the struct earlier.


pkg/kv/kvclient/rangefeed/rangefeedcache/cache_impl_test.go, line 83 at r12 (raw file):

Previously, irfansharif (irfan sharif) wrote…

"handlUpdate is synchronously"

Done.

@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch from 18c1699 to dabd587 Compare January 31, 2022 03:33
@ajwerner ajwerner requested a review from a team January 31, 2022 03:33
@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch from dabd587 to aeb377a Compare January 31, 2022 04:52
@ajwerner ajwerner requested a review from a team January 31, 2022 04:52
@RaduBerinde
Copy link
Member


pkg/server/settingswatcher/settings_watcher.go, line 118 at r17 (raw file):

Previously, ajwerner wrote…

No, the mutex here was implicitly protecting u. I made that fact explicit with a little bit of refactoring. I'm glad I did it. I recall wavering on moving the updater into the struct earlier.

But it's not protecting update.Type.. I meant we can lock inside the if.

This logic is generally handy when one wants to build caching atop rangefeeds
and wants to make claims about the data corresponding to a consistent snapshot.

Release note: None
… kvs

Often we want to extract KVs from buffered events and we want to merge them
to form a snapshot.

Release note: None
Sometimes we want to know what was the timestamp in KV.

Release note: None
This will be handy when dealing with the system config and generally as a test
for this package.

Release note: None
This commit doesn't do anything in terms of functional changes, but it lays
the groundwork for a subsequent change to buffer and write out a settings
snapshot.

Release note: None
This commit removes the code which connected the settings to their backing
table via the gossipped system config. Instead it unconditionally enables the
rangefeed-backed settingswatcher which was developed to support tenants.

Note that it is rather tested code that has been used in multi-tenant sql
pods for about a year now and all the existing tests still pass.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch from aeb377a to cc156d8 Compare February 2, 2022 00:03
This is an improvement over what the library provided before. It previously
assumed that the rangefeed would never encounter a permanent error after the
initial scan. There are cases where such errors can occur. This commit defers
to the rangefeedcache.Start loop to handle them.

Release note: None
This also adopts the rangefeed backed cache for the optimizer and for
the reporter.

Fixes cockroachdb#70558
Fixes cockroachdb#74665

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/system-config-shim branch from cc156d8 to 9d9e975 Compare February 2, 2022 02:08
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif)


pkg/server/settingswatcher/settings_watcher.go, line 118 at r17 (raw file):

Previously, RaduBerinde wrote…

But it's not protecting update.Type.. I meant we can lock inside the if.

Whoops, done

@ajwerner
Copy link
Contributor Author

ajwerner commented Feb 2, 2022

bors r+

@craig craig bot merged commit 63988a4 into cockroachdb:master Feb 2, 2022
@craig
Copy link
Contributor

craig bot commented Feb 2, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants