
server,settingswatcher: fix the local persisted cache #111475

Merged
merged 3 commits into cockroachdb:master from 20230928-settings-regression
Sep 29, 2023

Conversation

knz
Contributor

@knz knz commented Sep 29, 2023

There are two commits here, fixing two separate issues.
Epic: CRDB-6671

server,settingswatcher: properly evict entries from the local persisted cache

Fixes #70567.
Supersedes #101472.

(For context, on each node there is a local persisted cache of cluster
setting customizations. This exists to ensure that configured values
can be used even before a node has fully started up and can start
reading customizations from system.settings.)

Prior to this patch, entries were never evicted from the local
persisted cache: when a cluster setting was reset, any previously
saved entry in the cache would remain there.

This is a very old bug, long hidden and only recently
revealed when commit 2f5d717 was
merged. In a nutshell, before that commit the code responsible
for loading entries from the cache didn't fully work, so the stale
entries were never restored from the cache. That commit fixed the
loader code, the stale entries became active, and the old bug
became visible.

To fix the old bug, the present commit modifies the settings watcher
to preserve KV deletion events and propagate them to the persisted
cache.

(There is no release note because there is no user-facing release where
the bug was visible.)

settingswatcher: write-through to the persisted cache

Fixes #111422.
Fixes #111328.

Prior to this patch, the rangefeed watcher over system.settings was
updating the in-RAM value store before it propagated the updates to
the persisted local cache.

In fact, the update to the persisted local cache was lagging quite a
bit behind, because the rangefeed watcher would buffer updates and
only flush them after a while.

As a result, the following sequence was possible:

  1. client updates a cluster setting.
  2. server is immediately shut down. The persisted cache
    has not been updated yet.
  3. server is restarted. For a short while (until the settings watcher
    has caught up), the old version of the setting remains active.

This resurrection of ghost setting values was simply a bug. This
patch fixes it by ensuring that the persisted cache is written
through before the in-RAM value store.

By doing this, we give up on batching updates to the persisted local
store. This is deemed acceptable because cluster settings are not
updated frequently.
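The ordering argument above can be sketched in a few lines. The types below are hypothetical stand-ins for the on-disk settings cache and the in-RAM value store, not the actual CockroachDB API:

```go
package main

import "fmt"

// persistedCache and ramStore are illustrative stand-ins for the
// on-disk settings cache and the in-RAM value store.
type persistedCache struct{ m map[string]string }
type ramStore struct{ m map[string]string }

func (p *persistedCache) write(k, v string) { p.m[k] = v }
func (r *ramStore) set(k, v string)         { r.m[k] = v }

// applyUpdate writes through: the persisted cache is updated before
// the in-RAM store. A crash between the two steps can then only lose
// the in-RAM update (which is rebuilt from the cache at restart); it
// can never leave a stale value on disk with a newer value in RAM.
func applyUpdate(p *persistedCache, r *ramStore, k, v string) {
	p.write(k, v) // 1. durable copy first
	r.set(k, v)   // 2. then the fast in-RAM copy
}

func main() {
	p := &persistedCache{m: map[string]string{}}
	r := &ramStore{m: map[string]string{}}
	applyUpdate(p, r, "kv.closed_timestamp.target_duration", "10s")
	fmt.Println(p.m["kv.closed_timestamp.target_duration"]) // 10s
}
```

The cost of this ordering is exactly the trade-off stated in the commit message: each update pays a synchronous cache write instead of being batched.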

@knz knz requested review from stevendanna, aliher1911 and a team September 29, 2023 01:34
@knz knz requested review from a team as code owners September 29, 2023 01:34
@cockroach-teamcity
Member

This change is Reviewable

@knz knz force-pushed the 20230928-settings-regression branch from 0d30293 to 2db9035 Compare September 29, 2023 08:18
stevendanna
stevendanna previously approved these changes Sep 29, 2023
Collaborator

@stevendanna stevendanna left a comment


Thanks for digging into this situation. Overall this looks good to me. Left a couple of minor comments for your consideration.

Reviewed 1 of 1 files at r1, 6 of 6 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @knz)


pkg/server/settingswatcher/settings_watcher.go line 296 at r3 (raw file):

	// updates in the onUpdate rangefeed function.
	if s.storage != nil {
		s.snapshot = rangefeedbuffer.MergeKVsKeepingDeletions(s.snapshot, []roachpb.KeyValue{rkv})

I wondered for a bit whether it was possible to make the merging code more efficient given that we only ever update 1 value at a time. But, ultimately decided that, like you mentioned in one of the commit messages, settings are updated infrequently.

Another thing we could consider if this becomes an issue is returning some indicator about whether anything was actually updated to avoid extra KV traffic if it is unnecessary. For instance, if the rangefeed restarts under the hood such that we see a flurry of duplicates.

But again, let's handle that if we see it become an issue. This traffic should be trivial overall.


pkg/kv/kvclient/rangefeed/rangefeedbuffer/kvs_test.go line 107 at r2 (raw file):

// TestMergeKVs tests the logic of MergeKVsKeepingDeletions and the
// logic of to extract KVs from a slice of events.

[nit] missing "EventToKVs" between "of" and "to" in the comment, I think.


pkg/kv/kvclient/rangefeed/rangefeedbuffer/kvs_test.go line 168 at r2 (raw file):

				{"c", 4, ""},
				{"d", 4, ""},
			},

Should we add some out-of-order keys in this update (I see the updates for b are out of timestamp order, but I mean something like reversing d and c)?

Perhaps also a key at a lower timestamp than what we have in the existing snapshot? I don't think that can happen easily in the first commit, but it can definitely happen once we move this to being used in the translateEvent callback.

Both of these cases should work given your current implementation, so the test is just to prevent regressions there.


pkg/settings/integration_tests/settings_test.go line 292 at r3 (raw file):

}

func TestSettingsPersistenceEndToEnd(t *testing.T) {

I wonder if we want a test that adds some testing hook into server startup so that we can test accessing settings in a way that guarantees we are relying on the version of the on-disk cache. I don't feel strongly about this suggestion.

Contributor

@aliher1911 aliher1911 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @stevendanna)


pkg/server/settingswatcher/settings_watcher.go line 296 at r3 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I wondered for a bit whether it was possible to make the merging code more efficient given that we only ever update 1 value at a time. But, ultimately decided that, like you mentioned in one of the commit messages, settings are updated infrequently.

Another thing we could consider if this becomes an issue is returning some indicator about whether anything was actually updated to avoid extra KV traffic if it is unnecessary. For instance, if the rangefeed restarts under the hood such that we see a flurry of duplicates.

But again, let's handle that if we see it become an issue. This traffic should be trivial overall.

We're only writing to a local store, not even under proper MVCC, so the only downside is pebble compaction/iteration over garbage, which I think is negligible here. At least that's how I understand the settings storage here.


pkg/server/settings_cache_test.go line 185 at r3 (raw file):

	testutils.SucceedsSoon(t, func() error {
		store, err := ts.GetStores().(*kvserver.Stores).GetStore(1)
		if err != nil {

We can require.NoError(t, err, "failed to get store") here and below, to avoid retrying in cases where we can only get fatal errors. Also fewer lines to read.

@stevendanna
Collaborator

Backfilling some commentary from slack. An issue we might want to consider is what happens when:

1. Write cluster setting a.
2. See rangefeed update for the write.
3. Update cache.
4. Reset cluster setting a.
5. Crash before the rangefeed sees the update.

@stevendanna stevendanna dismissed their stale review September 29, 2023 10:22

One edge case to think through.

@knz knz force-pushed the 20230928-settings-regression branch from 2db9035 to 85021de Compare September 29, 2023 17:02
knz and others added 2 commits September 29, 2023 19:06
Co-authored-by: Steven Danna <[email protected]>
@knz knz force-pushed the 20230928-settings-regression branch from 85021de to c5af624 Compare September 29, 2023 17:16
Contributor Author

@knz knz left a comment


The special case mentioned above is now handled via DeleteRange.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @stevendanna)


pkg/server/settingswatcher/settings_watcher.go line 296 at r3 (raw file):

Previously, aliher1911 (Oleg) wrote…

We must be only writing to a local store, not even under proper mvcc so the only downside must be pebble compaction/iteration over garbage which I think is negligible here. At least that's how I understand this setting storage here.

I went for the DeleteRange route. It's so much easier to read and understand.
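A minimal sketch of the DeleteRange idea, assuming the cached range can be cleared wholesale before the snapshot is rewritten (the `cache` type and method names are illustrative, not the actual code):

```go
package main

import "fmt"

// cache is an illustrative stand-in for the persisted settings cache.
type cache struct{ m map[string]string }

// deleteRange clears every cached entry, mimicking a KV DeleteRange
// over the cache's key span.
func (c *cache) deleteRange() { c.m = map[string]string{} }

// writeSnapshot rewrites the cache from the current snapshot. Because
// the range was cleared first, entries absent from the snapshot
// (i.e. reset settings) cannot survive as stale values.
func (c *cache) writeSnapshot(snap map[string]string) {
	c.deleteRange()
	for k, v := range snap {
		c.m[k] = v
	}
}

func main() {
	c := &cache{m: map[string]string{"a": "1", "b": "2"}}
	// "b" was reset, so the new snapshot no longer contains it.
	c.writeSnapshot(map[string]string{"a": "1"})
	_, stale := c.m["b"]
	fmt.Println(stale) // false: no stale entry remains
}
```

This also sidesteps the edge case from the slack discussion: a reset that the rangefeed never delivered cannot linger, because the rewrite starts from an empty range.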


pkg/server/settings_cache_test.go line 185 at r3 (raw file):

Previously, aliher1911 (Oleg) wrote…

We can require.NoError(t, err, "failed to get store") here and below to avoid retry in cases where we can only get fatal errors. Also less lines to read.

Done.


pkg/settings/integration_tests/settings_test.go line 292 at r3 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I wonder if we want a test that adds some testing hook into server startup so that we can test accessing settings in a way that guarantees we are relying on the version of the on-disk cache. I don't feel strongly about this suggestion.

Done.

@knz
Contributor Author

knz commented Sep 29, 2023

PTAL

@knz
Contributor Author

knz commented Sep 29, 2023

TFYRs!

bors r=stevendanna,aliher1911

@knz
Contributor Author

knz commented Sep 29, 2023

bors r-

the kv test suite is not happy yet

@craig
Contributor

craig bot commented Sep 29, 2023

Canceled.

@knz
Contributor Author

knz commented Sep 29, 2023

unrelated flake: #111426

@knz
Contributor Author

knz commented Sep 29, 2023

bors r=stevendanna,aliher1911

@craig
Contributor

craig bot commented Sep 29, 2023

Build succeeded:

@craig craig bot merged commit beb87b4 into cockroachdb:master Sep 29, 2023
@knz knz deleted the 20230928-settings-regression branch September 29, 2023 19:17
Member

@RaduBerinde RaduBerinde left a comment


LGTM with one question

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/server/settingswatcher/settings_watcher.go line 297 at r5 (raw file):

	if s.storage != nil {
		s.snapshot = rangefeedbuffer.MergeKVs(s.snapshot, []roachpb.KeyValue{rkv})
		s.storage.SnapshotKVs(ctx, s.snapshot)

SnapshotKVs runs the update asynchronously... If we're not waiting for it, we've reduced the window but it's still there. Maybe that's ok, but then the comment should be honest about it; currently it suggests a clear sequencing.

craig bot pushed a commit that referenced this pull request Sep 29, 2023
110758: server,settings: properly cascade defaults for TenantReadOnly r=stevendanna,yuzefovich a=knz

Previous in sequence:
- [x] #110789 and #110947 
- [x] #110676
- [x] #111008
- [x] #111145
- [x] #111147
- [x]  #111149
- [x] #111150
- [x] #111153
- [x] #111475
- [x] #111210
- [x] #111212

Fixes #108677.
Fixes #85729.
Fixes #91825.
Completes the work described in the settings RFC.
Epic: CRDB-6671

TLDR: this patch ensures that virtual cluster servers observe changes
made to TenantReadOnly settings via SET CLUSTER SETTING in the system
interface, even when there is no override set via ALTER VIRTUAL
CLUSTER SET CLUSTER SETTING.

For example, after `SET CLUSTER SETTING
kv.closed_timestamp.target_duration = '10s'` in the system interface,
this value will show up via `SHOW CLUSTER SETTING` in a virtual
cluster SQL session.

This changes the way that settings are picked up in virtual cluster,
as follows:

1. if there is an override specifically for this tenant's ID
   (in `tenant_settings`), use that.
2. otherwise, if there is an override for the pseudo-ID 0
   (in `tenant_settings` still, set via `ALTER TENANT ALL SET CLUSTER
   SETTING`), then use that.
3. **NEW**  otherwise, if the class is TenantReadOnly and there
   is a custom value in `system.settings`, set via a regular
   `SET CLUSTER SETTING` in the system tenant, then use that.
4. otherwise, use the global default set via the setting's
   `Register()` call.
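The four-step lookup above can be sketched as a single resolution function. All names are illustrative and the values are plain strings; the real implementation resolves typed settings through the overrides watcher:

```go
package main

import "fmt"

// resolve sketches the four-step lookup for a TenantReadOnly setting
// as seen from a virtual cluster. An empty string means "not set".
func resolve(tenantOverride, allTenantsOverride, systemValue, registeredDefault string) string {
	if tenantOverride != "" { // 1. per-tenant override in tenant_settings
		return tenantOverride
	}
	if allTenantsOverride != "" { // 2. ALTER TENANT ALL override (pseudo-ID 0)
		return allTenantsOverride
	}
	if systemValue != "" { // 3. NEW: system.settings value from SET CLUSTER SETTING
		return systemValue
	}
	return registeredDefault // 4. registered global default
}

func main() {
	// Only the system tenant customized the setting: step 3 wins.
	fmt.Println(resolve("", "", "10s", "3s")) // 10s
}
```

Step 3 is the new behavior: before this patch, resolution fell straight from step 2 to step 4, skipping the system tenant's customized value.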

----

Prior to this patch, TenantReadOnly settings as observed from virtual
clusters were defined as the following priority order:

1. if there is an override specifically for this tenant's ID
   (in `tenant_settings`), use that.
2. otherwise, if there is an override for the pseudo-ID 0
   (in `tenant_settings` still, set via `ALTER TENANT ALL SET CLUSTER
   SETTING`), then use that.
3. otherwise, use the global default set via the setting's
   `Register()` call.

Remarkably, this did not pick up any changes made via a plain `SET
CLUSTER SETTING` statement via the system interface, which only
modifies this setting's value in `system.settings` (thus not
`tenant_settings`).

This situation was problematic in two ways.

To start, settings like `kv.closed_timestamp.target_duration` cannot
be set solely in `system.tenant_settings`; they are also used in the
storage layer and so must also be picked up from changes in
`system.settings`.

For these settings, it is common for operators to just issue the plain
`SET CLUSTER SETTING` statement (to update `system.settings`) and
simply forget to _also_ run `ALTER TENANT ALL SET CLUSTER SETTING`.

This mistake is nearly unavoidable and would result in incoherent
behavior, where the storage layer would use the customized value
and virtual clusters would use the registered global default.

The second problem is in mixed-version configurations, where
the storage layer runs version N+1 and the SQL service runs version N
of the executable. If the registered global default changes from
version N to N+1, the SQL service would not properly pick up
the new default defined in version N+1 of the storage layer.

This patch fixes both problems as follows:

- it integrates changes to TenantReadOnly settings observed
  in `system.settings`, to the watcher that tracks changes
  to `system.tenant_settings`. When a TenantReadOnly setting
  is present in the former but not the latter, a synthetic
  override is added.

- it also initializes synthetic overrides for all the
  TenantReadOnly settings upon server initialization,
  from the registered global default, so that virtual
  cluster servers always pick up the storage layer's
  default as override.


111383: *: simplify tests r=yuzefovich a=knz

All commits except the last are from #110758.
Epic: CRDB-6671

Now that "tenant-ro" settings take their default from the system tenant's value, we do not need `ALTER TENANT ALL` for them any more.

This patch simplifies test code accordingly.

Recommended by `@yuzefovich` in the review for #110758.

111440: cluster-ui: fix db page stories r=THardy98 a=THardy98

Epic: none

This change fixes the stories for the database pages.

Release note: None

111512: kv: correctly handle shared lock replays in KVNemesis r=nvanbenschoten a=arulajmani

Previously, we'd return an AssertionFailed error if a SKIP LOCKED request discovered another request from its own transaction waiting in a lock's wait queue. In SQL's use of KV, this can only happen if the SKIP LOCKED request is being replayed -- so returning an error here is fine. However, this tripped KVNemesis up.

This patch marks such errors, for the benefit of KVNemesis, and doesn't call them assertion failed errors.

Fixes #111426
Fixes #111506
Fixes #111513

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
@knz
Copy link
Contributor Author

knz commented Sep 30, 2023

SnapshotKV runs the update asynchronously... If we're not waiting for it, we reduced the window but it's still there.

😱 I did not see this.
