setting ‹"cluster.preserve_downgrade_option"› to ‹"20.2"› failed: cannot set cluster.preserve_downgrade_option to ‹20.2› (cluster version is 21.1) #68335

nick-jones · 2021-08-02T17:15:54Z (edited by knz)

Describe the problem

After upgrading our clusters from 20.2 to 21.1, all nodes now appear to try to set cluster.preserve_downgrade_option when starting, and they log the following warning:

+ exec /cockroach/cockroach start --logtostderr=WARNING --certs-dir /cockroach/cockroach-certs --advertise-addr cockroachdb-0.cockroachdb.***.svc.cluster.local --http-addr 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%
Flag --logtostderr has been deprecated, use --log instead to specify 'sinks: {stderr: {filter: ...}}'.
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1  setting ‹"cluster.preserve_downgrade_option"› to ‹"20.2"› failed: cannot set cluster.preserve_downgrade_option to ‹20.2› (cluster version is 21.1)
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +(1) attached stack trace
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  -- stack trace:
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/clusterversion.registerPreserveDowngradeVersionSetting.func1
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/clusterversion/setting.go:259
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/settings.(*StringSetting).Validate
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/settings/string.go:65
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/settings.(*StringSetting).set
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/settings/string.go:79
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/settings.updater.Set
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/settings/updater.go:92
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/server.processSystemConfigKVs.func1
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/server/settingsworker.go:47
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/server.processSystemConfigKVs
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/server/settingsworker.go:53
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/server.(*Server).refreshSettings
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/server/settingsworker.go:69
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/server.(*Server).PreStart
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1494
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/cli.runStart.func4.2
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:587
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | github.com/cockroachdb/cockroach/pkg/cli.runStart.func4
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:710
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  | runtime.goexit
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +  |    /usr/local/go/src/runtime/asm_amd64.s:1374
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +Wraps: (2) cannot set cluster.preserve_downgrade_option to ‹20.2› (cluster version is 21.1)
W210801 11:35:17.090732 66 server/settingsworker.go:48 ⋮ [n?] 1 +Error types: (1) *withstack.withStack (2) *errutil.leafError
W210801 11:35:17.095099 66 2@gossip/gossip.go:1491 ⋮ [n?] 2  no incoming or outgoing connections
CockroachDB node starting at 2021-08-01 11:35:22.216472524 +0000 UTC (took 5.4s)
build:               CCL v21.1.6 @ 2021/07/20 15:30:39 (go1.15.11)
webui:               https://0.0.0.0:8080
sql:                 postgresql://root@cockroachdb-0.cockroachdb.***.svc.cluster.local:26257?sslmode=verify-full&sslrootcert=%2Fcockroach%2Fcockroach-certs%2Fca.crt
RPC client flags:    /cockroach/cockroach <client cmd> --host=cockroachdb-0.cockroachdb.***.svc.cluster.local:26257 --certs-dir=/cockroach/cockroach-certs
logs:                /cockroach/cockroach-data/logs
temp dir:            /cockroach/cockroach-data/cockroach-temp515401059
external I/O path:   /cockroach/cockroach-data/extern
store[0]:            path=/cockroach/cockroach-data
storage engine:      pebble
status:              restarted pre-existing node
clusterID:           88707280-6f4a-4158-8dda-0ffcf42d40ba
nodeID:              1

To Reproduce

We upgraded each of our CockroachDB clusters using the following steps:

1. Before the upgrade, executed SET CLUSTER SETTING cluster.preserve_downgrade_option = '20.2';
2. Upgraded all nodes in the cluster to v21.1
3. Waited 24h+
4. Executed RESET CLUSTER SETTING cluster.preserve_downgrade_option;

Expected behavior

Nodes should start without trying to apply a stale cluster.preserve_downgrade_option value. Instead, each node seems to want to adjust preserve_downgrade_option for no good reason.

Additional data / screenshots

Environment:

CockroachDB version: v21.1.6
Server OS: Linux

Additional context

Jira issue: CRDB-8992
Epic: CRDB-6671


blathers-crl · 2021-08-02T17:15:56Z

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

@cockroachdb/storage (found keywords: pebble)

If we have not gotten back to your issue within a few business days, you can try the following:

Join our community slack channel and ask on #cockroachdb.
Try to find someone from here who you know worked closely on the area, and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

darinpp · 2021-08-02T18:47:14Z

@tbg Can you take a look at this? Seems like it tries to apply cached settings when they aren't applicable anymore.

Nican · 2021-08-14T20:21:52Z

@tbg I am having a similar issue and am wondering what this is about.
I previously upgraded from 20.2 to 21.1, and recently upgraded from 21.1.6 to 21.1.7.

W210814 20:03:19.815560 28 server/settingsworker.go:48 ⋮ [n?] 28  setting ‹"cluster.preserve_downgrade_option"› to ‹"20.2"› failed: cannot set cluster.preserve_downgrade_option to ‹20.2› (cluster version is 21.1)

What does it mean?

ajwerner · 2021-09-22T12:50:08Z

This warning is not a big deal. The stack trace makes it look scary. It means that these nodes think that the preserve downgrade option should be set because that's what they had on disk but they've learned that the version is already newer than that. I believe this should only happen once. We could special case away this warning, but I don't think it's a major thing.

This would happen if you upgrade manually but don't clear the preserve downgrade option setting. You can clear it with SET CLUSTER SETTING cluster.preserve_downgrade_option = DEFAULT.
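To make the mechanism above concrete, here is a minimal Go sketch of the kind of check that produces the warning. It is a simplified model, not CockroachDB's code: the names version, parseVersion and validatePreserveDowngrade are hypothetical, and the sketch assumes the validator (the real one sits in pkg/clusterversion/setting.go, per the stack trace) accepts only the empty default or a value equal to the active cluster version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical, simplified MAJOR.MINOR release version such as
// 20.2 or 21.1. It is not CockroachDB's roachpb.Version.
type version struct {
	Major, Minor int
}

func (v version) String() string { return fmt.Sprintf("%d.%d", v.Major, v.Minor) }

// parseVersion parses strings of the form "MAJOR.MINOR".
func parseVersion(s string) (version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 2 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return version{}, fmt.Errorf("invalid version %q: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return version{}, fmt.Errorf("invalid version %q: %w", s, err)
	}
	return version{major, minor}, nil
}

// validatePreserveDowngrade models the assumed rule: the empty (default) value
// is always accepted; any other value must equal the active cluster version.
func validatePreserveDowngrade(value string, active version) error {
	if value == "" {
		return nil
	}
	v, err := parseVersion(value)
	if err != nil {
		return err
	}
	if v != active {
		return fmt.Errorf("cannot set cluster.preserve_downgrade_option to %s (cluster version is %s)", value, active)
	}
	return nil
}

func main() {
	active := version{21, 1}
	// A 20.2 value cached on disk before the upgrade is rejected on a 21.1 cluster.
	fmt.Println(validatePreserveDowngrade("20.2", active)) // cannot set cluster.preserve_downgrade_option to 20.2 (cluster version is 21.1)
	// The default (empty) value is accepted.
	fmt.Println(validatePreserveDowngrade("", active)) // <nil>
}
```

Under this model the stale 20.2 value can never pass validation once the cluster is at 21.1, so the node simply refuses to apply it, which is why the warning itself is harmless.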

ajwerner · 2021-09-22T12:50:39Z

I think what I'd say is that we should refuse to upgrade if you've got the preserve_downgrade_option set.

nick-jones · 2021-09-22T13:00:40Z (edited)

@ajwerner

This warning is not a big deal. The stack trace makes it look scary. It means that these nodes think that the preserve downgrade option should be set because that's what they had on disk but they've learned that the version is already newer than that. I believe this should only happen once. We could special case away this warning, but I don't think it's a major thing.

This would happen if you upgrade manually but don't clear the preserve downgrade option setting. You can clear it with SET CLUSTER SETTING cluster.preserve_downgrade_option = DEFAULT.

In our case the error occurs every single time a node starts. We cleared preserve_downgrade_option weeks ago, albeit via RESET CLUSTER SETTING cluster.preserve_downgrade_option rather than the statement you've specified.

As an example:

sh-4.4# cockroach sql
#
# Welcome to the CockroachDB SQL shell.
# All statements must be terminated by a semicolon.
# To exit, type: \q.
#
# Server version: CockroachDB CCL v21.1.7 (x86_64-unknown-linux-gnu, built 2021/08/09 17:55:28, go1.15.14) (same version as client)
# Cluster ID: 7ce2364a-782e-4cec-992c-c0c4dc40709f
#
# Enter \? for a brief introduction.
#
root@cockroachdb-proxy:26257/defaultdb> SHOW CLUSTER SETTING cluster.preserve_downgrade_option;
  cluster.preserve_downgrade_option
-------------------------------------

(1 row)

Time: 4ms total (execution 3ms / network 1ms)

root@cockroachdb-proxy:26257/defaultdb> SELECT NOW();
               now
---------------------------------
  2021-09-22 12:57:23.193367+00
(1 row)

Time: 2ms total (execution 0ms / network 1ms)

And then restarting a node in this cluster:

$ kubectl --context=prod-aws --namespace=<snip> delete pod cockroachdb-0
$ kubectl --context=prod-aws --namespace=<snip> logs cockroachdb-0 -c cockroachdb | head -n5 
++ hostname -f
+ exec /cockroach/cockroach start --logtostderr=WARNING --certs-dir /cockroach/cockroach-certs --advertise-addr cockroachdb-0.cockroachdb.<snip>.svc.cluster.local --http-addr 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%
Flag --logtostderr has been deprecated, use --log instead to specify 'sinks: {stderr: {filter: ...}}'.
W210922 12:59:07.247931 12 server/settingsworker.go:48 ⋮ [n?] 1  setting ‹"cluster.preserve_downgrade_option"› to ‹"20.2"› failed: cannot set cluster.preserve_downgrade_option to ‹20.2› (cluster version is 21.1)
W210922 12:59:07.247931 12 server/settingsworker.go:48 ⋮ [n?] 1 +(1) attached stack trace

ajwerner · 2021-09-22T13:03:46Z

Interesting. Thanks for the report. I found the bug and will file a separate issue. We don't clear unset settings from our on-disk cache 🙁.
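The bug described here (later filed as #70567, "storeCachedSettingsKVs does not remove unset settings values") can be modeled with a small Go sketch. The map-based settingsCache and the functions storeCachedBuggy and storeCachedFixed are hypothetical simplifications, not CockroachDB's implementation; the sketch also assumes that after RESET CLUSTER SETTING the setting no longer appears in the snapshot a node writes to its cache.

```go
package main

import "fmt"

// settingsCache is a hypothetical stand-in for a node's on-disk cache of
// cluster-setting values, keyed by setting name. It is not CockroachDB's format.
type settingsCache map[string]string

// storeCachedBuggy models the reported behavior: every setting in the current
// snapshot is written, but cache entries for settings absent from the snapshot
// (for example, because they were reset) are never deleted.
func storeCachedBuggy(cache settingsCache, snapshot map[string]string) {
	for name, value := range snapshot {
		cache[name] = value
	}
}

// storeCachedFixed makes the cache mirror the snapshot exactly: entries missing
// from the snapshot are deleted before the current values are written.
func storeCachedFixed(cache settingsCache, snapshot map[string]string) {
	for name := range cache {
		if _, ok := snapshot[name]; !ok {
			delete(cache, name) // deleting during range is safe in Go
		}
	}
	for name, value := range snapshot {
		cache[name] = value
	}
}

func main() {
	const opt = "cluster.preserve_downgrade_option"

	// Snapshot while the option is set ahead of the upgrade.
	beforeReset := map[string]string{opt: "20.2"}
	// Snapshot after RESET CLUSTER SETTING: the option is assumed to be absent.
	afterReset := map[string]string{}

	buggy := settingsCache{}
	storeCachedBuggy(buggy, beforeReset)
	storeCachedBuggy(buggy, afterReset)

	fixed := settingsCache{}
	storeCachedFixed(fixed, beforeReset)
	storeCachedFixed(fixed, afterReset)

	// A restarting node would replay these cached values during startup.
	fmt.Println("buggy cache:", buggy) // buggy cache: map[cluster.preserve_downgrade_option:20.2]
	fmt.Println("fixed cache:", fixed) // fixed cache: map[]
}
```

In this model the buggy cache keeps replaying 20.2 on every restart, which matches the warning appearing each time a node starts rather than only once; making the cache mirror the snapshot removes the stale entry.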

ajwerner · 2021-09-22T13:32:08Z

Filed #70567.

knz · 2023-10-03T22:21:31Z

This was fixed by #111475.

nick-jones added the C-bug label (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) Aug 2, 2021

blathers-crl bot added the O-community (Originated from the community) and X-blathers-triaged (blathers was able to find an owner) labels Aug 2, 2021

ajwerner mentioned this issue Sep 22, 2021

server: storeCachedSettingsKVs does not remove unset settings values #70567

Closed

knz added the A-configurability (Pertains to cluster settings, CLI flags, env vars etc.) and T-multitenant (Issues owned by the multi-tenant virtual team) labels Aug 11, 2023

This was referenced Oct 2, 2023

roachtest: change-replicas/mixed-version failed #111539

Closed

settings: (regression) stale values coming from local cache #111610

Closed

knz closed this as completed Oct 3, 2023
