-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: decommission/mixed-versions failed #54908
Comments
Same as #54908. |
This is strange. We see the logic to pick a predecessor version working as expected:
So at no point during this test should the node have been running a |
I can reproduce this at will. |
The reproductions aren't as easy as I thought. But I was able to hit this again and take a look at the node in question. The v20.1 binary certainly looks like a real v20.1 binary.
But I do certainly see a
When running the debug command with the following diff: git diff
diff --git a/pkg/kv/kvserver/debug_print.go b/pkg/kv/kvserver/debug_print.go
index c179390..f5b1dc5 100644
--- a/pkg/kv/kvserver/debug_print.go
+++ b/pkg/kv/kvserver/debug_print.go
@@ -25,6 +25,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/protoutil"
"github.com/cockroachdb/errors"
"go.etcd.io/etcd/raft/raftpb"
+ "github.com/cockroachdb/cockroach/pkg/clusterversion"
)
// PrintKeyValue attempts to pretty-print the specified MVCCKeyValue to
@@ -226,7 +227,7 @@ func tryRaftLogEntry(kv storage.MVCCKeyValue) (string, error) {
}
func tryTxn(kv storage.MVCCKeyValue) (string, error) {
- var txn roachpb.Transaction
+ var txn clusterversion.ClusterVersion
if err := maybeUnmarshalInline(kv.Value, &txn); err != nil {
return "", err
} It is strange to me that there are only two keys in the @irfansharif does any of this ring any bells for you? I think you're most familiar with these decommission tests. |
No I don't think I've seen this failure before. But happy to take this off your plate, I'm around this area (cluster version persistence) anyhow. |
I've been looking at this over in #54906, @irfansharif |
Writes to a `storage.Engine` are not sync'ed by default, meaning that they can get lost due to an ill-timed crash. Fixes cockroachdb#54906. (The backport will take care of cockroachdb#54908). Release note (bug fix): a rare scenario in which a node would refuse to start after updating the binary was fixed. The log message would indicate: "store [...], last used with cockroach version [...], is too old for running version [...] (which requires data from [...] or later)".
Writes to a RocksDB `storage.Engine` were not sync'ed by default, meaning that they can get lost due to an ill-timed crash. They are now, matching pebble's behavior. This affects only WriteClusterVersion, updateBootstrapInfoLocked, WriteLastUpTimestamp, and Compactor.Suggest, nonw of which are performance sensitive. Fixes cockroachdb#54906. (The backport will take care of cockroachdb#54908). Release note (bug fix): a rare scenario in which a node would refuse to start after updating the binary was fixed. The log message would indicate: "store [...], last used with cockroach version [...], is too old for running version [...] (which requires data from [...] or later)".
55240: kvserver: sync cluster version setting to store r=petermattis a=tbg Writes to a `storage.Engine` are not sync'ed by default, meaning that they can get lost due to an ill-timed crash. Fixes #54906. (The backport will take care of #54908). Release note (bug fix): a rare scenario in which a node would refuse to start after updating the binary was fixed. The log message would indicate: "store [...], last used with cockroach version [...], is too old for running version [...] (which requires data from [...] or later)". Co-authored-by: Tobias Grieger <[email protected]>
@tbg we can close this issue now, right? |
No, needs backport, this is release-20.2 (though I am confused - @lucy-zhang , why are you removing the branch labels? We use them to track the branch the failure is on. This particular one has been fixed on master but not release-20.2). |
Writes to a RocksDB `storage.Engine` were not sync'ed by default, meaning that they can get lost due to an ill-timed crash. They are now, matching pebble's behavior. This affects only WriteClusterVersion, updateBootstrapInfoLocked, WriteLastUpTimestamp, and Compactor.Suggest, nonw of which are performance sensitive. Fixes cockroachdb#54906. (The backport will take care of cockroachdb#54908). Release note (bug fix): a rare scenario in which a node would refuse to start after updating the binary was fixed. The log message would indicate: "store [...], last used with cockroach version [...], is too old for running version [...] (which requires data from [...] or later)".
Writes to a RocksDB `storage.Engine` were not sync'ed by default, meaning that they can get lost due to an ill-timed crash. They are now, matching pebble's behavior. This affects only WriteClusterVersion, updateBootstrapInfoLocked, WriteLastUpTimestamp, and Compactor.Suggest, nonw of which are performance sensitive. Fixes cockroachdb#54906. (The backport will take care of cockroachdb#54908). Release note (bug fix): a rare scenario in which a node would refuse to start after updating the binary was fixed. The log message would indicate: "store [...], last used with cockroach version [...], is too old for running version [...] (which requires data from [...] or later)".
Sorry, that was accidental. I'm putting them back on the other roachtest failures. |
(roachtest).decommission/mixed-versions failed on release-20.2@973a6f33ecb8f99b2d510408826f37cd84a99f81:
More
Artifacts: /decommission/mixed-versions See this test on roachdash |
(roachtest).decommission/mixed-versions failed on release-20.2@7f10df809db5076075b6ec63bc744b62109ee459:
More
Artifacts: /decommission/mixed-versions
See this test on roachdash |
Closing now that #55745 is fixed. |
(roachtest).decommission/mixed-versions failed on release-20.2@bdb8cd0e7b2f25a08569a56464838486b6d16421:
More
Artifacts: /decommission/mixed-versions
Related:
See this test on roachdash
powered by pkg/cmd/internal/issues
The text was updated successfully, but these errors were encountered: