storage/engine: improve cluster version handling #42653
Comments
We have marked this issue as stale because it has been inactive for |
We ran into this issue when making backwards incompatible encryption-at-rest changes and began moving the cluster version out of Pebble. A new file |
Doing some backlog triage. My understanding of this issue is that we have implemented the "out of DB" version handling, and all that remains is some cleanup:
@RaduBerinde - does any of your recent work in #96394 help here?
Yes, I am working on the cleanup mentioned above.
The cluster version is currently stored inside of RocksDB/Pebble by Store.WriteClusterVersion. At startup, a node will refuse to start if the cluster version is newer than what the node supports. This prevents accidental downgrades. Unfortunately, there is a hole in this strategy. Because the cluster version is stored in RocksDB, we have to open the RocksDB instance in order to read it. Opening the RocksDB instance causes the RocksDB WAL to be replayed, and replaying the WAL can itself call back into CRDB code which is cluster version specific. It looks like this exact scenario caused problems for a user who upgraded to 19.1 and then restarted a node with 2.0.
There are two possibilities for improvement here. We could open the RocksDB instance read-only in order to check the cluster version, then open it again as writable. This should work, but is mildly unfortunate as we'd end up replaying the WAL twice.
Another option is to move the cluster version storage out of RocksDB. We already have a COCKROACHDB_VERSION file. We could extend this file, or add another file, to store the cluster version.
Jira issue: CRDB-5336