-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loqrecovery: use captured meta range content for LOQ plans #94239
loqrecovery: use captured meta range content for LOQ plans #94239
Conversation
973507f
to
70e07e9
Compare
809abfb
to
5305b54
Compare
c727406
to
36bace0
Compare
36bace0
to
8b8eb85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor stuff.
storeNames = append(storeNames, fmt.Sprintf("s%d", id)) | ||
} | ||
return strings.Join(storeNames, ", ") | ||
} | ||
|
||
func (s storeIDSet) intersect(other storeIDSet) storeIDSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We allow use of Go generics now (#93735). This seems like a good opportunity to try it out with some generic map intersect/diff functions (or just use a library which provides it). Doesn't have to happen in this PR though, or at all.
@@ -1415,7 +1415,9 @@ func init() { | |||
f.StringVarP(&debugRecoverPlanOpts.outputFileName, "plan", "o", "", | |||
"filename to write plan to") | |||
f.IntSliceVar(&debugRecoverPlanOpts.deadStoreIDs, "dead-store-ids", nil, | |||
"list of dead store IDs") | |||
"list of dead store IDs (can't be used together with dead-node-ids)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just get rid of dead-store-ids
, to avoid the complexity of supporting both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should get rid of dead store ids in 23.2. Just to avoid any confusion on mixed cluster where old option which was the primary means disappears after upgrade. It should be less of an issue with 23.1 to 23.2 where we don't expect people to use dead-store-ids anymore.
8b8eb85
to
0618f19
Compare
9b527a8
to
f21dbac
Compare
Previously loss of quorum recovery planner was using local replica info collected from all nodes to find source of truth for replicas that lost quorum. With online approach local info snapshots don't have atomicity. This could cause planner to fail if available replicas are caught in different states on different nodes. This commit adds alternative planning approach when range metadata is available. Instead of fixing individual replicas that can't make progress it finds ranges that can't make progress from metadata using descriptors and updates their replicas to recover from loss of quorum. This commit also adds replica collection stage as a part of make-plan command itself. To invoke collection from a cluster instead of using files one needs to provide --host and other standard cluster connection related flags (--cert-dir, --insecure etc.) as appropriate. Release note: None
f21dbac
to
b5d5fc3
Compare
bors r=erikgrinaker |
Build failed (retrying...): |
Build succeeded: |
Note: only last commit belongs to this PR. Will update description once #93157 lands.
Previously loss of quorum recovery planner was using local replica info collected from all nodes to find source of truth for replicas that lost quorum.
With online approach local info snapshots don't have atomicity. This could cause planner to fail if available replicas are caught in different states on different nodes.
This commit adds alternative planning approach when range metadata is available. Instead of fixing individual replicas that can't make progress it finds ranges that can't make progress from metadata using descriptors and updates their replicas to recover from loss of quorum.
This commit also adds replica collection stage as a part of make-plan command itself. To invoke collection from a cluster instead of using files one needs to provide --host and other standard cluster connection related flags (--cert-dir, --insecure etc.) as appropriate.
Example command output for a local cluster with 3 out of 5 nodes surrvivng looks like:
Release note: None
Fixes: #93038
Fixes: #93046