-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a Snapshot Procedure #4115
Conversation
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
hi duo, I've made some progress recently.
If you have time, would you mind taking a look on this at your convenience and see if there is any problems ? Thanks. @Apache9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
Only need to confirm that the fencing method is correct.
Thanks for the hard work @frostruan !
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotProcedure.java
Show resolved
Hide resolved
...e-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotRegionProcedure.java
Outdated
Show resolved
Hide resolved
...e-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotRegionProcedure.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@RestrictedApi(explanation = "Should only be called in tests", link = "", | ||
allowedOnPath = ".*(/src/test/.*|TestSnapshotProcedure).java") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'src/test/.* ' contains TestSnapshotProcedure?
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
Show resolved
Hide resolved
.filter(p -> !p.isFinished()) | ||
.filter(p -> p.getServerName() != null) | ||
.forEach(p -> { | ||
verifyWorkerAssigner.addUsedWorker(p.getServerName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the region server is already dead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's harmless that we have dead servers in the WorkAssigner based on this consideration:
- If the remote server is dead, there will be a FailedRemoteDispatchException thrown when we dispatch remote procedures. The procedure-v2 framework is designed to be able to handle this. The upper layer procedure can notice that and reschedule a new procedure, or the procedure can retry itself.
- When acquire a new worker from WorkAssigner, we will always choose from the online server list. The dead server will not be able to become a candidate again.
- If there are too many dead servers in the WorkerAssigner, we can restart the master to clear it. This is a very lightweight operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this is the behavior of the SplitWALManager, so I guess it is OK.
|
||
private void prepareSnapshot(MasterProcedureEnv env) | ||
throws ProcedureSuspendedException, IOException { | ||
if (isAnySplitOrMergeProcedureRunning(env)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when arriving here, we can not schedule new split or merge procedures for this table any more as we have already record this procedure in the SnapshotManager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's correct. In fact, in the current implementation, if a zk-coordinated snapshot is in progress, the split/merge procedure on the same table will be marked as FAILED.
Really thanks for your great suggestions and help, Duo. I will address it ASAP. @Apache9 |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Just need to fix some minor nits.
I think we can merge it to master and branch-2 first, and then polish other things.
Thanks @frostruan !
.filter(p -> !p.isFinished()) | ||
.filter(p -> p.getServerName() != null) | ||
.forEach(p -> { | ||
verifyWorkerAssigner.addUsedWorker(p.getServerName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this is the behavior of the SplitWALManager, so I guess it is OK.
Really appreciate your help, Duo @Apache9 |
Signed-off-by: Duo Zhang <[email protected]>
No description provided.