-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28420 Update the procedure's field to store for ServerRemoteProcedure #5816
Conversation
💔 -1 overall
This message was automatically generated. |
@@ -137,6 +140,10 @@ synchronized void remoteOperationDone(MasterProcedureEnv env, Throwable error) { | |||
getProcId()); | |||
return; | |||
} | |||
//below persistence is added so that if report goes to last active master, it throws exception | |||
state = MasterProcedureProtos.ServerRemoteProcedureState.SERVER_REMOTE_PROCEDURE_REPORT_SUCCEED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an error, we should not marked it as succeed? And we also need to persist the error field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the status. For persisting the error field I am thinking about it. I did see an example of persisting the exception. If need to persist I think I can use those.
//below persistence is added so that if report goes to last active master, it throws exception | ||
state = MasterProcedureProtos.ServerRemoteProcedureState.SERVER_REMOTE_PROCEDURE_REPORT_SUCCEED; | ||
env.getMasterServices().getMasterProcedureExecutor().getStore().update(this); | ||
|
||
complete(env, error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we should not call the complete directly again. We should wake up the procedure and process the result in the execute method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have to call the complete
method in execute
methods, I have to store the Error as complete
methods are using that to take the decision.
Another solution can be I play with the state. According to the error in remoteOperationDone
method, I can change the state and then take all the decisions in complete
method based on the state. But I don't think this would be possible without introducing some state specific to some parent procedure like SnapshotVerifyProcedure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Persisting the error is a must...
Think of this scenario, after you persist the state, master crashes before you call complete, and after restart, you enter the execute method, if you do not have the error object then how do you plan to call the comlete method...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any updates here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I want to understand what is your thought for not calling complete directly again - is it because of design? We want state transition in execute only?
Think of this scenario, after you persist the state, master crashes before you call complete, and after restart.
If we are fine with calling complete in remoteOperationDone if we call complete before persisting the state will that be okay ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to write another approach as well. You can take a look. https://github.com/apache/hbase/compare/master...Umeshkumar9414:hbase:HBASE_28420_2?expand=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above link doesn't work @Umeshkumar9414
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think you need to consider a master with old code become active, as we are trying to fix a bug in the old code...
And I still do not get your point on why you must call complete in remoteOperationDone method, as I explained above, if master crashes after we persist the state and then restarts, we need to call complete in the execute method, so why not just follow the same pattern since you must have the code in execute method?
And there is another advantage to prevent potential data race if you call complete method in execute method, as the execute method will be called by PEWorker, where we have a bunch of fencing ways to make sure that there is no race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to call complete in remoteOperationDone so that we didn't encounter the scenario of a master with old code becoming active. Anyway as we don't need to consider this, I have updated the PR with new changes.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@@ -80,6 +81,8 @@ public abstract class ServerRemoteProcedure extends Procedure<MasterProcedureEnv | |||
protected ServerName targetServer; | |||
protected boolean dispatched; | |||
protected boolean succ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Apache9 do you think I should replace this succ flag with state ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do it in your own convenience. We must have a state here, you can either add a state but still keep this flag, or you can implement all the logic with the state and drop this succ flag.
@@ -80,6 +81,8 @@ public abstract class ServerRemoteProcedure extends Procedure<MasterProcedureEnv | |||
protected ServerName targetServer; | |||
protected boolean dispatched; | |||
protected boolean succ; | |||
protected MasterProcedureProtos.ServerRemoteProcedureState state = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am using state for making decisions then I think we need to handle HMaster restart scenarios while HBase upgrade.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
remoteOperationDone(env, null); | ||
} | ||
|
||
@Override | ||
public synchronized void remoteOperationFailed(MasterProcedureEnv env, | ||
RemoteProcedureException error) { | ||
state = MasterProcedureProtos.ServerRemoteProcedureState.SERVER_REMOTE_PROCEDURE_SERVER_CRASH; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this should be server crash? The remote region server could also report the failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this, I relied on the documentation comments.
If the targetServer crashed but this
* procedure has no response, than dispatcher will call remoteOperationFailed()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the code and found that this is not the case(you are right it is not a server crash). Let me change it in that scenario.
//below persistence is added so that if report goes to last active master, it throws exception | ||
state = MasterProcedureProtos.ServerRemoteProcedureState.SERVER_REMOTE_PROCEDURE_REPORT_SUCCEED; | ||
env.getMasterServices().getMasterProcedureExecutor().getStore().update(this); | ||
|
||
complete(env, error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any updates here?
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
The reason for the last build failure is |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
…istence and complete change
@Umeshkumar9414 could you pull from latest master one more time? |
85b4d6a
to
c78f5da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change overall looks good, I wonder if we should add WARN/ERROR log while setting ForeignExceptionMessage at some places though
On worker side, at RS do have logs for example - hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java Line 215 in a16f458
At master side also we do have debug logs - |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
…cedure (#5816) Co-authored-by: ukumawat <[email protected]> Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit abea484)
…cedure (apache#5816) Co-authored-by: ukumawat <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…cedure (apache#5816) Co-authored-by: ukumawat <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…cedure (#5816) (#5960) Co-authored-by: ukumawat <[email protected]> Signed-off-by: Duo Zhang <[email protected]>
…cedure (#5816) (#5960) Co-authored-by: ukumawat <[email protected]> Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 377c37f)
…cedure (#5816) (#5960) Co-authored-by: ukumawat <[email protected]> Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit 377c37f)
ServerRemoteProcedure does not have its own serializing implementation, so had to make changes in all the sub-procedures.