HBASE-24265 Remove hedged rpc call support, implement the logic in MasterRegistry … #1593
Conversation
The functionality here is similar to region replicas, so I do not think we should implement it in the rpc framework; the preferred way in HBase is to implement a special RpcRetryingCaller. Admittedly MasterRegistry is a bit unusual, as it sits at the root of the client implementation, so we can not reuse the existing rpc retrying callers, but implementing the hedging logic in MasterRegistry is still easier than doing it in the rpc framework. I also dropped the fan-out parameter. Usually we will only have 2 masters, and even if we have more, I do not think it makes much difference, so I only support two modes: send the requests one by one, or send them all at the same time. And since the logic now lives in MasterRegistry, both rpc client implementations can work with MasterRegistry. PTAL. @saintstack @bharathv
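(For readers following the discussion, a minimal sketch of the hedging idea described above, assuming hypothetical MasterStub/callAllMasters names rather than the actual patch: send the same request to every master at once, complete a shared future with the first valid response, and fail only once every master has failed.)

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Predicate;

public class HedgeInRegistrySketch {

  /** Stand-in for a per-master async stub; not the real HBase interface. */
  public interface MasterStub<REQ, RESP> {
    void call(REQ request, BiConsumer<RESP, Throwable> done);
  }

  /**
   * Send the same request to every master at once and complete the returned
   * future with the first valid response. The future fails only after every
   * master has failed or returned an unusable response.
   */
  public static <REQ, RESP, T> CompletableFuture<T> callAllMasters(
      List<MasterStub<REQ, RESP>> stubs, REQ request,
      Predicate<RESP> isValidResponse, Function<RESP, T> transform) {
    CompletableFuture<T> future = new CompletableFuture<>();
    AtomicInteger remaining = new AtomicInteger(stubs.size());
    for (MasterStub<REQ, RESP> stub : stubs) {
      stub.call(request, (resp, error) -> {
        if (error == null && isValidResponse.test(resp)) {
          // First valid answer wins; CompletableFuture ignores later completions.
          future.complete(transform.apply(resp));
        } else if (remaining.decrementAndGet() == 0) {
          // Every master failed or returned an invalid response.
          future.completeExceptionally(
              new RuntimeException("request failed on all masters"));
        }
      });
    }
    return future;
  }
}
```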
Trying to understand the motivation for this refactor.
I think hedged RPC support is a standard RPC-layer feature (for example gRPC: [1]). We can probably make a case for porting region replicas to a hedging framework in the RPC layer. Fun fact: when I first implemented this as a prototype, I had all the logic in the master registry itself, and the review comments from @apurtell and @saintstack were to push it into a layer below, namely RPC, which made sense to me. Also, anyone who wants to implement hedging has to write a lot of boilerplate code around synchronizing responses from multiple RPC threads, issuing cancellations, etc. It is not trivial, so why would anyone want to do that all over again? There are some issues we can fix, like the one you noted in the jira about the usage of the common fork-join pool (I'm trying to remember why I did that; I initially implemented it like regular async code but reverted it later). Let me think a little about why I did this and get back to you.
We use > 5 masters for redundancy in our critical production deployments. If we were to use this, we definitely don't want to spam every master for each request. -1 on removing it unless you have a strong counter argument. [1] https://github.com/grpc/proposal/blob/master/A6-client-retries.md#hedging-policy
Just realized this work was triggered by HBASE-24264. How did you narrow it down to testHedgedAsyncEcho? There is not much detail in the jira/commit. Any jstacks? Curious what was hanging in the test and if we can de-flake it.
I think >2 masters is a common deployment pattern. A typical ZooKeeper and HDFS HA deployment requires three coordinator hosts at minimum, so it makes sense to deploy 3 masters across those same hosts as well.
To be clear, I'm good with disabling a flaky test till it gets some love. That doesn't mean I'm in favor of purging this hedging facility.
Things are different in HBase. HBase has a rich client; you can see the implementation of the rpc retrying caller. Typically, in HBase, an rpc retry will lead to clearing the meta location cache and then relocating, or fetching the new masters. For the case here, you have a check to verify whether the returned protobuf message is valid, right? In your old code, if an endpoint returns an invalid protobuf message, the rpc framework will not consider it incorrect and will hand it to the upper layer, which leads to a failure even if other endpoints could give you the correct result. If you implement the logic in MasterRegistry directly, you are free to wait for responses from other masters as well. And implementing hedged reads is not easy work. As I mentioned on the JIRA issue, the implementation is not fully 'async': it just executes a blocking operation in a thread pool, and it uses an anti-pattern in Java, executing a critical task in the common pool, which may lead to unpredictable deadlock. The fact that gRPC has hedged read support does not mean we have to support the feature. Even if we used gRPC directly, as said above, I do not think we would use its hedged read feature here. Different scenarios.
And if you guys really want hedged read support, please open a new issue for it, implement the feature in both rpc client implementations, and make it suitable for other operations as well, not only MasterRegistry; for example, supporting the read replica feature. Right now the implementation was pulled in as a side effect of MasterRegistry and lacks lots of features: it only supports NettyRpcClient, it has no built-in retry support, etc. To be clear, it is not a good idea to expose a feature with long-term impact as a side effect of implementing another feature. Developers may accidentally use it, and then it becomes a cancer in the code since it was not in good shape to begin with. I spent a lot of time purging the old stale code in the sync client, both in the rpc implementation and in the old retrying callers, especially the big AsyncProcess class. I do not want to do that again in the future. Thanks.
It does not make much difference with 2 or 3 masters; if we had 10 masters then maybe it would be worth implementing the 'fan out limit' feature, but the code would also be harder to understand: think of a mix of the current 'send all requests concurrently' and 'send requests one by one'. Notice that a blocking wait is not allowed here, so...
OK, for me it is fine to add this back. I am confident I can implement this feature correctly.
Most of the code in the current rpc-client and hbase-client was written by me, so I think I'm fairly familiar with these parts of HBase. We have retries everywhere in hbase-client, but we just use the same RpcRetryingCaller whenever we have the same retrying logic, so the argument that if we do not implement hedged reads in the rpc layer then everyone needs to implement the logic on their own does not hold. The real problem here is the design of the client library. The ConnectionRegistry is the root of everything; we must make it available before we actually create the RpcClient that is used across the whole client, so we can not make use of the RpcRetryingCallers. But this is only a problem for the Registry implementation, not a common problem for the whole hbase client.
Back to the feature in gRPC: those are the configs for it. Looking at the code in HBase, what do we do when there is an error? Do we just send the request again to the same endpoint? No, definitely not. We need to check whether the endpoint is still the correct place to go, and if not, we switch to another endpoint.
I think what we want in 2.3.0 is MasterRegistry, not hedged read rpc, right? So my goal is to make MasterRegistry available first and then start thinking about the hedged read feature. If you guys think we must have a hedged read feature in the rpc framework in 2.3.0, then I will veto releasing 2.3.0 with the current code base. Let's at least move the read replica logic down to the rpc framework layer if we really want to do this (although I do not think this is necessary in HBase). Maybe that will take several months...
To be honest, when I looked at the implementation of HedgedRpcChannel I just gave up on finding the root cause, as the code is not really 'async'...
Okay, let me first summarize your beef with the current approach, for my own understanding and for those following so that they can weigh in too. Your main concerns are as follows. I'll try to answer each one separately.
Well, every rpc framework has a dream of implementing everything. But unfortunately, this is an rpc framework only for HBase, not for everyone. So the final judgement on whether to implement a feature in the rpc framework is whether it introduces more complexity in the code. The motivation here is to support hedged reads in MasterRegistry, and you can see the patch is "+186 -629": I added 186 lines but removed 629 lines to support the same function. OK, about 100 of the removed lines are test code, but it is still roughly 200 vs. 500. So for supporting MasterRegistry, I think implementing it in MasterRegistry directly is much better than implementing it in the rpc framework, and the reason is easy to understand: we do not need the extra abstraction... As for other uses of hedged reads, the first one is the read replica feature. If you can show that you can support the same feature with less code in the rpc framework than with RpcRetryingCaller, then I'm OK with implementing hedged reads in rpc. So my suggestion is still: remove the hedged reads feature so it does not block the 2.3.0 release, and open another issue for implementing it, as it has a much greater long-term impact than the MasterRegistry feature. Thanks.
As for your statement on BlockingRpcClient: well, I'm fine with removing it, as it is outdated and still uses tech from the 1990s (or even the 1980s?). But I still need to say, since netty is so widely used, many developers are no longer familiar with the terms 'NIO' and 'BIO'. The 'Blocking' in its name does not mean the rpc implementation is blocking; it just means the socket is in blocking mode. In netty there is a rarely used Channel type called 'OIO', which is the same thing. Obviously you can use netty OIO to write async code, so in HBase you can also use BlockingRpcClient to implement async code. As you can see, our async hbase client can run on both NettyRpcClient and BlockingRpcClient. Thanks.
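(A minimal illustration of this point, not HBase code: an async, future-based API can sit on top of a blocking operation by confining the blocking work to a dedicated I/O thread.)

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class AsyncOverBlockingSketch {

  // A dedicated thread owns the blocking socket; callers never block on it.
  private final ExecutorService ioThread = Executors.newSingleThreadExecutor();

  /** Expose a future-based API on top of a blocking operation. */
  public <T> CompletableFuture<T> asyncCall(Supplier<T> blockingCall) {
    CompletableFuture<T> future = new CompletableFuture<>();
    ioThread.execute(() -> {
      try {
        // Only this I/O thread blocks; the caller already got the future back.
        future.complete(blockingCall.get());
      } catch (RuntimeException e) {
        future.completeExceptionally(e);
      }
    });
    return future;
  }

  public void close() {
    ioThread.shutdown();
  }
}
```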
Oh, I forgot to say: in the newest patch I implemented the fan-out limit feature.
Ok, let me read the code for read replicas. I had a brief look at it just now and I see that the timeline reads are very much intertwined with AsyncSingleRequestRpcRetryingCaller (which is common for all gets/puts/etc.). We would need to separate it out for hedging gets, and I think we could port it to the hedging rpc framework, but I don't know about the number of lines; it may be more or less. Anyway, I don't think line count is always the right metric to go by.
Well, if you have already made your decision, I don't think I can do much to change it. The least we could do is fix it to make it actually async (which is #1601). If you still have other concerns, I think we'd end up blocking 2.3.0 (which is very close, AFAICT, based on the jiras from @ndimiduk). Let's see what other reviewers think? Thanks.
Ping @ndimiduk and @saintstack, what do you guys think? I've replied on #1601; at the least, the current implementation does not make sense to me, as there is no reason why we should only be able to support NettyRpcClient. FWIW, I do not see any advantage in moving the logic from AsyncRpcRetryingCaller down to the rpc framework, except that MasterRegistry can make use of it; but implementing the logic in MasterRegistry only requires a very small amount of code and can do better error handling than an implementation in the rpc framework, so I do not think it is worth it. Thanks.
Your comments on #1601 make sense to me. Thanks for taking a look. It was a misunderstanding on my part that the BlockingRpcConnection implementation always defaulted to a blocking channel implementation; I took a closer look at the code after your comments and that cleared it up. This means it is much easier to support all RpcClients if we wrap the channels in HedgedRpcChannel, so I think this can be fixed. The whole discussion now boils down to whether we want the hedging abstraction in the RPC layer or not.
After reading the above: it was a mistake suggesting you generalize hedged reads, Bharath -- at least for branch-2. I apologize if it made you do more work. On read-replicas in the RPC tier, it strikes me that this is an hbase-special in need of support from tiers above RPC? It won't fit? On the notion that the hedged reads addition to RPC implies our building a generic, feature-ful RPC lib, let us take pause here (sorry if my suggestion that hedged reads be made generally available in RPC helped imply this). Let's not do this. Let's see if we can use an existing RPC lib instead (it may not be possible, given the reminder above that retries, read-replicas, and hedging factor in info from higher tiers). The Duo async refactor helped undo our RPC tangle. If we could plug in a gRPC, perf allowing, it would be fun being able to just enable 'pushback', controlled delay, tracing, etc., because we'd be up on an RPC lib that already supports these abilities and others (may not be possible in the hbase context but worth investigating). Duo's reminder that we have to be careful when it comes to almost-there optional code is timely, because too often, even with the best of intentions, such stuff is not enabled and rots. I'm game to help out here however I can.
Seems we've led @bharathv down a rabbit hole. These changes are part of a feature new in 2.3.0, so I prefer that they be settled before the release. I'm still in the process of setting up my infrastructure for running ITBLL tests in the peculiarities of my environment (see my recent patches around the chaos monkey tools), so there's still time. Please take the time to see this resolved before the first RC, whichever direction you choose to take it. Go ahead and mark appropriate JIRAs as blockers for 2.3.0. My one request is that we update the class names, class-level javadoc, and package-level javadoc to more clearly describe what's going on here and why. It seems that what structures we have in place are not enough for multiple developers familiar with the codebase to understand the subtleties of this subsystem. No offense @Apache9, but more people than just you need to be able to provide meaningful code review on our RPC implementation.
@saintstack / @ndimiduk Thanks for chiming in. It looks like we have a consensus to move the logic into the master registry itself.
I apologize if it made you do more work.
No worries.
On read-replicas in the RPC tier, it strikes me that this is an hbase-special in need of support from tiers above RPC? It won't fit?
I think it would look out of place given the way the current client code is structured.
I prefer that they be settled before the release.
The refactor is actually pretty simple; I think it should be doable before the release. I can review the patch.
MASTER_REGISTRY_HEDGED_REQS_FANOUT_DEFAULT);
int rpcTimeoutMs = (int) Math.min(Integer.MAX_VALUE,
  conf.getLong(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT));
// XXX: we pass cluster id as null here since we do not have a cluster id yet, we have to fetch
Starting to wonder why this didn't get flagged in tests. I guess there is some test hole with token-based auth..
That's just because we do not have a test where MasterRegistry is enabled and the cluster id is used to select authentication. Most tests in HBase do not need authentication.
Ya, that's what I meant.
  future.complete(transformResult.apply(rpcResult));
};
// send requests concurrently to hedgedReadsFanout masters
private <T extends Message> void groupCall(CompletableFuture<T> future, int startIndexInclusive,
I think this could use a comment explaining the logic; without that, it would be difficult to follow.
I added more comments for this method.
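(For context, a rough sketch of the group-call pattern being discussed, using a hypothetical Stub interface instead of the real HBase types: each group of fanOut masters is called concurrently, and only when a whole group has failed does the logic move on to the next group, with no blocking wait anywhere.)

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class GroupCallSketch<RESP> {

  /** Stand-in for a per-master async stub; not the real HBase interface. */
  public interface Stub<R> {
    CompletableFuture<R> call();
  }

  private final List<Stub<RESP>> stubs;
  private final int fanOut;

  public GroupCallSketch(List<Stub<RESP>> stubs, int fanOut) {
    this.stubs = stubs;
    this.fanOut = fanOut;
  }

  // Call the masters in [startInclusive, startInclusive + fanOut) concurrently.
  private void groupCall(CompletableFuture<RESP> future, int startInclusive) {
    int endExclusive = Math.min(startInclusive + fanOut, stubs.size());
    AtomicInteger remaining = new AtomicInteger(endExclusive - startInclusive);
    for (int i = startInclusive; i < endExclusive; i++) {
      stubs.get(i).call().whenComplete((resp, error) -> {
        if (error == null) {
          // First success completes the shared future; later results are dropped.
          future.complete(resp);
        } else if (remaining.decrementAndGet() == 0) {
          if (endExclusive == stubs.size()) {
            // Last group exhausted: surface the failure.
            future.completeExceptionally(error);
          } else {
            // Whole group failed: move on to the next group, still without blocking.
            groupCall(future, endExclusive);
          }
        }
      });
    }
  }

  public CompletableFuture<RESP> call() {
    CompletableFuture<RESP> future = new CompletableFuture<>();
    groupCall(future, 0);
    return future;
  }
}
```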
if (remaining.decrementAndGet() == 0) {
  if (endIndexExclusive == masterStubs.size()) {
    // we are done, complete the future with exception
    RetriesExhaustedException ex = new RetriesExhaustedException("masters",
nit: technically these are not retries, right? Instead wrap all the errors into MasterRegistryFetch..?
The RetriesExhaustedException is used to wrap all the exceptions, and then it is itself wrapped by the MasterRegistryFetchException to include all the master addresses.
Ya, my question was about the word "retries". That will show up in the exception message (IIUC). These give a false impression that things are being retried, right (when in reality they are hedged, in some sense).
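(A toy illustration of the two-level wrapping described above, with stand-in class names rather than the real RetriesExhaustedException / MasterRegistryFetchException; the outer exception carries the master addresses and the inner one collects the per-master failures without implying that anything was retried.)

```java
import java.util.List;

public class ErrorWrappingSketch {

  /** Stand-in for RetriesExhaustedException, named to avoid the word "retries". */
  static class AllMastersFailedException extends Exception {
    AllMastersFailedException(String what, List<Throwable> causes) {
      super(what + " failed on all " + causes.size() + " endpoints");
      // Keep every per-master failure visible in the stack trace.
      causes.forEach(this::addSuppressed);
    }
  }

  /** Stand-in for MasterRegistryFetchException, carrying the master addresses. */
  static class RegistryFetchException extends RuntimeException {
    RegistryFetchException(List<String> masters, Throwable cause) {
      super("failed to fetch info from masters " + masters, cause);
    }
  }

  static RuntimeException wrap(List<String> masters, List<Throwable> perMasterErrors) {
    return new RegistryFetchException(masters,
        new AllMastersFailedException("hedged master call", perMasterErrors));
  }
}
```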
  }
} else {
  // do not need to decrement the counter any more as we have already finished the future.
  future.complete(r);
What happens to the hedged calls? Shouldn't they be canceled?
There is no way to cancel an already-sent rpc call; our rpc framework does not support this. For the read replica feature there is a delay before the secondary replica calls, so we have a chance to cancel the local delayed task, but here we have already sent the request out.
I was thinking about Call#setException(), which cleans up the caller state and propagates the exception to the future callback. For example, if a master is hung and the RPC is hung, that state would be cleaned up more quickly.
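(For contrast, a minimal sketch of the read-replica style mentioned above, not HBase code: the secondary call is only a locally scheduled task until its delay expires, so it can still be cancelled if the primary answers first.)

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class DelayedSecondarySketch {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /**
   * Fire the primary call immediately and only send the secondary call if the
   * delay expires first. If the primary answers in time, the secondary is still
   * just a local scheduled task and can be cancelled; once a request is on the
   * wire there is nothing left to cancel. Failure aggregation is omitted here.
   */
  public <T> CompletableFuture<T> callWithDelayedSecondary(
      Supplier<CompletableFuture<T>> primary,
      Supplier<CompletableFuture<T>> secondary, long delayMs) {
    CompletableFuture<T> result = new CompletableFuture<>();
    primary.get().whenComplete((r, e) -> {
      if (e == null) {
        result.complete(r);
      }
    });
    ScheduledFuture<?> delayed = scheduler.schedule(() -> {
      secondary.get().whenComplete((r, e) -> {
        if (e == null) {
          result.complete(r);
        }
      });
    }, delayMs, TimeUnit.MILLISECONDS);
    // If the primary wins before the delay expires, the secondary never goes out.
    result.whenComplete((r, e) -> delayed.cancel(false));
    return result;
  }
}
```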
Patch looks fine to me; just a follow-up on hedged call cancellations, otherwise +1 from me. Nice tests.
  }
  future.complete(transformResult.apply(rpcResult));
};
// send requests concurrently to hedgedReadsFanout masters. If any of the request is succeeded, we
nit: use javadoc formatting (/**) for the method comment.
Looks like the test class is missing a class rule (precommit failure).
I think querying multiple masters could also be seen as a type of retry: if this master is not available, we just try another, and if all of them fail we return the exception to you.
Not a big deal? It is only a small amount of memory. Implementing this would lead to more complicated logic: we would have to keep references to the CompletableFutures for the other requests and call cancel on them, and we would also need to handle the cancel event and do the cleanup work. Compared to the benefits, I do not think it is worth it...
Compared to the benefits, I do not think it is worth it...
Okay.