-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-23647: Make MasterRegistry the default impl. #1039
HBASE-23647: Make MasterRegistry the default impl. #1039
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not merge this yet. Lets see what breaks in the PR jenkins run.
💔 -1 overall
This message was automatically generated. |
A bunch of tests failed with "Connection Refused" errors on the master ports serving the RPCs. Let me dig into the logs and update here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleans up a subset of test failures. Pushing it out to reduce the noise in the test report. Let's see how it looks like after this change.
Majority of the remaining failures have something to do with the killing of the only active master and that doesn't go well with master based registry implementation. I'm still thinking on how best to fix those cases while still retaining the test coverage.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest patch fixes most tests except a few replication tests that I'm still debugging. Meanwhile, want to do a quick run to make sure nothing else is broken.
💔 -1 overall
This message was automatically generated. |
6ec93fe
to
a9cf71d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the last pushed patchset fixes all the test issues (atleast for me locally). Lets see how it does in a jenkins run. If its green, the patch is ready for review.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
So the only test failure now is TestFromClientSide? Can this be excluded as not related or a flake? |
Reposting from the jira comment: @apurtell TestFromClientSide was known to be flaky (even before this patch). So I'm fairly certain it has nothing to do with the current jira. One thing @ndimiduk and I noticed here is that switching to JUnit 4.13 (HBASE-23664) has exacerbated the problem. Based on some initial debugging it looks like it has something to do with leaking FileSystem/DFSClient objects in the HBase code and restarting the minicluster in the same JUnit test runner JVM (what TestFromClientSide does) causes a leak of hdfs lease renewer threads. Some how Junit 4.12 was masking the problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of suggestions.
Lots of nice test improvements in here too.
// Without the right registry, the above configs are useless. Also, we don't use setClass() | ||
// here because the ConnectionRegistry* classes are not resolvable from this module. | ||
// This will be broken if ZkConnectionRegistry class gets renamed or moved. Is there a better | ||
// way? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use findClass and if there's an exception fall through to alternate or recovery code. Anyway, agreed, a reference to a class constant is not called for here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fall through to alternate or recovery code
There is no alternate or recovery from that point, no? The same error is propagated while creating the registry instance, so I guess we don't need to do it again here I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is going waste away. If user chooses zk registry, this code applies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. It is ok to leave as-is since it is a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding this method used by SyncTable
, TableMapReduceUtil
, TableOutputFormat
, VerifyReplication
, ExportSnapshot
... Per earlier discussion, replication will continue to use the ZK registry... what about the general MapReduce/Spark/Flink use-cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Good point.
TableMapReduceUtil: This happens only in initCredentialsForCluster() or if hbase.mapred.output.quorum is specified. Basically it only happens for a "peer" cluster. (same for TableOutputFormat, unless QUORUM_ADDRESS for a target is specified this doesn't happen). Overall, I think all MR jobs running on a single cluster will use master registry. I think that answers the MR/Spark/Flink usecases.
Now coming to MR jobs spanning multiple clusters (source and target) Ex: VerifyReplication / ExportSnapshot/SyncTable/TableOutputFormat etc
I think these need to be rewritten with new config params like hbase.[source|target].master.addrs for clients to pass the addresses so that they can use master registry.
What do you think? Should we rewrite them with new configs or maintain compatibility and keep using zkregistry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think all MR jobs running on a single cluster will use master registry. I think that answers the MR/Spark/Flink usecases.
Sounds good.
I think these need to be rewritten with new config params like hbase.[source|target].master.addrs for clients to pass the addresses so that they can use master registry.
I think it's best to reduce the ZK-exposed surface area as much as possible. This seems a reasonable solution to me.
// - Decouples RS and master life cycles. For example, if all the masters are down, region | ||
// servers can abort at the same time, because the internal connection is master dependent and | ||
// fails.This is an operational nightmare. Using the ZK based registry means that the region | ||
// servers are now on the look out for new masters, if they are spun up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tense here is confusing.
"Using the ZK based registry means the region servers continue to be independent of master availability..." ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified.
// - We need to retain ZKConnectionRegistry for replication use anyway, so we just extend it for | ||
// other internal connections too. | ||
conf.set(HConstants.CLIENT_CONNECTION_REGISTRY_IMPL_CONF_KEY, | ||
"org.apache.hadoop.hbase.client.ZKConnectionRegistry"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but these class names can be constant strings at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// ZK registry is so slow that by then the server manager is init'ed thus masking the problem. | ||
// For now, I'm putting a sleep here to workaround the issue, I think the fix for it is a little | ||
// delicate and needs to be done separately. | ||
Thread.sleep(5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use Waiter#waitFor to wait on some condition variable or predicate? Hard coded sleeps in tests tend to fall over on Apache Jenkins because of load issues ie the sleeps can never be long enough...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is cleaner, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
One question probably answered over in core patch, is there any writeup on how things will run from here on out when we have Master Registry in place? If no Masters, clients will have no place to go. What is the failure like? Spewing can't connect? If Master comes up, all smooths out? Etc. Thanks.
// Without the right registry, the above configs are useless. Also, we don't use setClass() | ||
// here because the ConnectionRegistry* classes are not resolvable from this module. | ||
// This will be broken if ZkConnectionRegistry class gets renamed or moved. Is there a better | ||
// way? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is going waste away. If user chooses zk registry, this code applies?
hbase-server/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
Show resolved
Hide resolved
List<String> masterHostPorts = new ArrayList<>(); | ||
getMasters().forEach(masterThread -> | ||
masterHostPorts.add(masterThread.getMaster().getServerName().getAddress().toString())); | ||
conf.set(HConstants.MASTER_ADDRS_KEY, String.join(",", masterHostPorts)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be obnoxious here but won't. Better to do as you have done here and just move the code.... not change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I rewrote this multiple times to get it working and in that process called it from multiple places, hence separated it out into a method. Doesn't make sense anymore like you pointed out. Moved it to the new place instead.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Show resolved
Hide resolved
// - Decouples RS and master life cycles. RegionServers can continue be up independent of | ||
// masters' availability. | ||
// - Configuration management for region servers (cluster internal) is much simpler when adding | ||
// new masters etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if one is adding new masters to the cluster (or taking down an existing master), they will have to update the region servers config by doing a rolling restart (unless there is some dynamic way of updating it). This operational overhead is avoided since the regionservers figure it out from ZK. They just have to update it on the clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(clarified it in the comment, let me know if it is still unclear)
hbase-server/src/test/java/org/apache/hadoop/hbase/master/AlwaysStandByMasterManager.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/master/AlwaysStandByMasterManager.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAlwaysStandByHMaster.java
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
Show resolved
Hide resolved
@@ -96,7 +96,7 @@ public void testRSTerminationAfterRegisteringToMasterBeforeCreatingEphemeralNode | |||
TEST_UTIL.startMiniZKCluster(); | |||
TEST_UTIL.createRootDir(); | |||
final LocalHBaseCluster cluster = | |||
new LocalHBaseCluster(conf, NUM_MASTERS, NUM_RS, HMaster.class, | |||
new LocalHBaseCluster(conf, NUM_MASTERS, 0, NUM_RS, HMaster.class, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to do '0' as AlwaysMasters. Should we do override that defaults zero? Just to keep the AlwaysMasters out of view when not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any writeup on how things will run from here on out when we have Master Registry in place?
Plan to document as a part of HBASE-23331.
If no Masters, clients will have no place to go. What is the failure like? Spewing can't connect? If Master comes up, all smooths out? Etc. Thanks.
Ya, as long as the master ports (atleast one of the configured ones) didn't change, connection should be back up. Until then spewing of stack traces
// Without the right registry, the above configs are useless. Also, we don't use setClass() | ||
// here because the ConnectionRegistry* classes are not resolvable from this module. | ||
// This will be broken if ZkConnectionRegistry class gets renamed or moved. Is there a better | ||
// way? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. It is ok to leave as-is since it is a no-op?
hbase-server/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
Show resolved
Hide resolved
List<String> masterHostPorts = new ArrayList<>(); | ||
getMasters().forEach(masterThread -> | ||
masterHostPorts.add(masterThread.getMaster().getServerName().getAddress().toString())); | ||
conf.set(HConstants.MASTER_ADDRS_KEY, String.join(",", masterHostPorts)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I rewrote this multiple times to get it working and in that process called it from multiple places, hence separated it out into a method. Doesn't make sense anymore like you pointed out. Moved it to the new place instead.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Show resolved
Hide resolved
// - Decouples RS and master life cycles. RegionServers can continue be up independent of | ||
// masters' availability. | ||
// - Configuration management for region servers (cluster internal) is much simpler when adding | ||
// new masters etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if one is adding new masters to the cluster (or taking down an existing master), they will have to update the region servers config by doing a rolling restart (unless there is some dynamic way of updating it). This operational overhead is avoided since the regionservers figure it out from ZK. They just have to update it on the clients.
...e-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSideWithCoprocessor.java
Outdated
Show resolved
Hide resolved
@@ -60,7 +60,7 @@ public static void setUp() throws Exception { | |||
TEST_UTIL.getConfiguration().setInt(HConstants.META_REPLICAS_NUM, 3); | |||
TEST_UTIL.startMiniCluster(3); | |||
REGISTRY = ConnectionRegistryFactory.getRegistry(TEST_UTIL.getConfiguration()); | |||
RegionReplicaTestHelper.waitUntilAllMetaReplicasHavingRegionLocation( | |||
RegionReplicaTestHelper.waitUntilAllMetaReplicasAreReady(TEST_UTIL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, you are right. For some reason I thought there are some callers that pass some custom config but I was wrong.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/AlwaysStandByMasterManager.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
Show resolved
Hide resolved
@@ -96,7 +96,7 @@ public void testRSTerminationAfterRegisteringToMasterBeforeCreatingEphemeralNode | |||
TEST_UTIL.startMiniZKCluster(); | |||
TEST_UTIL.createRootDir(); | |||
final LocalHBaseCluster cluster = | |||
new LocalHBaseCluster(conf, NUM_MASTERS, NUM_RS, HMaster.class, | |||
new LocalHBaseCluster(conf, NUM_MASTERS, 0, NUM_RS, HMaster.class, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
....and I felt this is redundant (tests the same thing) so removed. You prefer to retain it?
We need more of this sort of judgement -- purging redundant tests.
They just have to update it on the clients.
Hmm. If a client has three Masters, and the operators remove one, what then? Operator has to update client configs to purge the removed Master?
I suppose we have same issue w/ zk ensemble list.
Agree, it is a test-only construct. Didn't leak into the server module. Anything else should be done?
No. Just hide it in tests as much as you can. Be explicit it is a test-only item (which I think you do...).
Agree. It is ok to leave as-is since it is a no-op?
Yes. Ok as is.
Ya, thats the same problem, agreed. |
d9bb034
to
62da419
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased and squashed after a force push to the feature branch. Was a clean one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple small change requests, but overall this is looking good.
ZKConfig.ZKClusterKey zkClusterKey = ZKConfig.transformClusterKey(key); | ||
conf.set(HConstants.ZOOKEEPER_QUORUM, zkClusterKey.getQuorumString()); | ||
conf.setInt(HConstants.ZOOKEEPER_CLIENT_PORT, zkClusterKey.getClientPort()); | ||
conf.set(HConstants.ZOOKEEPER_ZNODE_PARENT, zkClusterKey.getZnodeParent()); | ||
// Without the right registry, the above configs are useless. Also, we don't use setClass() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a check-then-fail that can be done to assert that the method invocation has significance? Maybe log a warning saying zk configs are being applied to a master registry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the logging. Don't think it should be warn though.
// Without the right registry, the above configs are useless. Also, we don't use setClass() | ||
// here because the ConnectionRegistry* classes are not resolvable from this module. | ||
// This will be broken if ZkConnectionRegistry class gets renamed or moved. Is there a better | ||
// way? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding this method used by SyncTable
, TableMapReduceUtil
, TableOutputFormat
, VerifyReplication
, ExportSnapshot
... Per earlier discussion, replication will continue to use the ZK registry... what about the general MapReduce/Spark/Flink use-cases?
final int noRegionServers, final Class<? extends HMaster> masterClass, | ||
final Class<? extends HRegionServer> regionServerClass) | ||
throws IOException { | ||
final int noAlwaysStandByMasters, final int noRegionServers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. The final constructor args are Configuration, int, int, int, Class, Class
. It's time for a builder with named arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, too long and confusing now.
hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/client/RegionReplicaTestHelper.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAlwaysStandByHMaster.java
Outdated
Show resolved
Hide resolved
// the master. The reason it was not happening earlier is because the connection creation with | ||
// ZK registry is so slow that by then the server manager is init'ed thus masking the problem. | ||
// For now, I'm putting a wait() here to workaround the issue, I think the fix for it is a | ||
// little delicate and needs to be done separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the fix you have in mind? Reading through your comment, I thought of a test-only coprocessor that installs a latch that can be waited upon by the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about fixing the actual problem rather than the test. If a shutdown happens before starting the server manager, the latter takes note of that during init and shuts down automatically (or something along the lines). My fix is in the master bootstrap.
...server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionMoveAndAbandon.java
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/security/token/SecureTestCluster.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionRegistryFactory.java
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]> (cherry picked from commit 229b8aa)
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]> (cherry picked from commit 229b8aa)
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]> (cherry picked from commit 229b8aa)
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]> (cherry picked from commit 229b8aa)
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]> (cherry picked from commit 229b8aa)
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
Signed-off-by: Stack <[email protected]> Signed-off-by: Nick Dimiduk <[email protected]> Signed-off-by: Andrew Purtell <[email protected]>
No description provided.