HBASE-24518 : waitForNamespaceOnline() should return false if any region is offline #1869
The method name "isRegionOnline" is a bit confusing: it actually waits for a region to come online, so what do you think of "waitForRegionOnline"? Not related to this PR, just a note.
🎊 +1 overall
This message was automatically generated.
You are correct,
Yeah, nice catch.
@Apache9 Can you please take a look when you have bandwidth?
Is this applicable only to the master branch?
Master and branch-2.x too. Ideally, this should be considered when upgrading from 1.x to any 2.x: 1.x will have a namespace table, and it is good to ensure all of its regions come up before HMaster completes the rest of its initialization and then migrates the namespace table data to the meta table.
@anoopsjohn Hope you are fine with this patch going to all 2.x branches.
@@ -1285,7 +1285,9 @@ private boolean waitForNamespaceOnline() throws InterruptedException, IOExceptio
     }
     // Else there are namespace regions up in meta. Ensure they are assigned before we go on.
     for (RegionInfo ri : ris) {
-      isRegionOnline(ri);
+      if (!isRegionOnline(ri)) {
Actually, isRegionOnline() waits in a loop until this region's state becomes opened and its server is online. So there is no bug as such, right? isRegionOnline() might return false only if the server is being stopped.
Same question. So even if one region is offline, you will not wait for namespace, correct? Does that indicate some case where regions are going down? What if a split region was opened and it has status offline? Even then we won't wait for namespace?
isRegionOnline() might also return false if the region doesn't come online after all retry attempts with exponential backoff are completed (a rare but valid case).
BTW, our RetryCounterFactory is configured with infinite retries: maxAttempts is Integer.MAX_VALUE. Also, we don't account for the case where the RetryCounterFactory finishes its retry attempts.
There is no harm in adding this fix. But should we also check whether the RetryCounterFactory usage is correct?
Am I missing something?
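The point about effectively infinite retries can be illustrated with a minimal sketch. This is not the HBase implementation: `RegionWaitSketch`, `waitForRegionOnline`, and its parameters are hypothetical stand-ins, and the backoff sleep is left out so the example runs instantly. It only shows that, with `maxAttempts` set to `Integer.MAX_VALUE`, the only practical way to get `false` is the server-stopped check.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the wait loop under discussion; not HBase code. */
final class RegionWaitSketch {
  private RegionWaitSketch() {}

  /**
   * Polls until the region is reported online. Returns false if the server is
   * stopping or the attempt budget is exhausted. With
   * maxAttempts = Integer.MAX_VALUE the budget is never the practical exit,
   * so false effectively means "server is stopping".
   */
  static boolean waitForRegionOnline(BooleanSupplier regionOnline,
      BooleanSupplier serverStopped, int maxAttempts) {
    int attempts = 0;
    while (!regionOnline.getAsBoolean()) {
      if (serverStopped.getAsBoolean() || attempts >= maxAttempts) {
        return false;
      }
      attempts++;
      // A real loop would sleep with backoff here; omitted in this sketch.
    }
    return true;
  }

  public static void main(String[] args) {
    int[] polls = {0};
    // Assumed scenario: the region is reported online on the third poll.
    boolean online = waitForRegionOnline(() -> ++polls[0] >= 3, () -> false, Integer.MAX_VALUE);
    // Assumed scenario: the region never opens and the server is stopping.
    boolean whenStopping = waitForRegionOnline(() -> false, () -> true, Integer.MAX_VALUE);
    System.out.println(online + " " + whenStopping);
  }
}
```

Under these assumptions, the boolean result is only meaningful to a caller that actually checks it, which is what the rest of this thread is about.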
Oh I see, with infinite retries it is not a problem, but then ideally we should change the signature of isRegionOnline(), because the boolean is of no use. Also, it is used in a similar manner for the meta region, which is why I wanted the namespace region open logic to look the same.
e.g.
public boolean waitForMetaOnline() throws InterruptedException {
  return isRegionOnline(RegionInfoBuilder.FIRST_META_REGIONINFO);
}
However, now I understand the main purpose of isRegionOnline(), which is to keep retrying indefinitely and return false only when the server is going down. With the current logic, waitForNamespaceOnline() returns true no matter what isRegionOnline() returns, and hence, instead of returning from HMaster active initialization, it proceeds further.
if (!waitForNamespaceOnline()) {
  return;
}
status.setStatus("Starting cluster schema service");
initClusterSchemaService();
....
....
....
Ideally we should return from the above if condition. It is good to keep the same logic for namespace and meta region initialization. Thoughts?
@ramkrish86 from this code block we just return false, and we then return from finishActiveMasterInitialization() without any further initialization. This will mostly happen when the HMaster server is going down, so we expect finishActiveMasterInitialization() to return rather than continue with any further init:
finishActiveMasterInitialization():
if (!waitForNamespaceOnline()) {
  return;
}
waitForNamespaceOnline():
if (!isRegionOnline(ri)) {
  return false;
}
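To make the difference concrete, here is a small self-contained sketch of the two behaviours. It does not use HBase types: `NamespaceWaitSketch`, both method names, and the string region names are hypothetical, and a `Predicate` stands in for isRegionOnline(). The "before" variant ignores the per-region result, as the old loop did; the "after" variant propagates the first false to the caller.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of the change; names and types are not from HBase. */
final class NamespaceWaitSketch {
  private NamespaceWaitSketch() {}

  /** Old behaviour: the per-region result is discarded, so this always returns true. */
  static <R> boolean waitForNamespaceOnlineBefore(List<R> regions, Predicate<R> isRegionOnline) {
    for (R ri : regions) {
      isRegionOnline.test(ri); // result ignored
    }
    return true;
  }

  /** Patched behaviour: the first region that fails to come online ends the wait with false. */
  static <R> boolean waitForNamespaceOnlineAfter(List<R> regions, Predicate<R> isRegionOnline) {
    for (R ri : regions) {
      if (!isRegionOnline.test(ri)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> regions = Arrays.asList("ns-region-1", "ns-region-2");
    // Assumed scenario: the wait for the second region gives up (for example, server stopping).
    Predicate<String> online = r -> !r.equals("ns-region-2");
    System.out.println(waitForNamespaceOnlineBefore(regions, online));
    System.out.println(waitForNamespaceOnlineAfter(regions, online));
  }
}
```

With the patched variant, a caller that guards on the result, like the `if (!waitForNamespaceOnline()) { return; }` check quoted above, actually gets a chance to stop initialization.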
Yeah, seeing the meta case, this seems fine. Just a unification. +1.
One thing: from what I have seen, if namespace does not come online and it keeps retrying infinitely, you will not be able to perform any operations. So if you are returning out of this, just be careful that you are doing it in legitimate cases.
…ion is offline (#1869) Signed-off-by: ramkrish86 <[email protected]>