Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-24518 : waitForNamespaceOnline() should return false if any region is offline #1869

Merged
merged 1 commit into from
Jun 17, 2020

Conversation

virajjasani
Copy link
Contributor

No description provided.

@bsglz
Copy link
Contributor

bsglz commented Jun 8, 2020

The method name "isRegionOnline" seems a bit confused, it actually wait a region online, so WDYT "waitForRegionOnline"? Not related to this PR, just say.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 52s master passed
+1 💚 checkstyle 1m 15s master passed
+1 💚 spotbugs 1m 59s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 21s the patch passed
+1 💚 checkstyle 1m 3s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 11m 19s Patch does not cause any errors with Hadoop 3.1.2 3.2.1.
+1 💚 spotbugs 2m 10s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 14s The patch does not generate ASF License warnings.
33m 26s
Subsystem Report/Notes
Docker Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #1869
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 16854658c7ef 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9d0baa9
Max. process+thread count 94 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@virajjasani
Copy link
Contributor Author

The method name "isRegionOnline" seems a bit confused, it actually wait a region online, so WDYT "waitForRegionOnline"? Not related to this PR, just say.

You are correct, isRegionOnline waits for region with retries and exponential backoffs on retry intervals. But it does return false if after all attempts region is still offline, and that's what we are missing to propagate back to HMaster with waitForNamespaceOnline() method.

@bsglz
Copy link
Contributor

bsglz commented Jun 8, 2020

Yeah, nice catch.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 42s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 19s master passed
+1 💚 compile 1m 4s master passed
+1 💚 shadedjars 6m 9s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 34s the patch passed
+1 💚 compile 0m 56s the patch passed
+1 💚 javac 0m 56s the patch passed
+1 💚 shadedjars 5m 38s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s the patch passed
_ Other Tests _
+1 💚 unit 138m 53s hbase-server in the patch passed.
164m 49s
Subsystem Report/Notes
Docker Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #1869
Optional Tests javac javadoc unit shadedjars compile
uname Linux f21570eebe74 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9d0baa9
Default Java 1.8.0_232
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/testReport/
Max. process+thread count 4232 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 14s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 48s master passed
+1 💚 compile 1m 10s master passed
+1 💚 shadedjars 6m 23s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 45s hbase-server in master failed.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 37s the patch passed
+1 💚 compile 1m 8s the patch passed
+1 💚 javac 1m 8s the patch passed
+1 💚 shadedjars 6m 26s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 41s hbase-server in the patch failed.
_ Other Tests _
+1 💚 unit 188m 46s hbase-server in the patch passed.
217m 48s
Subsystem Report/Notes
Docker Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #1869
Optional Tests javac javadoc unit shadedjars compile
uname Linux aebea0e479fa 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9d0baa9
Default Java 2020-01-14
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt
javadoc https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt
Test Results https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/testReport/
Max. process+thread count 3862 (vs. ulimit of 12500)
modules C: hbase-server U: hbase-server
Console output https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1869/1/console
versions git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f)
Powered by Apache Yetus 0.11.1 https://yetus.apache.org

This message was automatically generated.

@virajjasani virajjasani requested a review from Apache9 June 8, 2020 19:38
@virajjasani
Copy link
Contributor Author

@Apache9 Can you please take a look when you have bandwidth?
Thanks

@anoopsjohn
Copy link
Contributor

Is this applicable only for master branch?

@virajjasani
Copy link
Contributor Author

virajjasani commented Jun 9, 2020

Is this applicable only for master branch?

master and branch-2.x too. Ideally, this is something to be considered when we upgrade from 1.x to any 2.x (1.x will have namespace and good to ensure all regions come up before HMaster completed rest of initializations and then migrate namespace table data to meta table).

@virajjasani
Copy link
Contributor Author

@anoopsjohn Hope you are fine with this patch going to all 2.x branches.

@@ -1285,7 +1285,9 @@ private boolean waitForNamespaceOnline() throws InterruptedException, IOExceptio
}
// Else there are namespace regions up in meta. Ensure they are assigned before we go on.
for (RegionInfo ri : ris) {
isRegionOnline(ri);
if (!isRegionOnline(ri)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually isRegionOnline() is waiting in a loop until this region's status become opened and that server is online. So there is no bug as such right? isRegionOnline() might return false iff the server is being stopped

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question. So even if one region is offline you will not wait for namespace correct? That means that indicates some case where regions are gong down? What if a split region was opened and that has status offline? Even then we wont wait for namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRegionOnline might also return false if region doesn't come online after all retry attempts with exponential backoff are completed (rare but valid case)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW our RetryCounterFactory is with infinite retry. maxAttempts are Integer.MAX_VALUE. Also we don't account for a case whether RetryCounterFactory finishes the retry attempts.
There is no harm in adding this fix. But should we see whether the RetryCounterFactory usage is correct also?
Am I missing some thing?

Copy link
Contributor Author

@virajjasani virajjasani Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, for infinite retries, it is not a problem but then ideally we should change the signature of isRegionOnline() because boolean is of no use. Also, it is being used in similar manner for meta region also, and that's why I wanted to make namespace region open logic look same.

e.g

  public boolean waitForMetaOnline() throws InterruptedException {
    return isRegionOnline(RegionInfoBuilder.FIRST_META_REGIONINFO);
  }

However, now I understand the main purpose of isRegionOnline() which is to keep retrying infinite times and only when server is going down, return false. And with the current logic, waitForNamespaceOnline() returns true no matter what isRegionOnline() returns and hence, instead of returning from HMaster active initializations, it proceeds further.

    if (!waitForNamespaceOnline()) {
      return;
    }
    status.setStatus("Starting cluster schema service");
    initClusterSchemaService();
....
....
....

We should return from the above if condition ideally. Good to maintain same logic for namespace and meta regions initializations. Thought?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ramkrish86 from this code block, we just return false and we return back from finishActiveMasterInitialization() without any further initialization because mostly this will happen when HM server is going down so we expect finishActiveMasterInitialization() to return instead of going further with any further init:

finishActiveMasterInitialization():

    if (!waitForNamespaceOnline()) {
      return;
    }

waitForNamespaceOnline():

      if (!isRegionOnline(ri)) {
        return false;
      }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya seeing the meta case this seems fine. Just a unification. +1.

@virajjasani virajjasani requested a review from petersomogyi June 14, 2020 07:01
Copy link
Contributor

@ramkrish86 ramkrish86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing is that if namespace does not come online and it keeps retrying infinitely u wil not be able to perform any operations is what I have seen. So if you are returning out of this just be careful that u r doing it in legitmate cases.

@@ -1285,7 +1285,9 @@ private boolean waitForNamespaceOnline() throws InterruptedException, IOExceptio
}
// Else there are namespace regions up in meta. Ensure they are assigned before we go on.
for (RegionInfo ri : ris) {
isRegionOnline(ri);
if (!isRegionOnline(ri)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question. So even if one region is offline you will not wait for namespace correct? That means that indicates some case where regions are gong down? What if a split region was opened and that has status offline? Even then we wont wait for namespace?

@virajjasani virajjasani merged commit 192dade into apache:master Jun 17, 2020
virajjasani added a commit that referenced this pull request Jun 17, 2020
clarax pushed a commit to clarax/hbase that referenced this pull request Nov 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants