Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-26468 Region Server doesn't exit cleanly incase it crashes. #3862

Merged
merged 10 commits into from
Dec 1, 2021
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
import java.util.Locale;
import java.util.Map.Entry;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.conf.Configuration;
Expand Down Expand Up @@ -141,18 +142,51 @@ public static void logProcessInfo(Configuration conf) {
}

/**
* Parse and run the given command line. This may exit the JVM if
* a nonzero exit code is returned from <code>run()</code>.
* Parse and run the given command line. This will exit the JVM with
* the exit code returned from <code>run()</code>.
* If return code is 0, wait for atmost 30 seconds for all non-daemon threads to quit,
* otherwise exit the jvm
*/
public void doMain(String args[]) {
try {
int ret = ToolRunner.run(HBaseConfiguration.create(), this, args);
if (ret != 0) {
System.exit(ret);
}
// Return code is 0 here.
boolean forceStop = false;
long now = EnvironmentEdgeManager.currentTime();
virajjasani marked this conversation as resolved.
Show resolved Hide resolved
while (isNonDaemonThreadRunning()) {
if (EnvironmentEdgeManager.currentTime() - now > 30 * 1000) {
forceStop = true;
break;
}
Thread.sleep(1000);
}
if (forceStop) {
LOG.error("Failed to stop all non-daemon threads, so force quitting");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARN or FATAL log level might be more suitable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process has already stopped/aborted so I don't want to log it at FATAL. So I thought ERROR is more appropriate. Kept it at ERROR. Please let me know if you feel v strong against ERROR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR seems good 👍
How about Failed to stop all non-daemon threads, so terminating the regionserver JVM ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed regionserver word from above message since this is common to HMaster also.

System.exit(-1);
}
} catch (Exception e) {
LOG.error("Failed to run", e);
System.exit(-1);
}
}

/**
* Checks whether any non-daemon thread is running.
* @return true if there are non daemon threads running, otherwise false
*/
public boolean isNonDaemonThreadRunning() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this method to Threads class under hbase-common?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done !

AtomicInteger nonDaemonThreadCount = new AtomicInteger();
Set<Thread> threads = Thread.getAllStackTraces().keySet();
threads.forEach(t -> {
// Exclude current thread
if (t.getId() != Thread.currentThread().getId() && !t.isDaemon()) {
nonDaemonThreadCount.getAndIncrement();
LOG.info("Non daemon thread name: " + t.getName() + " still alive");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use placeholder {} ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good if we could print the stack trace here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Please review again.

}
});
return nonDaemonThreadCount.get() > 0;
}
}