HBASE-27650 Merging empty regions corrupts meta cache #5037

bbeaudreault · 2023-02-17T16:20:06Z

No description provided.

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

bbeaudreault · 2023-02-17T16:26:28Z

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

+    boolean isLast = Bytes.equals(region.getEndKey(), HConstants.EMPTY_END_ROW);
+
+    while (true) {
+      Map.Entry<byte[], RegionLocations> overlap =


Another option here would have been to iterate over a subMap. I felt this approach was better because merges are rare, and in almost all cases we'll exit here after just one floorEntry/lastEntry call and one reference equality check. Using a subMap requires at least two comparator comparisons to find the head and tail of the subMap.
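A minimal sketch of the trade-off described above, not the patch itself. It assumes a hypothetical cache that maps a region's start key to its end key as plain strings, and it compares keys where the comment describes a reference equality check. It uses `lowerEntry` because end keys are treated as exclusive in this sketch; the actual lookup in the patch may differ.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only: start key -> end key, both as Strings.
class SubMapVsEarlyExitSketch {

  /** Alternative approach: build a view of every cached start key strictly inside (start, end). */
  static int countViaSubMap(NavigableMap<String, String> cache, String start, String end) {
    return cache.subMap(start, false, end, false).size();
  }

  /**
   * Early-exit approach: one lookup just below the end key. If that entry is the
   * region's own entry, no other cached region starts inside it.
   */
  static boolean nothingToClean(NavigableMap<String, String> cache, String start, String end) {
    Map.Entry<String, String> below = cache.lowerEntry(end);
    return below == null || below.getKey().equals(start);
  }

  public static void main(String[] args) {
    NavigableMap<String, String> cache = new TreeMap<>();
    cache.put("a", "c");
    cache.put("c", "e");
    cache.put("e", "g");
    // Steady state: caching [c, e) again finds only itself below "e".
    System.out.println(nothingToClean(cache, "c", "e"));
    // After a merge into [a, g), stale entries "c" and "e" sit inside the new region.
    cache.put("a", "g");
    System.out.println(nothingToClean(cache, "a", "g"));
    System.out.println(countViaSubMap(cache, "a", "g"));
  }
}
```

Both approaches find the same stale entries; the early-exit form simply does less work when no merge has happened.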

bbeaudreault · 2023-02-17T16:27:02Z

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

@@ -442,6 +485,10 @@ private RegionLocations locateRowInCache(TableCache tableCache, TableName tableN
      recordCacheHit();
      return locs;
    } else {
+      if (LOG.isTraceEnabled()) {


This log was helpful for diagnosing this bug, so I decided to keep it.

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

bbeaudreault · 2023-02-17T23:35:28Z

Will look into the test failures shortly. They look related.

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

Apache9 · 2023-02-18T09:09:55Z

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

+     * possible, calls beforeUpdate callback prior to making a change. Calls afterUpdate callback
+     * after making a change.
+     */
+    public synchronized void remove(HRegionLocation loc, Runnable beforeUpdate,


We need to be careful that beforeUpdate and afterUpdate do not hold other locks; otherwise this may introduce a deadlock.

Let me see if I can remove the callbacks. I was trying to keep the metaLocation.onError stuff out of here. I wasn't sure if the onError call needed to happen at that exact point.

If it is not easy, just add more comments to warn others.

I removed the callbacks. I think remove can just return a boolean, and both actions can happen afterwards if a change was made.
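The shape of that change can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `TableCacheSketch`, `onLocationError`, and the `cacheClears` counter are hypothetical names, and the cache maps a start key to a server name as plain strings.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of "return a boolean instead of taking callbacks".
class RemoveReturnsBooleanSketch {

  static final class TableCacheSketch {
    private final ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();

    synchronized void add(String startKey, String server) {
      cache.put(startKey, server);
    }

    /** Removes the entry only if it still points at the given server; reports whether it changed. */
    synchronized boolean remove(String startKey, String server) {
      return cache.remove(startKey, server);
    }
  }

  // Stands in for whatever bookkeeping the callbacks used to perform.
  static final AtomicInteger cacheClears = new AtomicInteger();

  static void onLocationError(TableCacheSketch tableCache, String startKey, String server) {
    // The follow-up work runs after remove() returns, outside the cache's monitor,
    // so it cannot take part in a lock-ordering deadlock with it.
    if (tableCache.remove(startKey, server)) {
      cacheClears.incrementAndGet();
    }
  }

  public static void main(String[] args) {
    TableCacheSketch tableCache = new TableCacheSketch();
    tableCache.add("a", "rs1");
    onLocationError(tableCache, "a", "rs2"); // stale report: nothing removed
    onLocationError(tableCache, "a", "rs1"); // matches: entry removed, follow-up runs
    System.out.println(cacheClears.get());
  }
}
```

The trade-off is that the follow-up actions no longer run at the exact moment of the update, which is the concern raised earlier in this thread.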

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaCache.java

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

Apache9 · 2023-02-18T09:29:58Z

hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java

+
+      boolean isLast = isEmptyStopRow(region.getEndKey());
+
+      while (true) {


I think the implementation can still be improved. Right now it stops when we hit the location itself, but it is still possible that there is a region whose startKey is less than this location's startKey but whose endKey is greater than this location's startKey.

Once a region is merged, meta scan will only return the new child region whose start and end should fully encompass all the merged regions. So that is the only case we need to solve, which is handled here.

What you describe would only be possible if we tried to cache one of the merged parent regions. That should not happen.

Does that make sense?

I think theoretically it is possible that the regions are merged/split again and again. For example, all regions are merged into one, and then that region is split into multiple regions again. In this way, the boundaries can be anything...

Ok that makes sense, but not sure it's an issue. Just to be clear, I'm open to making a change here. I'm just trying to think through this, so please bear with me.

The problem we are trying to solve is due to how we use floorEntry to query the cache. Using floorEntry opens us to a problem where a stale cache entry with startKey greater than the correct one can cause the correct one to never be returned. The current solution solves that.

Since we use floorEntry, I don't think stale entries with startKey less than the correct location are really a problem. They would exist in cache but not cause any issues. If a request went to them they would be cleared. If they got overlapped by another region they'd be cleaned up at that point. Assuming a relatively active table, it would all clean up over time as different regions get requested.
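To make the floorEntry argument concrete, here is a small sketch under assumptions: `Region` and `locateInCache` are hypothetical stand-ins with string keys, not the real client classes. A stale entry that starts above the correct region shadows it for some rows, while a stale entry that starts below it does not.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of a start-key-indexed cache queried with floorEntry.
class FloorEntryStaleSketch {

  static final class Region {
    final String start;
    final String end; // exclusive; "" means "no end" in this sketch

    Region(String start, String end) {
      this.start = start;
      this.end = end;
    }

    boolean contains(String row) {
      return row.compareTo(start) >= 0 && (end.isEmpty() || row.compareTo(end) < 0);
    }
  }

  /** Returns the cached region containing row, or null on a cache miss. */
  static Region locateInCache(NavigableMap<String, Region> cache, String row) {
    Map.Entry<String, Region> floor = cache.floorEntry(row);
    return (floor != null && floor.getValue().contains(row)) ? floor.getValue() : null;
  }

  public static void main(String[] args) {
    NavigableMap<String, Region> cache = new TreeMap<>();
    cache.put("a", new Region("a", "c")); // stale, starts below the merged region
    cache.put("c", new Region("c", "e")); // stale, starts inside the merged region
    cache.put("b", new Region("b", "g")); // current merged region

    // Row "f" belongs to [b, g), but floorEntry lands on the stale [c, e): a permanent miss.
    System.out.println(locateInCache(cache, "f") == null);
    // Row "bb" still resolves to the merged region despite the stale [a, c) entry.
    System.out.println(locateInCache(cache, "bb").start);
    // Dropping the stale higher entry fixes lookups for "f".
    cache.remove("c");
    System.out.println(locateInCache(cache, "f").start);
  }
}
```

Re-caching the merged region never helps row "f" here, because the stale entry is still the floor; only removing it does.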

That said, I did think about how to do it. All entries are indexed by startKey, but we're concerned about endKey. We could pretty easily check the entry just prior to the cached location. But that doesn't cover us. Theoretically even the first entry in the cache could have an endKey that overlaps.

So the only way to be fully sure of no overlaps, given the endless possibilities, is to check all entries back to the head of the cache. I don't think this is worth it, given that there could be many thousands of regions for a table and there can sometimes be bursts of regions being cached, each of which would have to scan to the head. We could also keep a secondary index by endKey, but again I don't think it's worth the complexity, given that these entries don't cause issues.

Thoughts?

bbeaudreault · 2023-02-23T14:18:17Z

@Apache9 any chance you have time for another look here? Hopefully my reasoning above makes sense and we can keep the current implementation?

Apache9 · 2023-02-23T15:50:12Z

I'm on a business trip until next Wednesday, so I do not have much time to access Gmail and GitHub (and you know, in China you need to use something like a proxy to access Gmail...).

I think your argument is reasonable, but I need more time to think about whether there are cases of concern that we do not cover.

Please give me some time...

Thanks.

bbeaudreault · 2023-02-23T15:51:33Z

Thanks for the update @Apache9, no worries. I will await your reply next week.

Apache9 · 2023-02-25T15:37:18Z

After consideration, I think we can make this assumption:

The problem can only occur when the new region fully covers an old region. For example, suppose Start_New < Start_Old < End_Old < End_New. If we only access rows within the range [End_Old, End_New), the cache will always return the old region, then find that the row is not in its range and try to fetch the new region. We get [Start_New, End_New), but the next lookup still falls into the same situation.

If Start_Old is less than Start_New, an overlap is not a problem. When the row is greater than or equal to Start_New, we will locate the new region. If the row is less than Start_New, it will fall into the old region's range; we will try to access that region, get a NotServing exception, and then clean the cache.

So I think the implementation here is OK. But let's add more comments so that later developers understand it better. It would also be better to rename the method to something like 'cleanProblematicOverlappedRegions', so developers know that the design is not to clean all overlapped regions, just the ones that could cause trouble.

Thanks.
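The agreed behaviour can be sketched as below. This is an illustration under assumptions rather than the merged code: `Region` and `addAndClean` are hypothetical names with string keys, the loop uses `lowerEntry` because end keys are exclusive here, and the real method may differ in its exact checks. Only cached entries whose start key lies inside the newly cached region are removed; entries that start below it are left for later cleanup.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of cleaning only the problematic overlaps.
class CleanProblematicOverlapsSketch {

  static final class Region {
    final String start;
    final String end; // exclusive; "" marks the last region in this sketch

    Region(String start, String end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Caches region, then removes entries whose start key lies inside it. Returns the removal count. */
  static int addAndClean(NavigableMap<String, Region> cache, Region region) {
    cache.put(region.start, region);
    boolean isLast = region.end.isEmpty();
    int removed = 0;
    while (true) {
      Map.Entry<String, Region> overlap =
        isLast ? cache.lastEntry() : cache.lowerEntry(region.end);
      // Reference equality: once we reach the region's own entry, nothing else starts inside it.
      if (overlap == null || overlap.getValue() == region) {
        return removed;
      }
      cache.remove(overlap.getKey());
      removed++;
    }
  }

  public static void main(String[] args) {
    NavigableMap<String, Region> cache = new TreeMap<>();
    cache.put("a", new Region("a", "c"));
    cache.put("c", new Region("c", "e"));
    cache.put("e", new Region("e", "g"));
    cache.put("g", new Region("g", ""));
    // [c, e) and [e, g) were merged into [c, g): the stale "e" entry is removed.
    System.out.println(addAndClean(cache, new Region("c", "g")));
    System.out.println(cache.keySet());
  }
}
```

In the common case the loop exits on its first iteration, which matches the cost argument made earlier in the review.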

Apache-HBase · 2023-02-26T13:39:01Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 35s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ master Compile Tests _
+0 🆗	mvndep	0m 11s	Maven dependency ordering for branch
+1 💚	mvninstall	4m 48s	master passed
+1 💚	compile	3m 34s	master passed
+1 💚	checkstyle	0m 52s	master passed
+1 💚	spotless	0m 46s	branch has no errors when running spotless:check.
+1 💚	spotbugs	2m 37s	master passed
		_ Patch Compile Tests _
+0 🆗	mvndep	0m 11s	Maven dependency ordering for patch
+1 💚	mvninstall	4m 37s	the patch passed
+1 💚	compile	3m 20s	the patch passed
+1 💚	javac	3m 20s	the patch passed
+1 💚	checkstyle	0m 51s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	hadoopcheck	20m 6s	Patch does not cause any errors with Hadoop 3.2.4 3.3.4.
+1 💚	spotless	0m 49s	patch has no errors when running spotless:check.
+1 💚	spotbugs	3m 25s	the patch passed
		_ Other Tests _
+1 💚	asflicense	0m 21s	The patch does not generate ASF License warnings.
		56m 49s

Subsystem	Report/Notes
Docker	ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#5037
Optional Tests	dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname	Linux 3acd03ccc1e1 5.4.0-1094-aws #102~18.04.1-Ubuntu SMP Tue Jan 10 21:07:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `4a9cf99`
Default Java	Eclipse Adoptium-11.0.17+8
Max. process+thread count	84 (vs. ulimit of 30000)
modules	C: hbase-client hbase-server U: .
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/console
versions	git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2023-02-26T16:40:43Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 23s	Docker mode activated.
-0 ⚠️	yetus	0m 4s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ master Compile Tests _
+0 🆗	mvndep	0m 13s	Maven dependency ordering for branch
+1 💚	mvninstall	4m 24s	master passed
+1 💚	compile	1m 10s	master passed
+1 💚	shadedjars	4m 26s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 42s	master passed
		_ Patch Compile Tests _
+0 🆗	mvndep	0m 13s	Maven dependency ordering for patch
+1 💚	mvninstall	3m 20s	the patch passed
+1 💚	compile	1m 8s	the patch passed
+1 💚	javac	1m 8s	the patch passed
+1 💚	shadedjars	4m 24s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 42s	the patch passed
		_ Other Tests _
+1 💚	unit	1m 16s	hbase-client in the patch passed.
+1 💚	unit	210m 33s	hbase-server in the patch passed.
		238m 27s

Subsystem	Report/Notes
Docker	ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR	#5037
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux 127aa68e1226 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `4a9cf99`
Default Java	Eclipse Adoptium-11.0.17+8
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/testReport/
Max. process+thread count	2458 (vs. ulimit of 30000)
modules	C: hbase-client hbase-server U: .
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/console
versions	git=2.34.1 maven=3.8.6
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · 2023-02-26T16:42:27Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	0m 40s	Docker mode activated.
-0 ⚠️	yetus	0m 3s	Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
		_ Prechecks _
		_ master Compile Tests _
+0 🆗	mvndep	0m 13s	Maven dependency ordering for branch
+1 💚	mvninstall	4m 37s	master passed
+1 💚	compile	1m 3s	master passed
+1 💚	shadedjars	5m 16s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 48s	master passed
		_ Patch Compile Tests _
+0 🆗	mvndep	0m 12s	Maven dependency ordering for patch
+1 💚	mvninstall	3m 49s	the patch passed
+1 💚	compile	1m 21s	the patch passed
+1 💚	javac	1m 21s	the patch passed
+1 💚	shadedjars	5m 19s	patch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 49s	the patch passed
		_ Other Tests _
+1 💚	unit	1m 29s	hbase-client in the patch passed.
+1 💚	unit	210m 49s	hbase-server in the patch passed.
		240m 20s

Subsystem	Report/Notes
Docker	ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR	#5037
Optional Tests	javac javadoc unit shadedjars compile
uname	Linux 304544e9305b 5.4.0-1094-aws #102~18.04.1-Ubuntu SMP Tue Jan 10 21:07:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	master / `4a9cf99`
Default Java	Temurin-1.8.0_352-b08
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/testReport/
Max. process+thread count	2682 (vs. ulimit of 30000)
modules	C: hbase-client hbase-server U: .
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5037/10/console
versions	git=2.34.1 maven=3.8.6
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

bbeaudreault requested a review from Apache9 February 17, 2023 16:20

HBASE-27650 Merging empty regions corrupts meta cache

c06f052

bbeaudreault force-pushed the HBASE-27650 branch from b726003 to c06f052 Compare February 17, 2023 17:11