-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27650 Merging empty regions corrupts meta cache #5037
Conversation
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java
Show resolved
Hide resolved
boolean isLast = Bytes.equals(region.getEndKey(), HConstants.EMPTY_END_ROW); | ||
|
||
while (true) { | ||
Map.Entry<byte[], RegionLocations> overlap = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option here would have been to iterate a subMap
. I felt this approach was better because merges are rare and in almost all cases we'll exit here after just 1 floorEntry/lastEntry call and 1 reference equality check. Using a subMap requires at least 2 comparator comparisons, to get the head and tail of the subMap.
@@ -442,6 +485,10 @@ private RegionLocations locateRowInCache(TableCache tableCache, TableName tableN | |||
recordCacheHit(); | |||
return locs; | |||
} else { | |||
if (LOG.isTraceEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log was helpful for diagnosing this bug, so I decided to keep it.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Outdated
Show resolved
Hide resolved
b726003
to
c06f052
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Will look into test failures shortly. They look related |
This comment was marked as outdated.
This comment was marked as outdated.
b8528fe
to
376c153
Compare
This comment was marked as outdated.
This comment was marked as outdated.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java
Outdated
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java
Outdated
Show resolved
Hide resolved
* possible, calls beforeUpdate callback prior to making a change. Calls afterUpdate callback | ||
* after making a change. | ||
*/ | ||
public synchronized void remove(HRegionLocation loc, Runnable beforeUpdate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be careful that the beforeUpdate and afterUpdate do not hold other locks otherwise it may introduce dead lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if I can remove the callbacks. I was trying to keep the metaLocation.onError stuff out of here. I wasn't sure if the onError call needed to happen at the exact point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not easy, just add more comments to warn others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the callbacks. I think remove
can just return a boolean, and both actions can just happen after if an action was taken.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java
Outdated
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaCache.java
Outdated
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java
Outdated
Show resolved
Hide resolved
|
||
boolean isLast = isEmptyStopRow(region.getEndKey()); | ||
|
||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the implementation can still be improved? Now it will stop when we hit the location itself, but it is still possible that an region whose startKey is less than this location but the endKey is greater than the startKey of this location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once a region is merged, meta scan will only return the new child region whose start and end should fully encompass all the merged regions. So that is the only case we need to solve, which is handled here.
What you describe would only be possible if we tried to cache one of the merged parent regions. That should not happen.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think theoretically, it is possible that the regions are merged/split again and again, for example all regions are merged to one, and then the region is split to multiple regions again. In this way, the boundaries can be anything...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that makes sense, but not sure it's an issue. Just to be clear, I'm open to making a change here. I'm just trying to think through this, so please bear with me.
The problem we are trying to solve is due to how we use floorEntry to query the cache. Using floorEntry opens us to a problem where a stale cache entry with startKey greater than the correct one can cause the correct one to never be returned. The current solution solves that.
Since we use floorEntry, I don't think stale entries with startKey less than the correct location are really a problem. They would exist in cache but not cause any issues. If a request went to them they would be cleared. If they got overlapped by another region they'd be cleaned up at that point. Assuming a relatively active table, it would all clean up over time as different regions get requested.
That said, I did think about how to do it. All entries are indexed by startKey, but we're concerned about endKey. We could pretty easily check the entry just prior to the cached location. But that doesn't cover us. Theoretically even the first entry in the cache could have an endKey that overlaps.
So the only way to fully be sure of no overlaps given the endless possibility is to fully check all entries to the head of the cache. I don't think this is worth it given there could be many thousands of regions for a table and there sometimes be bursts of regions being cached which would all have to scan to head. We could also keep a secondary index by endKey, but again don't think it's worth the complexity given these don't cause issues.
Thoughts?
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
1a88094
to
860f566
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
@Apache9 any chance you have time for another look here? Hopefully my reasoning above makes sense and we can keep the current implementation? |
I'm on a business trip until next Wednesday, so do not have much time to access gmail and github(and you know, in China you need to use something like a proxy to access gmail...) I think your argument is reasonable, but I need more time to think whether there are some concern cases we do not cover. Please give me sometime... Thanks. |
Thanks for the update @Apache9, no worries. I will await your reply next week. |
After consideration, I think we can make this assumption The problem can only occur when the new region fully cover an old region, for example, we have Start_New < Start_Old < End_Old < End_New, then if we only access within range [End_Old, End_New], then it will always return the old region but it will then find out the row is not in the range, and try to get the new region, and then we get [Start_New, End_New), still fall into the same situation. If Start_Old is less than Start_New, even if we have overlap, it is not a problem, as when the row is greater than Start_New, we will locate to the new region, and if the row is less than Start_New, it will fall into the old region's range and we will try to access the region and get a NotServing exception, and then we will clean the cache. So I think the implementation here is OK. But let's add more comments here so later developers could know it better. And better rename the method to something like 'cleanProblematicOverlappedRegions' so developers could know that the design here is not to clean all the overlapped regions, just the ones which could cause trouble. Thanks. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
No description provided.