HBASE-23693 Split failure may cause region hole and data loss when use zk assign #1071
Conversation
This seems to solve the main issue, which is the parent and its interim daughters being wiped off. However, I'm wondering if the problem happens because we are setting the parent state to offline and split=true too soon in the split operation. What would you think, @thangTang ?
Also, is it possible to provide a UT to reproduce? Might be too complex, though, since it involves some sort of race condition.
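For context, a minimal sketch of the meta update under discussion: the point inside createDaughters where the parent row is rewritten as offline and split, before the daughters are opened. The surrounding plumbing (the connection variable, the single-mutation list) is an assumption built from calls quoted elsewhere in this thread, not the literal branch-1 code.

// Hypothetical simplification of the parent meta update around the PONR:
// the parent is marked OFFLINE and SPLIT=true in meta before the daughters
// are opened, which creates the window this discussion is about.
HRegionInfo copyOfParent = new HRegionInfo(parent);
copyOfParent.setOffline(true);  // parent stops serving
copyOfParent.setSplit(true);    // CatalogJanitor now treats the split as done
Put putParent = MetaTableAccessor.makePutFromRegionInfo(copyOfParent);
MetaTableAccessor.mutateMetaTable(connection, Collections.singletonList(putParent));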
PairOfSameType<HRegionInfo> daughterRegions =
    MetaTableAccessor.getDaughterRegionsFromParent(this.server.getConnection(), region);
if (daughterRegions != null) {
  // Remember both daughters of this parent so that later cleanup can check
  // the parent's split state in meta before deleting anything.
  if (daughterRegions.getFirst() != null) {
    daughter2Parent.put(daughterRegions.getFirst().getEncodedName(), region);
  }
  if (daughterRegions.getSecond() != null) {
    daughter2Parent.put(daughterRegions.getSecond().getEncodedName(), region);
  }
}
} catch (KeeperException ke) {
  server.abort("Unexpected ZK exception deleting node " + region, ke);
} catch (IOException e) {
  LOG.warn("get daughter from meta exception " + region, e);
Is it really entering this block? My understanding is that the parent region would be in OFFLINE state and SPLIT=true after the createDaughters call completes inside SplitTransactionImpl.execute.
Yes, but at that point (after the createDaughters call completes inside SplitTransactionImpl.execute) the split transaction is still not entirely complete.
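For reference, a simplified outline of the phase ordering under discussion (method names are taken from this thread; the exact branch-1 internals may differ):

// Sketch of SplitTransactionImpl.execute phases (simplified, names assumed):
// 1. createDaughters()  -> PONR: parent rewritten in meta as OFFLINE + SPLIT=true
// 2. openDaughters()    -> daughters actually opened on the RS
// 3. final ZK split-node transition -> split reported complete
// A crash between 1 and 3 leaves meta claiming the split finished when it did not.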
> Yes, but at that point (after the createDaughters call completes inside SplitTransactionImpl.execute) the split transaction is still not entirely complete.

And that's why it seems wrong to have SPLIT=true already at that point.
If we update the meta information later, we can only do it after openDaughters completes.
It is not yet clear to me what impact that would have; I need to think it through more thoroughly. My idea is to merge this patch first; at least it solves most of the problems. Do you think that is okay? @wchevreuil
> If we update the meta information later, we can only do it after openDaughters completes.

Yeah, that's what I think.

> It is not yet clear to me what impact that would have; I need to think it through more thoroughly. My idea is to merge this patch first; at least it solves most of the problems. Do you think that is okay?

If you think it's too much work to try fixing the state/split flag updates on this PR, then yeah, we can merge this one for now and work on the other solution in a different jira/PR.
> If you think it's too much work to try fixing the state/split flag updates on this PR, then yeah, we can merge this one for now and work on the other solution in a different jira/PR.

Yeah, I hope to merge this patch first; the other work can be completed in a new JIRA. At least it solves the problem of data loss. In extreme scenarios, the region hole can be temporarily repaired by HBCK.
HRegionInfo parent = daughter2Parent.get(hri.getEncodedName());
HRegionInfo info = getHRIFromMeta(parent);
if (info != null && info.isSplit() && info.isOffline()) {
  regionsToClean.add(Pair.newPair(state.getRegion(), info));
Oh, so the parent is OFFLINE and SPLIT=true in meta only, but not in assignedRegions? Sounds wrong that we marked the parent as OFFLINE and SPLIT even though splitting was still not entirely complete.
If the RS machine crashes when SplitTransactionImpl is at a step after the PONR, the master will handle the split rollback. Under normal conditions, it cleans up the daughter regions and tries to assign the parent region. If the assign fails, the parent region stays in RIT, so CatalogJanitor is blocked; but if the master switches at this point, the new master will delete the parent region because it is OFFLINE and SPLIT=true.
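A minimal sketch of the pre-patch cleanup decision implied here (the exact code differs; getHRIFromMeta and regionsToClean are borrowed from the diff quoted below): the new active master trusts the meta flags alone, so the parent of a half-finished split looks collectible.

// Hypothetical pre-patch check on the new active master: the meta flags alone
// decide collection, so the parent of a half-finished split is queued for
// deletion even though its daughters were just cleaned up by the rollback.
HRegionInfo info = getHRIFromMeta(hri);
if (info != null && info.isSplit() && info.isOffline()) {
  regionsToClean.add(Pair.newPair(state.getRegion(), info)); // parent deleted -> region hole
}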
This daughter2Parent added stringency makes sense to me. Nice.
> If the RS machine crashes when SplitTransactionImpl is at a step after the PONR, the master will handle the split rollback.

Yeah, but it seems to work only because the Master has different info for the region than what is actually in meta. What if the Active Master also crashes before it triggers SCP and the daughters get cleaned up? The other master will load region states as seen in meta, and then it can run into the same problem again.
> > If the RS machine crashes when SplitTransactionImpl is at a step after the PONR, the master will handle the split rollback.
>
> Yeah, but it seems to work only because the Master has different info for the region than what is actually in meta. What if the Active Master also crashes before it triggers SCP and the daughters get cleaned up? The other master will load region states as seen in meta, and then it can run into the same problem again.

Daughter cleanup is executed in SCP, and in this patch it does three things together (see the snippet below):
- delete the daughter region in meta
- update the parent region in meta
- delete the daughter region dir in HDFS
if (regionPair != null) {
  // 1. Remove the daughter region's row from meta.
  MetaTableAccessor.deleteRegion(this.server.getConnection(), hri);
}
if (parentInfo != null) {
  // 2. Rewrite the parent row in meta, clearing the offline/split flags so the
  //    parent can be reassigned instead of being garbage-collected.
  List<Mutation> mutations = new ArrayList<Mutation>();
  HRegionInfo copyOfParent = new HRegionInfo(parentInfo);
  copyOfParent.setOffline(false);
  copyOfParent.setSplit(false);
  Put putParent = MetaTableAccessor.makePutFromRegionInfo(copyOfParent);
  mutations.add(putParent);
  MetaTableAccessor.mutateMetaTable(this.server.getConnection(), mutations);
}
// 3. Remove the daughter region's directory from HDFS.
LOG.debug("Cleaning up HDFS since no meta entry exists, hri: " + hri);
FSUtils.deleteRegionDir(server.getConfiguration(), hri);
So if the Active Master also crashes before it triggers SCP, the daughter won't be deleted.
> So if the Active Master also crashes before it triggers SCP, the daughter won't be deleted.

Yes, that's my point. We would now have a new active master that sees the parent split as complete, although the split was midway through. It will potentially remove the parent and fail to online the daughters.
I wonder if working on correcting the region state and split flag updates would sort out split failures in the different scenarios. It also does not seem consistent that we do these updates in meta only and don't reflect them in the "in-memory" region info the master has.
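A minimal sketch of the divergence being described, assuming the branch-1 master's regionStates field and the getHRIFromMeta helper quoted above (the exact accessors are assumptions):

// Hypothetical illustration: the master's in-memory view and meta can disagree
// about the parent after a half-finished split.
RegionState inMemory = regionStates.getRegionState(parent); // e.g. still RIT/SPLITTING
HRegionInfo inMeta = getHRIFromMeta(parent);                // already OFFLINE + SPLIT=true
if (inMeta != null && inMeta.isSplit()
    && (inMemory == null || inMemory.getState() != RegionState.State.SPLIT)) {
  LOG.warn("Meta says " + parent.getEncodedName() + " is split, but in-memory state is "
      + inMemory + "; a freshly failed-over master would trust meta here.");
}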
I think the direct reason for this problem is that the information in the meta table is not updated after the daughter regions are cleaned up. At that point the daughters no longer exist and the master will also try to reassign the parent region, so updating the information in the meta table should not have a negative impact.

As for whether to adjust the timing of setting the parent region state in the meta table, we may need to sort out the split process as a whole; after all, when ZK assignment is used the split transaction is complex.

I've considered the problem of unit testing, and it's difficult to mock this case. I hit this problem in an extreme scenario: the RS crashed at a split step after the PONR but before the split was entirely complete, no other RS was available, and a master switch occurred.
// Offline regions outside the loop and synchronized block to avoid
// ConcurrentModificationException and deadlock in case meta is unassigned
// but its RegionState is blocked.
Set<HRegionInfo> regionsToOffline = new HashSet<HRegionInfo>();
Map<String, HRegionInfo> daughter2Parent = new HashMap<>();
Suggest this needs more comment on why we need this accounting in daughter2Parent.
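For illustration, the kind of comment being requested might look like this (a hedged sketch, not committed text):

// daughter2Parent maps a daughter region's encoded name to its parent's
// HRegionInfo. When cleaning up regions of a dead server we must not delete a
// parent just because meta marks it OFFLINE/SPLIT=true: the split may have
// failed midway, in which case the parent must be reassigned and the daughters
// removed together (their meta rows plus their HDFS dirs).
Map<String, HRegionInfo> daughter2Parent = new HashMap<>();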
HRegionInfo parent = daughter2Parent.get(hri.getEncodedName());
HRegionInfo info = getHRIFromMeta(parent);
if (info != null && info.isSplit() && info.isOffline()) {
  regionsToClean.add(Pair.newPair(state.getRegion(), info));
This daughter2Parent added stringency makes sense to me. Nice.
Just to be explicit, this issue needs @wchevreuil's approval too before it can be merged (I like his comments).
Let's move on with this, then, and think about correcting the state/split flag updates in a separate jira.
@saintstack hi, can you help me merge this patch? @wchevreuil has approved. After that we will
HBASE-23693 Split failure may cause region hole and data loss when use zk assign (#1071)

Signed-off-by: stack <[email protected]>
Signed-off-by: Wellington Chevreuil <[email protected]>
Thanks for the contribution, @thangTang. I have merged this PR into branch-1.
This is my first patch. I'm very happy to have a good start. Thank you and @wchevreuil very much :)
https://issues.apache.org/jira/browse/HBASE-23693