-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26791 Memstore flush fencing issue for SFT #4202
Conversation
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing it, but I think there's still a chance for race condition here. Let me try to summarize how I think you're solving this and you can correct me.
We add a sequence id (the current timestamp or one more than the previous ID if we have a clock skew) for each store's tracker. This sequence id is included in the filename for uniqueness. When we write a new tracker out, we increment the id. When we go to load a store, we find the largest ID and pick that seqID's tracker file to load. After we pick the biggest seqId tracker, we delete any trackers older than the one we picked.
- RS1 makes a tracker for storeA with seq=1
- RS1 goes zombie
- RS2 gets assignment for storeA opens seq=2
- RS1 wakes up and tries to memstore flush
I'm confused because I feel like we will call load(false)
which would just write a new tracker file with the newer time (or seqId + 1
). I will totally admit I am probably just getting it by reading the code.
// since read only is true, we should not delete the old track file | ||
// the deletion is in background, so sleep a bit then assert | ||
Thread.sleep(2000); | ||
assertTrue(fs.exists(trackFileStatus.getPath())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the HTU.Waiter
to check a few times please.
Ah, so is it that we use the sequenceId only when opening the Region (store). After the region is open, choosing whether to use f1 or f2 is then based on what Sorry I missed your explanation on Jira. |
Some general thinking about this (not specific to your changes)
|
Ah you have missed a very important point... We will only bump the sequence id when loading at the primary replica(i.e, readOnly = false), this means we will only bump the sequence id when opening a region, so the scenario you described above has no problem. RS1 will only write the files with old sequence id while RS2 will write the files with new(greater) sequence id. This is the key trick here to solve the problem. |
As explained above, the key here is to make sure that the sequence id of the SFT files written by the new RS is greater than the ones written by the old RS. So the answer of these two questions are pretty easy:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. Had made some nit comments on the subject of logging/supportability.
Test wise, could we add a test that mimics the issue, by instantiate two StoreFileListFile instances, update both with different store files, then check these updates don't reflect on each other lists of store files?
...r/src/main/java/org/apache/hadoop/hbase/regionserver/storefiletracker/StoreFileListFile.java
Show resolved
Hide resolved
...r/src/main/java/org/apache/hadoop/hbase/regionserver/storefiletracker/StoreFileListFile.java
Show resolved
Hide resolved
...r/src/main/java/org/apache/hadoop/hbase/regionserver/storefiletracker/StoreFileListFile.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Any other concerns? @wchevreuil @joshelser |
Oh that's tricky. I didn't even think about that case. 👍🏼 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Josh Elser <[email protected]> Signed-off-by: Wellington Chevreuil <[email protected]> (cherry picked from commit e56ed40)
Signed-off-by: Josh Elser <[email protected]> Signed-off-by: Wellington Chevreuil <[email protected]> (cherry picked from commit e56ed40) Change-Id: If8773e1fd5147fb339a084bfd6b6a34dbd6f4950
No description provided.