HBASE-27621 Also clear the Dictionary when resetting when reading compressed WAL file #5016
Conversation
This seems like an ingenious idea, but I want to confirm one thing: given the eviction mechanism of LRUMap, even if findEntry is used instead of addEntry, is there still a theoretical possibility of the read and write paths behaving inconsistently?
  if (status == Dictionary.NOT_IN_DICTIONARY) {
    int tagLen = StreamUtils.readRawVarint32(src);
    offset = Bytes.putAsShort(dest, offset, tagLen);
    IOUtils.readFully(src, dest, offset, tagLen);
-   tagDict.addEntry(dest, offset, tagLen);
+   tagDict.findEntry(dest, offset, tagLen);
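For context, a minimal, self-contained sketch of the semantics this change relies on. This is toy Java, not HBase's actual LRUDictionary, and the names are illustrative: addEntry unconditionally appends a new index, while findEntry looks the word up first and only adds it when missing, so replaying the same bytes after a reset does not insert duplicate entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dictionary with no eviction, only to show the addEntry/findEntry difference.
class ToyDictionary {
  static final short NOT_IN_DICTIONARY = -1;

  private final List<String> byIndex = new ArrayList<>();
  private final Map<String, Short> byWord = new HashMap<>();

  short addEntry(String word) {
    byIndex.add(word);                      // always appends a fresh index
    short idx = (short) (byIndex.size() - 1);
    byWord.put(word, idx);
    return idx;
  }

  short findEntry(String word) {
    Short existing = byWord.get(word);
    if (existing != null) {
      return existing;                      // already present: reuse the index, do not grow
    }
    addEntry(word);
    return NOT_IN_DICTIONARY;               // tell the caller the word was newly added
  }
}
```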
Do you think this change could be expensive? In the normal case the entry will not exist in the dict, but now we're adding an extra map lookup for every call. Granted it's O(1), but it still costs CPU for the hashcode, allocating the lookup key, etc.
I wonder if we could trigger findEntry only if the context has been reset, and otherwise use addEntry on the first pass?
May not be a big issue, just checking.
It is slower than before, but I always think correctness comes first, and then we consider performance. For log splitting and replication, reading is usually not the bottleneck.
We can file a follow-on issue to do the optimization; maybe we could also add a reset flag in CompressionContext to indicate whether we need to do a lookup first.
Thanks.
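To make the reset-flag idea concrete, here is a rough sketch, reusing the ToyDictionary above. The class and method names (ResetAwareRecorder, onReset, recordWord) are hypothetical and not part of HBase's CompressionContext; this is just a sketch of the proposed optimization, not the actual implementation.

```java
// Hypothetical sketch: only pay for the extra lookup (findEntry) after the reader
// has reset; on the common first pass keep the cheaper unconditional addEntry.
class ResetAwareRecorder {
  private boolean wasReset = false;

  // Called when the reading context is reset (e.g. the reader seeks back), so
  // some of the upcoming words may already be in the dictionary.
  void onReset() {
    wasReset = true;
  }

  // Record a word read from a NOT_IN_DICTIONARY slot in the WAL.
  void recordWord(ToyDictionary dict, String word) {
    if (wasReset) {
      dict.findEntry(word);   // look up first; only add when genuinely missing
    } else {
      dict.addEntry(word);    // first pass: no extra hash lookup or key allocation
    }
  }
}
```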
Sounds good, agree on correctness first.
Also agree on the bottleneck for splitting/replication. However, this uncompressTags method is in the hot path of normal reads when DataBlockEncoding is used: here.
The most important thing here is to read WAL entries in order, and not skip any entries. If these two rules are guaranteed, it is OK to restart as many times as you want. And I think for replication we must follow these two rules, otherwise there will be data loss...
Agree with that, but I think I didn't express my question clearly. For WAL compression, the core logic is to build an index (an LRUMap) in memory while writing/reading the WAL. There is another key point here: when operating on a WAL file, the behavior of the read and write paths needs to be exactly the same. Using findEntry instead of addEntry in this patch solves part of the problem. However, suppose we never called resetPosition while writing the WAL, but a certain position was reset many times while reading it. The implicit effect is that the corresponding node has been moved to the head of the LRUMap many times on the read side only. So is it possible that the node evicted on the write path (writing the WAL) is different from the one evicted on the read path (replication)?
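A small, self-contained illustration of this concern, using an access-ordered LinkedHashMap as a stand-in LRU (not HBase's LRUDictionary; the words and capacity are made up): one extra lookup on the read side after a reset changes the access order, so the next eviction differs between the two sides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruDivergenceDemo {
  // An access-ordered "LRU dictionary" capped at three entries.
  static LinkedHashMap<String, Integer> newLru(int capacity) {
    return new LinkedHashMap<String, Integer>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > capacity;
      }
    };
  }

  public static void main(String[] args) {
    LinkedHashMap<String, Integer> writerDict = newLru(3);
    LinkedHashMap<String, Integer> readerDict = newLru(3);

    // Both sides see the same three words while writing / reading the WAL.
    String[] words = { "w1", "w2", "w3" };
    for (int i = 0; i < words.length; i++) {
      writerDict.put(words[i], i);
      readerDict.put(words[i], i);
    }

    // The reader re-reads a stretch of the file after a reset and touches w1
    // again, moving it to the head of its LRU -- the writer never did this.
    readerDict.get("w1");

    // The next new word now evicts a different entry on each side.
    writerDict.put("w4", 3);
    readerDict.put("w4", 3);

    System.out.println("writer dict: " + writerDict.keySet()); // [w2, w3, w4]
    System.out.println("reader dict: " + readerDict.keySet()); // [w3, w1, w4]
  }
}
```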
After deeper consideration I think you are right. The solution here can only work perfectly when the dict is infinite, i.e., no eviction. If we also consider eviction: when we go back a long distance, the word at a given index will have changed due to eviction, so when reading, if we use an index to look up the word (a qualifier or a row, for example), we may get an incorrect word for that index. So it seems that rebuilding the dict is necessary when resetting. But anyway, I could try to refactor the readNext method in ProtobufLogReader to have more fine-grained control over whether we need to reconstruct the dict. For example, if we just return before reading the actual WAL entry, i.e., we quit early after checking the available bytes, we do not need to reconstruct the dict. Thanks for pointing this out!
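To spell out why a stale index becomes dangerous, here is a toy bounded dictionary that reuses the slot of an evicted word. This is illustrative only, with a simplified eviction policy and made-up names, not HBase's LRUDictionary: once a word has been evicted, its index can be handed to a different word, so a compressed cell that refers to that index decodes to the wrong bytes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;

// Toy bounded dictionary: evicts the oldest word and reuses its slot.
class SlotReusingDict {
  private final int capacity;
  private final ArrayDeque<Integer> evictionOrder = new ArrayDeque<>(); // oldest first
  private final HashMap<Integer, String> slotToWord = new HashMap<>();

  SlotReusingDict(int capacity) {
    this.capacity = capacity;
  }

  int add(String word) {
    int slot;
    if (slotToWord.size() < capacity) {
      slot = slotToWord.size();            // fill empty slots first
    } else {
      slot = evictionOrder.pollFirst();    // evict the oldest word and reuse its slot
    }
    slotToWord.put(slot, word);
    evictionOrder.addLast(slot);
    return slot;
  }

  String get(int slot) {
    return slotToWord.get(slot);
  }

  public static void main(String[] args) {
    SlotReusingDict dict = new SlotReusingDict(2);
    int slotOfRowA = dict.add("rowA");     // slot 0
    dict.add("rowB");                      // slot 1
    dict.add("rowC");                      // evicts rowA, reuses slot 0
    // A compressed cell written while "rowA" owned slot 0 now decodes differently.
    System.out.println(dict.get(slotOfRowA)); // prints rowC, not rowA
  }
}
```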
Agree. Although this solution has a performance cost, it is the best way I can think of to completely solve this problem.
This PR cannot solve the whole problem, so I am converting it to a draft to avoid someone merging it accidentally. Thanks all for helping review and test, especially @thangTang for pointing out the problem. I will change the title and provide a new PR soon.
To ensure that compression and decompression construct the same dictionary, should we only use LRUDictionary#findEntry() to add entries, but keep LRUDictionary#getEntry() from moving the entry to the head?
This is still not enough... As said above, if we go back a long distance, the word at a given index could be completely different, which then leads to an incorrect result when a field is found to be 'in dictionary'...
I tried refactoring a bit, but the implementation of ProtobufLogReader is too complicated. I think we'd better abstract two types of WAL.Reader for reading WAL files. Thanks.
I understand that this is a complicated and dirty job; I am ashamed that I didn't solve it thoroughly before...
@apurtell FYI, I think you may also be interested in this patch~
Anyway, a manual +1 from me :)
I think this can be done step by step. WDYT? Thanks.
Makes sense.
@sunhelly Could you please try to see if this PR can also solve your problem? And is it possible to contribute your replication test case to hbase-it? Thanks.
I tested this morning; sadly there is still something wrong... The problem is focused on one scenario: replicating mostly whole-row deletes. It seems like it should not be related to the kind of operation, but I can't find any more relevant changes.
If WALPrettyPrinter cannot output a correct result, then I think the problem is not in the replication implementation; something must be going wrong when writing the WAL file. And I believe it will also make WAL splitting incorrect? Did you also enable WAL value compression, or just the dictionary-based compression... Thanks.
Yes, I also enabled WAL value compression. I'll check whether the stuck replication recurs after disabling it.
Maybe the problem is that, in replication, we check whether we have parsed all the bytes, but in WAL splitting we just return after getting EOF...
Oh, cluster B really does have a data loss issue...
Is it OK for your company to upload the WAL file somewhere? Then we could look at the content of the WAL file and check what the problem is...
OK. I'll prepare one.
It works well on our cluster after disabling WAL value compression, with this fix PR applied. We can reproduce the replication getting stuck by enabling WAL value compression, and WALPrettyPrinter also stops at a middle position without any exceptions. So the remaining stuck issue is not related to the dictionary.
Thanks @sunhelly for providing the useful feedback. Let me merge this PR first to solve the dictionary problem. For replication with WAL value compression, it seems there are still other bugs, and @apurtell also pointed out that there are some tricky aspects of the buffer reuse mechanism; I will dig more and file other issues to try to fix them. Thanks.
…pressed WAL file (#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…pressed WAL file (#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…pressed WAL file (#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…g when reading compressed WAL file (apache#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e)
…pressed WAL file (apache#5016) Signed-off-by: Xiaolin Ha <[email protected]> (cherry picked from commit 833b10e) (cherry picked from commit 8df3212) Change-Id: I469fa5b5a7ba6a41c3b8b28acb57a60f33c27fe9