HBASE-27232 Fix checking for encoded block size when deciding if block should be closed #4640

Conversation
LGTM. A few nitpicks only.
@@ -449,7 +449,7 @@ public int getOnDiskSizeWithHeader() {
   }

   /** Returns the on-disk size of the data part + checksum (header excluded). */
-  int getOnDiskSizeWithoutHeader() {
+  public int getOnDiskSizeWithoutHeader() {
I'm not sure if it's feasible, but if you move the test to TestHFile, for instance, you don't need to make this method public.
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterImpl.java
      || blockWriter.blockSizeWritten() >= hFileContext.getBlocksize()
    ) {
      boolean shouldFinishBlock = false;
      // This means hbase.writer.unified.encoded.blocksize.ratio was set to something different from 0
Any implication for the comment that was here before, which says we need to compare both? Granted, I think prefixTree has been removed.
Yeah, we don't have prefix tree anymore. So with the previous "||" condition, we could still fail to respect the desired encoded size if the data shrinkage from encoding is higher than the configured "hbase.writer.unified.encoded.blocksize.ratio" value. This change also allows defining a 1:1 ratio, where we would then use the encoded size for the block limit.
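For illustration, here is a hedged sketch of the boundary decision being discussed; it is not the exact HBase source, and names other than `encodedBlockSizeLimit` (which mirrors the diff) are assumed:

```java
/**
 * Hedged sketch contrasting the old "compare both" condition with the
 * new ratio-gated one. Sizes are in bytes.
 */
public class BlockBoundarySketch {

  /** Old behavior: either size crossing its limit closes the block. */
  static boolean shouldFinishBlockOld(int encodedSize, int unencodedSize,
    int encodedBlockSizeLimit, int blockSize) {
    // If encoding shrinks data more than the configured ratio anticipates,
    // the unencoded check fires first and the encoded target is never met.
    return encodedSize >= encodedBlockSizeLimit || unencodedSize >= blockSize;
  }

  /** New behavior: the encoded limit, when configured, is the only criterion. */
  static boolean shouldFinishBlockNew(int encodedSize, int unencodedSize,
    int encodedBlockSizeLimit, int blockSize) {
    // encodedBlockSizeLimit > 0 means the ratio property was set.
    return encodedBlockSizeLimit > 0
      ? encodedSize >= encodedBlockSizeLimit
      : unencodedSize >= blockSize;
  }
}
```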
@@ -172,8 +172,10 @@ public HFileWriterImpl(final Configuration conf, CacheConfig cacheConf, Path pat
   }
   closeOutputStream = path != null;
   this.cacheConf = cacheConf;
-  float encodeBlockSizeRatio = conf.getFloat(UNIFIED_ENCODED_BLOCKSIZE_RATIO, 1f);
-  this.encodedBlockSizeLimit = (int) (hFileContext.getBlocksize() * encodeBlockSizeRatio);
+  float encodeBlockSizeRatio = conf.getFloat(UNIFIED_ENCODED_BLOCKSIZE_RATIO, 0f);
So this changes the default behavior, I believe. Do you think this change is applicable to most users (i.e. a net positive for them)? Not against it, just asking... Alternatively, we could add handling of the 0 value below and leave the default. I have no opinion, since I don't fully grasp the feature yet.
It doesn't change the default behaviour, in the sense that if "hbase.writer.unified.encoded.blocksize.ratio" isn't set, we consider only the unencoded size when calculating the block limit, which matches the previous if condition in the checkBlockBoundary method.
The difference is when "hbase.writer.unified.encoded.blocksize.ratio" is set, as we can now have 64KB of encoded data (whilst that was never possible before).
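As a hedged illustration of the constructor change in the diff above (a sketch, not the PR's code; the 64KB block size is the standard HBase default, assumed here):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

/** Hedged sketch: how the encoded block size limit is derived. */
public class EncodedLimitSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Setting a 1:1 ratio targets a full 64KB of *encoded* data per block.
    conf.setFloat("hbase.writer.unified.encoded.blocksize.ratio", 1.0f);

    int blockSize = 64 * 1024; // assumed block size (HFileContext#getBlocksize())
    // With the new 0f default, an unset property yields a limit of 0,
    // effectively disabling the encoded-size check.
    float ratio = conf.getFloat("hbase.writer.unified.encoded.blocksize.ratio", 0f);
    int encodedBlockSizeLimit = (int) (blockSize * ratio);
    System.out.println("encodedBlockSizeLimit = " + encodedBlockSizeLimit); // 65536
  }
}
```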
+1, LGTM.
Please include release notes on how to use this property and the changes in values.
HBASE-27232 Fix checking for encoded block size when deciding if block should be closed (#4640)
Signed-off-by: Andor Molnár <[email protected]>
Signed-off-by: Bryan Beaudreault <[email protected]>
Signed-off-by: Ankit Singhal <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit d5ed8f5)

HBASE-27232 Fix checking for encoded block size when deciding if block should be closed (apache#4640)
Signed-off-by: Andor Molnár <[email protected]>
Signed-off-by: Bryan Beaudreault <[email protected]>
Signed-off-by: Ankit Singhal <[email protected]>
Signed-off-by: Duo Zhang <[email protected]>
Change-Id: I1ce3bd0f9f82526cbbbec3171d6eace547b0e57b
In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and uncompressed data size when deciding to close a block and start a new one. That could lead to varying "on-disk" block sizes, depending on the encoding efficiency for the cells in each block.
HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio property, a ratio of the originally configured block size to be compared against the encoded size. This was an attempt to ensure homogeneous block sizes. However, the check introduced by HBASE-17757 also considers the unencoded size, so in cases where the encoding efficiency is higher than what's configured in hbase.writer.unified.encoded.blocksize.ratio, it would still lead to varying block sizes.
This patch changes that logic to only consider the encoded size if the hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it considers the unencoded size. This gives finer control over the on-disk block sizes and the overall number of blocks when encoding is in use.
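To make the effect concrete, a hedged worked example (the 4:1 encoding efficiency is an assumed figure, not from the PR):

```java
/** Hedged worked example of the block-size behavior described above. */
public class BlockSizeExample {
  public static void main(String[] args) {
    int blockSize = 64 * 1024; // configured block size
    double shrinkage = 0.25;   // assume encoding shrinks data 4:1

    // Old logic (or ratio unset): blocks close at 64KB of unencoded data,
    // so only ~16KB of encoded ("on-disk") data lands in each block.
    System.out.printf("ratio unset: ~%d bytes on disk per block%n",
      (int) (blockSize * shrinkage));

    // New logic with hbase.writer.unified.encoded.blocksize.ratio = 1.0:
    // blocks close at 64KB of *encoded* data, so on-disk blocks are ~64KB
    // and a store file holds roughly 4x fewer blocks.
    System.out.printf("ratio = 1.0: ~%d bytes on disk per block%n", blockSize);
  }
}
```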