HBASE-27232 Fix checking for encoded block size when deciding if bloc… #4640
In the HFileWriterImpl constructor:

```diff
@@ -172,8 +172,10 @@ public HFileWriterImpl(final Configuration conf, CacheConfig cacheConf, Path pat
     }
     closeOutputStream = path != null;
     this.cacheConf = cacheConf;
-    float encodeBlockSizeRatio = conf.getFloat(UNIFIED_ENCODED_BLOCKSIZE_RATIO, 1f);
-    this.encodedBlockSizeLimit = (int) (hFileContext.getBlocksize() * encodeBlockSizeRatio);
+    float encodeBlockSizeRatio = conf.getFloat(UNIFIED_ENCODED_BLOCKSIZE_RATIO, 0f);
+    this.encodedBlockSizeLimit = encodeBlockSizeRatio > 0 ?
+      (int) (hFileContext.getBlocksize() * encodeBlockSizeRatio) : 0;
+
     finishInit(conf);
     if (LOG.isTraceEnabled()) {
       LOG.trace("Writer" + (path != null ? " for " + path : "") + " initialized with cacheConf: "
```

Reviewer: So this changes the default behavior, I believe. Do you think this change is applicable to most users (i.e. a net positive for them)? Not against it, just asking... Alternatively, we could add handling of the 0 value below and leave the default. I have no opinion since I don't fully grasp the feature yet.

Author: It doesn't change the default behaviour, in the sense that if "hbase.writer.unified.encoded.blocksize.ratio" isn't set, we consider only the unencoded size when calculating the block limit, which matches the previous if condition in the checkBlockBoundary method. The difference is when "hbase.writer.unified.encoded.blocksize.ratio" is set, as we can now have 64KB of encoded data (which would never have been possible before).
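To make the new default concrete, here is a minimal sketch (not HBase code; class and method names are invented for illustration) of the limit selection the constructor now performs: a configured ratio yields an encoded-size limit, while the unset default of 0 yields a limit of 0, which later disables the encoded-size check entirely.

```java
// Toy sketch of the limit computation introduced above; values are illustrative.
public class BlockLimitSketch {
  static final float DEFAULT_RATIO = 0f; // new default: ratio not configured

  static int encodedBlockSizeLimit(int blockSize, float ratio) {
    // When the ratio is configured (> 0), derive an encoded-size limit;
    // otherwise return 0, signalling "use the unencoded size only".
    return ratio > 0 ? (int) (blockSize * ratio) : 0;
  }

  public static void main(String[] args) {
    int blockSize = 64 * 1024; // a typical 64KB HFile block size
    System.out.println(encodedBlockSizeLimit(blockSize, DEFAULT_RATIO)); // 0     -> unencoded check
    System.out.println(encodedBlockSizeLimit(blockSize, 1f));            // 65536 -> encoded check
  }
}
```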
In HFileWriterImpl#checkBlockBoundary:

```diff
@@ -309,12 +311,15 @@ protected void finishInit(final Configuration conf) {
    * At a block boundary, write all the inline blocks and opens new block.
    */
   protected void checkBlockBoundary() throws IOException {
-    // For encoder like prefixTree, encoded size is not available, so we have to compare both
-    // encoded size and unencoded size to blocksize limit.
-    if (
-      blockWriter.encodedBlockSizeWritten() >= encodedBlockSizeLimit
-        || blockWriter.blockSizeWritten() >= hFileContext.getBlocksize()
-    ) {
+    boolean shouldFinishBlock = false;
+    // This means hbase.writer.unified.encoded.blocksize.ratio was set to something different
+    // from 0 and we should use the encoding ratio.
+    if (encodedBlockSizeLimit > 0) {
+      shouldFinishBlock = blockWriter.encodedBlockSizeWritten() >= encodedBlockSizeLimit;
+    } else {
+      shouldFinishBlock = blockWriter.blockSizeWritten() >= hFileContext.getBlocksize();
+    }
+    if (shouldFinishBlock) {
       finishBlock();
       writeInlineBlocks(false);
       newBlock();
```

Reviewer: Any implication for the comment that was here before, which says we need to compare both? Granted, I think prefixTree has been removed.

Author: Yeah, we don't have prefix tree anymore. Also, with the previous "||" condition, we could still fail to honour the desired encoded size if the data shrinkage from encoding is greater than the configured "hbase.writer.unified.encoded.blocksize.ratio" value. This change also allows defining a 1:1 ratio, where we would then use the encoded size for the block limit.
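The author's point about shrinkage can be shown with a toy calculation (assuming, for illustration, a 2:1 encoding shrinkage, which is not from the PR): under the old "||" condition the unencoded check trips first, so the encoded-size target is never reached.

```java
// Toy demo of why the old "||" check could close blocks before the
// encoded-size target was reached. The 2:1 shrinkage is an assumption.
public class BoundaryCheckDemo {
  public static void main(String[] args) {
    int blockSize = 64 * 1024;
    int encodedLimit = blockSize;              // ratio configured as 1.0
    int unencodedWritten = 64 * 1024;          // raw bytes buffered so far
    int encodedWritten = unencodedWritten / 2; // assumed 2:1 encoding shrinkage

    // Old behavior: the block closes because the unencoded size hit the block
    // size, even though only 32KB of encoded data was produced.
    boolean oldCheck = encodedWritten >= encodedLimit || unencodedWritten >= blockSize;

    // New behavior: with a ratio configured, only the encoded size is compared,
    // so the block keeps filling until 64KB of encoded data is written.
    boolean newCheck = encodedWritten >= encodedLimit;

    System.out.println("old condition closes block: " + oldCheck); // true
    System.out.println("new condition closes block: " + newCheck); // false
  }
}
```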
Reviewer (on the test changes): I'm not sure if it's feasible, but if you move the test to TestHFile, for instance, you don't need to make this method public.
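For illustration, a minimal, hypothetical sketch of the pattern the reviewer is suggesting: keep the method package-private and exercise it from a test class in the same package, rather than widening it to public. The package, class, and method names below are invented for the example.

```java
// Hypothetical production class: the method stays package-private.
package org.example.hfile;

class Writer {
  // Package-private: visible to tests in the same package, not public API.
  void checkBoundary() {
    // ... boundary logic under test ...
  }
}
```

```java
// Hypothetical test in the same package: it can call checkBoundary() directly
// without the production method being made public.
package org.example.hfile;

import org.junit.Test;

public class WriterTest {
  @Test
  public void testCheckBoundary() {
    new Writer().checkBoundary(); // compiles because both classes share a package
  }
}
```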