HBASE-27227 Long running heavily filtered scans hold up too many ByteBuffAllocator buffers #4940
Closed
Commits
8d2fee2 HBASE-27227 Long running heavily filtered scans hold up too many Byte… (bbeaudreault)
57542bb Add intra-row eager release (bbeaudreault)
17905a3 cleanup (bbeaudreault)
589be49 Checkpoint once at construction, otherwise only checkpoint if filteri… (bbeaudreault)
9982866 a few more tests (bbeaudreault)
69b1904 Add a checkpointingEnabled boolean to scanner creation, rather than r… (bbeaudreault)
82b2a55 fix mocks (bbeaudreault)
Should we checkpoint only when `RegionScannerImpl.filter` is not null?
good point. i think i can do that, let me look into it.
I've moved the initial checkpoint call into the constructor, so we only have to do it once. The other checkpoint calls have been changed to a new `checkpointIfFiltering` method which returns early if `filter == null`.
The initial checkpoint call is still important because it enables `retainBlock()` functionality in StoreScanner. Someone could submit a scan with `addColumn(...)` which looks for 1 column in rows with many columns, in which case `retainBlock()` in StoreScanner would still be very useful. I added a comment to explain that.
Thanks for looking!
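The split described above, one unconditional checkpoint at construction and filter-gated checkpoints afterwards, can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchRegionScanner`, the `FILTERED` and `ROW_DONE` states, and the `checkpoints` list are hypothetical names invented here and are not taken from the PR; only `checkpointIfFiltering`, `filter == null`, and `State.START` appear in the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a region-level scanner; this is not the HBase class.
class SketchRegionScanner {
    // START comes from the discussion; FILTERED and ROW_DONE are invented for this sketch.
    enum State { START, FILTERED, ROW_DONE }

    private final Predicate<String> filter; // null means the scan has no filter
    final List<State> checkpoints = new ArrayList<>(); // records every checkpoint taken

    SketchRegionScanner(Predicate<String> filter) {
        this.filter = filter;
        // Taken once for every scan, filtered or not.
        checkpoint(State.START);
    }

    private void checkpoint(State state) {
        checkpoints.add(state);
    }

    // Returns early when there is no filter, so unfiltered scans skip the bookkeeping.
    private void checkpointIfFiltering(State state) {
        if (filter == null) {
            return;
        }
        checkpoint(state);
    }

    // Processes one row key and reports whether the row passed the filter.
    boolean nextRow(String rowKey) {
        boolean keep = filter == null || filter.test(rowKey);
        checkpointIfFiltering(keep ? State.ROW_DONE : State.FILTERED);
        return keep;
    }
}
```

In this sketch a scanner built with a `null` filter records only the construction-time checkpoint no matter how many rows it processes, while a filtered scanner records one more per row.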
@bbeaudreault, thank you very much for the detailed reply. Overall LGTM, it is really a very insightful PR. I just have one suggestion, FYI.
Could we set `HFileScannerImpl.lastCheckpointIndex` to 0 when initializing? Then we could simplify the `if (shouldRetainBlock || lastCheckpointIndex < 0)` in `HFileScannerImpl.handlePrevBlock` to `if (shouldRetainBlock)`. After all, when we start to scan, `HFileScannerImpl.lastCheckpointIndex` is always >= 0, whether or not there is a filter, and we could also remove `checkpoint(State.START)` in the `RegionScannerImpl` ctor.
Thanks again @comnetwork.
I agree ideally we could do what you say. The reason I have the complexity is because I'm concerned about use-cases of HFileScanner or StoreFileScanner which don't go through StoreScanner. For example HFilePrettyPrinter, bulk load verification, etc.
The `lastCheckpointIndex < 0` check ensures that we only honor `shouldRetainBlock` if a call to `checkpoint(State)` has occurred. The contract is that if you are using checkpointing, you also need to call `retainBlock()`. If you are not using checkpointing, you don't need to call `retainBlock()`.
Failing to call `retainBlock()` would result in blocks being released too early, so I only want this to apply for StoreScanner, which is the only place we currently do checkpointing.
A better approach might be to add a new `boolean checkpointEnabled` in the HFileScannerImpl constructor. This is more explicit but involves adding boolean arguments to many various getScanner methods. I can give this a shot, or I'm also open to other ideas if you have them.
Let me know if this wasn't clear.
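The contract described above, where `shouldRetainBlock` is only honored once a checkpoint has been taken, might look something like the sketch below. It is a simplified illustration, not the PR's code: `SentinelBlockScanner`, the integer block ids, the `retained` and `released` lists, and resetting the flag after each block are all assumptions made for this example. Only the field names and the `if` condition come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical block-level scanner; names mirror the discussion but the bodies are invented.
class SentinelBlockScanner {
    private int lastCheckpointIndex = -1;   // -1 means checkpoint() was never called
    private boolean shouldRetainBlock = false;
    final List<Integer> retained = new ArrayList<>(); // blocks held until the scanner is done
    final List<Integer> released = new ArrayList<>(); // blocks handed back early

    // Marks a checkpoint; from now on blocks are only retained on request.
    void checkpoint() {
        lastCheckpointIndex = retained.size();
        shouldRetainBlock = false;
    }

    // The caller's signal that the current block still backs cells it needs.
    void retainBlock() {
        shouldRetainBlock = true;
    }

    // Called when the scanner moves off a block.
    void handlePrevBlock(int blockId) {
        if (shouldRetainBlock || lastCheckpointIndex < 0) {
            // Either the caller asked for it, or the caller never opted into checkpointing.
            retained.add(blockId);
        } else {
            released.add(blockId);
        }
        shouldRetainBlock = false; // assumption of this sketch: a request covers one block only
    }
}
```

A caller that never calls `checkpoint()` keeps every block, which matches the pre-existing behavior for callers that do not go through StoreScanner. A caller that does checkpoint must call `retainBlock()` for any block it still needs.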
@comnetwork I just pushed a commit which adds a `boolean checkpointingEnabled` to the creation of HFileScannerImpl. This involves minor modifications to many levels of `getScanner(...)` calls. I added the new param everywhere necessary, defaulting to `false` everywhere. This retains the old behavior for all usages. I then updated just StoreScanner to pass `true`, as this is the only place we want to enable this behavior right now.
This is a bunch more small changes, but is probably the safest route. If there were a bug where we accidentally passed false, we'd just be reverting to the original behavior and losing the optimization. You can see how it affects things in HFileReaderImpl.
Let me know what you think. I can revert that commit or we can keep it. As a result I was able to simplify some of the checkpointing (default to 0 instead of -1, no need to call checkpoint on new scanners).
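One way an explicit flag could replace the `-1` sentinel is sketched below. This is a guess at the shape of the change, not the actual commit: `FlaggedBlockScanner`, `SketchReader`, the zero-argument `getScanner()` overload, and the `retained` and `released` lists are hypothetical, and the real `getScanner(...)` methods take other parameters that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner where early release is an explicit opt-in at construction.
class FlaggedBlockScanner {
    private final boolean checkpointingEnabled;
    private boolean shouldRetainBlock = false;
    final List<Integer> retained = new ArrayList<>(); // blocks held until the scanner is done
    final List<Integer> released = new ArrayList<>(); // blocks handed back early

    FlaggedBlockScanner(boolean checkpointingEnabled) {
        this.checkpointingEnabled = checkpointingEnabled;
    }

    void retainBlock() {
        shouldRetainBlock = true;
    }

    // Called when the scanner moves off a block.
    void handlePrevBlock(int blockId) {
        if (!checkpointingEnabled || shouldRetainBlock) {
            retained.add(blockId); // old behavior, or an explicit retain request
        } else {
            released.add(blockId); // opted in and nothing asked to keep this block
        }
        shouldRetainBlock = false; // assumption of this sketch: a request covers one block only
    }
}

// Hypothetical reader showing how the flag can default to false through an overload.
class SketchReader {
    // Existing-style entry point: callers that do not know about the flag keep the old behavior.
    FlaggedBlockScanner getScanner() {
        return getScanner(false);
    }

    // New entry point: only a caller that honors the retainBlock() contract should pass true.
    FlaggedBlockScanner getScanner(boolean checkpointingEnabled) {
        return new FlaggedBlockScanner(checkpointingEnabled);
    }
}
```

The design point made in the comment carries over to the sketch: passing `false` by mistake only loses the optimization, because every block is retained as before.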
@bbeaudreault, thank you for the elaborate work. I agree with your point that adding a new boolean `checkpointEnabled` makes the logic clearer.
I agree with the first part of the sentence, but have doubts about the second. From your code, I think that for a scan by `RegionScannerImpl`, `retainBlock` is always needed to release blocks early, and only when there is a filter (especially a row-level filter) do we need `checkpoint` to further narrow the blocks which should be retained after a row is filtered. Is that right? I think we could use just `retainBlock` for user scans which specify columns but have no filters, so we don't need to call `checkpoint`. If you agree with my point, I think the variable name `boolean checkpointingEnabled` is not very appropriate. Maybe `earlyReleaseBlockEnabled` is more indicative of the intention?