
HBASE-27227 Long running heavily filtered scans hold up too many ByteBuffAllocator buffers #4940

Closed · wants to merge 7 commits

Conversation

@bbeaudreault (Contributor) commented Dec 27, 2022:

  • HFileReaderImpl retains all blocks scanned for a request in a prevBlocks list. These blocks are currently only ever released in shipped() or close().
  • Large scans with narrow filters or sparse scans can retain many blocks that contain no cells needed for the request results.
  • Here we introduce a checkpoint system which allows us to release those blocks when it's determined that they are not needed.
  • We start a checkpoint each time a row is fetched in RegionScannerImpl, by calling checkpoint(State.START). This sets state in StoreScanner and HFileReaderImpl.
  • In StoreScanner, we call retainBlock() whenever a cell is added to the result list. This causes the block to be added to prevBlocks once it's fully scanned; otherwise it gets eagerly released. If no cells are pulled from a block at this level, we can release it immediately.
  • If cells are retained but the row then ends up fully filtered, we call checkpoint(State.FILTERED), which releases any blocks accumulated since the last checkpoint. This accounts for cases where filters clear the result list after being applied. (A simplified sketch of this bookkeeping follows this list.)
    • There are 3 places where a row can be fully filtered in RegionScannerImpl:
      • When a row is filtered due to filterRowKey.
      • When all cells have been accumulated but then filtered due to filterRowCells or filterRow.
      • When no cells have been returned after checking both storeHeap and joinedHeap.
  • When checkpoint is called in RegionScannerImpl, it fans out to all StoreScanners in the heap (and delayed close scanners). Similarly, when checkpoint is called in StoreScanner, it fans out to all StoreFileScanners in its heap (and delayed close scanners). StoreFileScanner propagates the checkpoint into HFileReaderImpl. The goal here is to checkpoint all HFileReaderImpls, including those in delayed close state because they may have had blocks filtered prior to close.
  • Additionally, StoreScanner will sometimes open new StoreFileScanners (due to reopen after flush or switching to stream read). If the StoreScanner had been checkpointed previously, we checkpoint these new scanners after open. This sets them up to be ready for another checkpoint next time a row is filtered.
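
To make the bullets above concrete, here is a minimal, self-contained sketch of the bookkeeping being described. It is not the actual HBase code: prevBlocks, lastCheckpointIndex, and handlePrevBlock mirror names used in this PR and the review below, but the class itself and its String "blocks" are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the checkpoint/retain bookkeeping. Real blocks are HFileBlocks backed
// by ByteBuffAllocator buffers; plain Strings stand in for them here.
class CheckpointedBlockTracker {
  private final List<String> prevBlocks = new ArrayList<>();
  private int lastCheckpointIndex = -1; // -1: checkpointing never started

  // checkpoint(State.START): remember where the current row's blocks begin.
  void startCheckpoint() {
    lastCheckpointIndex = prevBlocks.size();
  }

  // Called when the scanner moves past a block: retain it only if a cell from it was added
  // to the results (or if checkpointing is not in use at all).
  void handlePrevBlock(String block, boolean shouldRetainBlock) {
    if (shouldRetainBlock || lastCheckpointIndex < 0) {
      prevBlocks.add(block); // released later, in shipped()/close()
    } else {
      release(block); // no cells were needed from this block, release it immediately
    }
  }

  // checkpoint(State.FILTERED): the whole row was filtered out, so release every block
  // retained since the last START checkpoint.
  void releaseSinceLastCheckpoint() {
    for (int i = prevBlocks.size() - 1; i >= Math.max(lastCheckpointIndex, 0); i--) {
      release(prevBlocks.remove(i));
    }
  }

  private void release(String block) {
    // stand-in for releasing the block, which returns its buffer to the ByteBuffAllocator
    System.out.println("released " + block);
  }
}
```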

I've added extensive tests in TestBlockEvictionFromClient. These verify the expected number of retained blocks for various filter cases, ensure the returned results are accurate, and additionally force block cache corruption on any early-released blocks. The corruption would cause the test to fail if any necessary blocks were wrongly released, because the RPC shipper tries serializing cells from buffers whose contents have changed.

I also added unit tests to StoreScanner and HFileReaderImpl for the relevant smaller changes there.

@bbeaudreault force-pushed the HBASE-27227-2 branch 2 times, most recently from c2ede4b to 737b6dc on January 5, 2023 22:40
@bbeaudreault force-pushed the HBASE-27227-2 branch 2 times, most recently from db87f12 to 4f3dd81 on January 6, 2023 13:48
@bbeaudreault marked this pull request as ready for review on January 6, 2023 13:52
```diff
@@ -426,6 +426,8 @@ private boolean nextInternal(List<Cell> results, ScannerContext scannerContext)
     // Used to check time limit
     LimitScope limitScope = LimitScope.BETWEEN_CELLS;
 
+    checkpoint(State.START);
```
Contributor:

Is it only when RegionScannerImpl.filter is not null that we should checkpoint?

@bbeaudreault (Contributor, author):

Good point. I think I can do that; let me look into it.

@bbeaudreault (Contributor, author):

I've moved the initial checkpoint call into the constructor, so we only have to do it once. The other checkpoint calls have been changed to a new checkpointIfFiltering method, which returns early if filter == null.

The initial checkpoint call is still important because it enables the retainBlock() functionality in StoreScanner. Someone could submit a scan with addColumn(...) that looks for one column in rows with many columns, in which case retainBlock() in StoreScanner would still be very useful. I added a comment to explain that.
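
A rough sketch of that guard (hypothetical; the real RegionScannerImpl fields and checkpoint(State) signature are only approximated here):

```java
// Hypothetical stand-alone sketch, not the real RegionScannerImpl.
class FilterAwareScannerSketch {
  private final Object filter; // stands in for the scan's Filter; null when no filter is set

  FilterAwareScannerSketch(Object filter) {
    this.filter = filter;
    checkpoint(); // done once here, so retainBlock() bookkeeping is active for the whole scan
  }

  // Only a filter can throw a row away after its cells were read, so without one there is
  // never anything to roll back.
  void checkpointIfFiltering() {
    if (filter == null) {
      return;
    }
    checkpoint();
  }

  private void checkpoint() {
    // would fan out to the StoreScanners / HFileReaderImpls, as described in the PR summary
  }
}
```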

Thanks for looking!

@comnetwork (Contributor), Jan 10, 2023:

@bbeaudreault, thank you very much for the detailed reply. Overall LGTM; this is a really insightful PR. I just have one suggestion, FYI.

  • Should we set HFileScannerImpl.lastCheckpointIndex to 0 when initializing? Then we could simplify the check if (shouldRetainBlock || lastCheckpointIndex < 0) in HFileScannerImpl.handlePrevBlock to if (shouldRetainBlock). After all, once we start to scan, HFileScannerImpl.lastCheckpointIndex is always >= 0 whether or not there is a filter, and we could also remove the checkpoint(State.START) call from the RegionScannerImpl constructor.

@bbeaudreault (Contributor, author):

Thanks again @comnetwork.

I agree that ideally we could do what you suggest. The reason for the extra complexity is that I'm concerned about use cases of HFileScanner or StoreFileScanner which don't go through StoreScanner, for example HFilePrettyPrinter, bulk load verification, etc.

The lastCheckpointIndex < 0 check ensures that we only honor shouldRetainBlock if a call to checkpoint(State) has occurred. The contract is that if you are using checkpointing, you also need to call retainBlock(). If you are not using checkpointing, you don't need to call retainBlock().

Failing to call retainBlock() would result in blocks being released too early, so I only want this to apply for StoreScanner, which is the only place we currently do checkpointing.

A better approach might be to add a new boolean checkpointEnabled to the HFileScannerImpl constructor. This is more explicit but involves adding boolean arguments to various getScanner methods. I can give this a shot, and I'm also open to other ideas if you have them.

Let me know if this wasn't clear.

@bbeaudreault (Contributor, author):

@comnetwork I just pushed a commit which adds a boolean checkpointingEnabled to the creation of HFileScannerImpl. This involves minor modifications to many levels of getScanner(...) calls. I added the new param everywhere necessary, defaulting to false everywhere. This retains the old behavior for all usages. I then updated just StoreScanner to pass true, as this is the only place we want to enable this behavior right now.

This adds a number of small changes, but it's probably the safest route. If there were a bug where we accidentally passed false, we'd just be reverting to the original behavior and losing the optimization. You can see how it affects things in HFileReaderImpl.

Let me know what you think. I can revert that commit or we can keep it. As a result I was able to simplify some of the checkpointing (default to 0 instead of -1, no need to call checkpoint on new scanners).
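
For illustration only, the overload pattern described here might look roughly like the following. None of these are the real HBase getScanner(...) signatures; they just show how existing callers keep the old default while StoreScanner opts in, and why an accidental false is safe.

```java
// Hypothetical sketch of threading an opt-in flag through scanner creation.
interface HFileScannerLike {
}

class HFileReaderSketch {
  // Existing callers keep this signature, and with it the original (non-checkpointing) behavior.
  HFileScannerLike getScanner(boolean cacheBlocks, boolean pread) {
    return getScanner(cacheBlocks, pread, /* checkpointingEnabled= */ false);
  }

  // New overload; in this PR only StoreScanner would pass true.
  HFileScannerLike getScanner(boolean cacheBlocks, boolean pread, boolean checkpointingEnabled) {
    return new HFileScannerSketch(checkpointingEnabled);
  }
}

class HFileScannerSketch implements HFileScannerLike {
  private final boolean checkpointingEnabled;

  HFileScannerSketch(boolean checkpointingEnabled) {
    this.checkpointingEnabled = checkpointingEnabled;
  }

  boolean shouldRetain(boolean retainRequestedByStoreScanner) {
    // With the flag off we always retain (the pre-patch behavior), so a caller that
    // accidentally passes false only loses the optimization; it cannot release a needed block.
    return retainRequestedByStoreScanner || !checkpointingEnabled;
  }
}
```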

@comnetwork (Contributor), Jan 11, 2023:

@bbeaudreault, thank you for the elaborate work. I agree with your point that adding a new boolean checkpointEnabled makes the logic clearer.

> The contract is that if you are using checkpointing, you also need to call retainBlock(). If you are not using checkpointing, you don't need to call retainBlock().

I agree with the first part of that sentence, but have doubts about the second. From your code, I think that for a scan by RegionScannerImpl, retainBlock is always needed to release blocks early; only when there is a filter (especially a row-level filter) do we additionally need checkpoint to narrow the set of blocks that should be retained after a row is filtered. I think we could use retainBlock alone for a user scan that specifies columns but has no filters, so we wouldn't need to call checkpoint. If you agree, then the variable name boolean checkpointingEnabled is not very appropriate; maybe earlyReleaseBlockEnabled is more indicative of the intention?

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|:--------|:--------|
| +0 🆗 | reexec | 0m 42s | Docker mode activated. |
| | _ Prechecks _ | | |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| | _ master Compile Tests _ | | |
| +0 🆗 | mvndep | 0m 14s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 2m 38s | master passed |
| +1 💚 | compile | 2m 51s | master passed |
| +1 💚 | checkstyle | 0m 43s | master passed |
| +1 💚 | spotless | 0m 40s | branch has no errors when running spotless:check. |
| +1 💚 | spotbugs | 1m 49s | master passed |
| | _ Patch Compile Tests _ | | |
| +0 🆗 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 2m 29s | the patch passed |
| +1 💚 | compile | 2m 51s | the patch passed |
| +1 💚 | javac | 2m 51s | the patch passed |
| +1 💚 | checkstyle | 0m 42s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | hadoopcheck | 8m 57s | Patch does not cause any errors with Hadoop 3.2.4 3.3.4. |
| +1 💚 | spotless | 0m 38s | patch has no errors when running spotless:check. |
| +1 💚 | spotbugs | 2m 3s | the patch passed |
| | _ Other Tests _ | | |
| +1 💚 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 33m 31s | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4940/20/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #4940 |
| Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
| uname | Linux 4ff8ad18ec40 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 4add525 |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Max. process+thread count | 79 (vs. ulimit of 30000) |
| modules | C: hbase-server hbase-mapreduce U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4940/20/console |
| versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|:--------|:--------|
| +0 🆗 | reexec | 1m 22s | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | _ Prechecks _ | | |
| | _ master Compile Tests _ | | |
| +0 🆗 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 6s | master passed |
| +1 💚 | compile | 1m 16s | master passed |
| +1 💚 | shadedjars | 4m 31s | branch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 46s | master passed |
| | _ Patch Compile Tests _ | | |
| +0 🆗 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 24s | the patch passed |
| +1 💚 | compile | 1m 23s | the patch passed |
| +1 💚 | javac | 1m 23s | the patch passed |
| +1 💚 | shadedjars | 4m 57s | patch has no errors when building our shaded downstream artifacts. |
| +1 💚 | javadoc | 0m 49s | the patch passed |
| | _ Other Tests _ | | |
| +1 💚 | unit | 244m 37s | hbase-server in the patch passed. |
| +1 💚 | unit | 16m 10s | hbase-mapreduce in the patch passed. |
| | | 287m 5s | |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4940/20/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
| GITHUB PR | #4940 |
| Optional Tests | javac javadoc unit shadedjars compile |
| uname | Linux ea94041c5f6a 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 4add525 |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4940/20/testReport/ |
| Max. process+thread count | 3171 (vs. ulimit of 30000) |
| modules | C: hbase-server hbase-mapreduce U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4940/20/console |
| versions | git=2.34.1 maven=3.8.6 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Apache9 (Contributor) commented Jan 11, 2023:

Maybe a simpler solution is to just return the result (maybe empty) to the client when we are already holding up too many buffers? We have a ScannerContext to limit time and size; could we add a new buffer size limit to the ScannerContext?
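
For comparison, a minimal sketch of that alternative, assuming the server tracked the bytes of allocator-backed blocks held by the current scan; the names below are hypothetical and not the actual ScannerContext API.

```java
// Hypothetical "block bytes held" limit checked between rows, in the spirit of the
// time/size limits ScannerContext already enforces.
class BlockBytesScanLimit {
  private final long maxBlockBytesHeld; // e.g. a configurable per-scan limit
  private long blockBytesHeld;

  BlockBytesScanLimit(long maxBlockBytesHeld) {
    this.maxBlockBytesHeld = maxBlockBytesHeld;
  }

  void onBlockRetained(long blockSizeBytes) {
    blockBytesHeld += blockSizeBytes;
  }

  // When this returns true, the RPC would return the accumulated (possibly empty) results to
  // the client; shipped() then runs and the retained buffers are released before continuing.
  boolean shouldReturnToClient() {
    return blockBytesHeld >= maxBlockBytesHeld;
  }
}
```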

@bbeaudreault (Contributor, author) commented Jan 11, 2023:

> Maybe a simpler solution is to just return the result (maybe empty) to the client when we are already holding up too many buffers? We have a ScannerContext to limit time and size; could we add a new buffer size limit to the ScannerContext?

You might be right. That was my initial implementation actually, but I decided to try releasing them instead because I had this checkpointing idea. I admit that it's getting complicated as is. I'm going to leave this open for now, but I will do some tests and open a separate PR to show how it would work with ScannerContext.

@bbeaudreault (Contributor, author):

I've pushed #4967 which handles this using ScannerContext. I still need to add tests, but it's working on one of our test clusters.

@bbeaudreault (Contributor, author):

@Apache9 my other PR, which handles this using ScannerContext, is ready for review: #4967. I'm going to close this PR for now, though I may revisit it in the future if we ever need to do something more. Thanks to you and @comnetwork for reviewing.
