HBASE-27558 Scan quotas and limits should account for total block IO #4967
Conversation
Force-pushed from dd7170b to 7f7c8b6.
Pushed tests, marking ready for review.
 * Returns the block size in bytes for the current block. Will only return a value once per block,
 * otherwise 0. Used for calculating block IO in ScannerContext.
 */
int getCurrentBlockSizeOnce();
This API design is a bit strange... Let me take a look at the usage...
I think it would be better to introduce a method called recordBlockSize. The comment could say that the implementation should make sure that for every block we only record once.
That seems intuitive and reasonable. Done. Please see the latest commit.
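To make the suggestion concrete, here is a minimal, self-contained sketch of a once-per-block recordBlockSize contract. The interface and class names below are hypothetical illustrations for this thread, not the actual patch code:

```java
import java.util.function.IntConsumer;

// Hypothetical sketch of the suggested API; not HBase code.
interface BlockSizeRecorder {
  /**
   * Records the size of the current block via the consumer. Implementations
   * must ensure each block is recorded at most once, no matter how many
   * cells are read from it.
   */
  void recordBlockSize(IntConsumer blockSizeConsumer);
}

class SingleBlockScanner implements BlockSizeRecorder {
  private final long currentBlockOffset;
  private final int currentBlockSize;
  private long lastRecordedBlockOffset = -1;

  SingleBlockScanner(long offset, int size) {
    this.currentBlockOffset = offset;
    this.currentBlockSize = size;
  }

  @Override
  public void recordBlockSize(IntConsumer blockSizeConsumer) {
    // Only record if this block has not been recorded yet.
    if (currentBlockOffset != lastRecordedBlockOffset) {
      lastRecordedBlockOffset = currentBlockOffset;
      blockSizeConsumer.accept(currentBlockSize);
    }
  }

  public static void main(String[] args) {
    SingleBlockScanner scanner = new SingleBlockScanner(0L, 65536);
    long[] progress = { 0 };
    // Simulate reading three cells from the same block: size accrues once.
    for (int cell = 0; cell < 3; cell++) {
      scanner.recordBlockSize(size -> progress[0] += size);
    }
    System.out.println(progress[0]); // 65536, not 196608
  }
}
```

With this shape, the caller just hands in a consumer (for example, one that increments ScannerContext progress) and never has to reason about whether the block was already counted, which is the intuition behind preferring recordBlockSize over getCurrentBlockSizeOnce.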
@@ -608,6 +610,11 @@ private boolean nextInternal(List<Cell> results, ScannerContext scannerContext)
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  if (!shouldStop) {
    // Read nothing as the cells were filtered, but still need to check time limit.
Why do we need to add a check here?
We want to check the size limit any time we are potentially continuing the loop. This was the only case we missed; the others (which I converted from checkTimeLimit to checkAnyLimit above) are all similar.
Since nextRow is now accumulating block size, we want to check after calling nextRow to ensure we haven't exceeded the limit.
I could make this checkSizeLimit if you'd like. I made it checkAnyLimitReached so that it is the same as the other calls above, which were just checking the time limit.
To be honest, I think this should have been here all along and was just missed along the way. I'm not sure why we'd want to check the time limit for the nextRow calls above but not this one. This check ensures that populating from the joined heap + nextRow does not exceed the time or (new) size limit.
This method is on the critical path of reading, and the code here will be executed every time we get a row, so it may affect scan performance if we add more checks here.
I just mean, is it a must to have a check here? Why did we not need to check here in the past?
Actually, this section of the method is not nearly as hot as the rest of the method. The only real way we reach this point is when filter.filterRowCells(kvs) clears all cells from the results after they have been accumulated in StoreScanner. There are only 2 standard filters which do this -- DependentColumnFilter and SingleColumnValueExcludeFilter.
That said, you do make a good point. We have never had a time limit here, so we may not need it. We do need a size limit check here, now that we track block sizes. Previously, we would not check the size limit here because the results are empty and so wouldn't have accumulated size progress. Now that we accumulate block size progress even for filtered rows, we need a check.
For this type of scan, we will have accumulated blocks in both populateResults() and possibly nextRow(). Right after populateResults() there's a scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS) call. That call doesn't protect against this case, because it passes BETWEEN_CELLS. For scans with filter.hasFilterRow(), the limit scope is changed to LimitScope.BETWEEN_ROWS, so this check is skipped for them. Scans which enter this code block will have skipped all other limit checks above. The checkSizeLimit I add here is the only safe place we can check BETWEEN_ROWS for these types of filtered scans.
The best way to illustrate this is with a test -- I just pushed a change which does the following:
- Changes this line to just checkSizeLimit
- Adds a new test, testCheckLimitAfterFilteringRowCells

If I comment out this checkSizeLimit, the added test fails -- the whole scan is able to complete in 1 RPC instead of the expected 4. So this illustrates that we need to have a size check here.
Personally I think it would also be appropriate to have a time limit check here, because these types of scans would otherwise be able to circumvent our existing time limits. But within the scope of this JIRA, I can keep it to just the size limit for now.
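Independent of HBase internals, the failure mode is easy to model. Below is a standalone toy simulation (all names and numbers are made up for illustration): every row read costs one 64 KiB block of IO, a row filter drops all cells, and an RPC only terminates early if block-size progress is checked after filtering:

```java
// Toy model of a heavily filtered scan; not HBase code.
public class FilteredScanLimitDemo {
  static final int BLOCK_SIZE = 64 * 1024;
  static final long SIZE_LIMIT = 128 * 1024; // ~2 blocks per RPC

  // Runs one RPC's worth of scanning and returns the row to resume from.
  static int scanRpc(int startRow, int totalRows, boolean checkSizeAfterFilter) {
    long blockProgress = 0;
    int row = startRow;
    while (row < totalRows) {
      blockProgress += BLOCK_SIZE; // block IO accrues even for filtered rows
      row++;                       // the filter drops all cells: results stay empty
      if (checkSizeAfterFilter && blockProgress >= SIZE_LIMIT) {
        break; // size limit reached; return to the client, resume in the next RPC
      }
    }
    return row;
  }

  public static void main(String[] args) {
    int totalRows = 8;
    for (boolean check : new boolean[] { true, false }) {
      int rpcs = 0;
      for (int next = 0; next < totalRows; rpcs++) {
        next = scanRpc(next, totalRows, check);
      }
      System.out.println("checkSizeAfterFilter=" + check + " -> " + rpcs + " RPCs");
    }
  }
}
```

With the check, the toy scan takes 4 RPCs of two blocks each; without it, the entire scan completes in a single RPC no matter how much block IO it performed, mirroring the test behavior described above.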
Thanks for sticking with me on this @Apache9. Great feedback. I've pushed changes based on your recent comments and provided some extra context on a couple of your comments. For some reason they don't show up as responses to your most recent review, but you should see them if you scroll up.
Force-pushed from 9ebcfce to cfd4b55.
Thanks for the comment. Much clearer now. I think we are almost there~
I've deployed this on a test cluster and verified that it helps to reduce excess retained blocks due to heavily filtered scans. It is nice to have an upper limit on the cost of a scan, whether filtered or unfiltered, applying equally to both.