-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27570 Unify tracking of block IO across all read request types #5004
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
2ee8826
to
786f2b4
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
// This get returns nothing since the filter doesn't match. Filtered cells still retain | ||
// blocks, and this is a full row scan of both blocks. This equals 100 bytes so we should | ||
// throw a multiResponseTooLarge after this get if we are counting filtered cells correctly. | ||
Get g0 = new Get(row).addFamily(FAMILY).setFilter( | ||
new QualifierFilter(CompareOperator.EQUAL, new BinaryComparator(Bytes.toBytes("sdf")))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old method removed from addSize would not count any block io for this get.
// g1 and g2 each count the first 55 byte block, so we end up with block size of 110 | ||
// after g2 and throw a multiResponseTooLarge before g3 | ||
Get g1 = new Get(row); | ||
g1.addColumn(FAMILY, cols[0]); | ||
gets.add(g1); | ||
|
||
Get g2 = new Get(row); | ||
g2.addColumn(FAMILY, cols[3]); | ||
gets.add(g2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old method would have returned these 2 gets in the same rpc, because of lastBlock tracking (both columns in the same block). We could add that functionality back with a change like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this only works because of the order of the gets g1, g2, g3. if we instead ordered it g1, g3, g2 lastBlock would not help.
This comment was marked as outdated.
This comment was marked as outdated.
1 similar comment
This comment was marked as outdated.
This comment was marked as outdated.
91e02cf
to
7e0ef57
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Build broken due to https://issues.apache.org/jira/browse/HBASE-27608. Will rebase once that's merged. |
🎊 +1 overall
This message was automatically generated. |
7e0ef57
to
a4da6f1
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean up!
…5004) Signed-off-by: Duo Zhang <[email protected]>
…pache#5004) Signed-off-by: Duo Zhang <[email protected]>
…pache#5004) Signed-off-by: Duo Zhang <[email protected]>
…pache#5004) Signed-off-by: Duo Zhang <[email protected]>
…ead request types (apache#5004) Signed-off-by: Duo Zhang <[email protected]>
There are two benefits to this change:
However, there is one important functional change here. Previously we kept track of
lastBlock
in order to try not to overcount blocks when submitting multigets for individual cells of the same row or multiple contiguous rows in the same block. For example:table.get(List.of(1, 2, 3))
I could add similar handling by stashing lastBlock in RpcCallContext (see example commit here). For measuring work done by a request, the new solution is more accurate as-is. For limiting retained blocks, the overcounting may just cause some requests to be broken into smaller chunks than they are today.
The default block size is 64kb and default max result size is 100mb. So one would need to submit a multiget with 1600 gets for individual columns of a row or perfectly ordered contiguous rows within 1 block in order to run into a difference. This feels like an edge case, and lastBlock tracking has numerous ways it can mess up:
With those caveats, it did not feel worth adding the complexity of RpcCallContext lastBlock tracking. But I could easily be convinced otherwise.