# Vertical Block Sharding Proposal #4191

Base: `cap-size-com`
@@ -3,7 +3,7 @@ title: "Vertical Block Sharding"
type: proposal
menu: proposals
status: proposed
-owner: @bwplotka
+owner: @bwplotka @biswajitghosh98
Date: 2 November 2020
---
@@ -75,8 +75,8 @@ index size).

## Goals

* Automatically cap the size of index per block to X GB without impacting read performance.
-* Optional: CLI for vertical sharding of old blocks.
+* Alternatively, allow grouping series within blocks that span larger time windows defined by compaction level (e.g. up to 2w), to leverage downsampling for querying.

## Non Goals
@@ -87,17 +87,49 @@ index size).

## Proposal

-**We propose to vertically shard all blocks which goes beyond the specified index size limit during the compaction process.**
+**We propose to vertically shard a block when it reaches the cap limit in terms of number of series or size of the index file, and then adaptively decide the number of shards depending on the user's requirements.**

-* Special metadata in meta.json indicating a shard.
+* We'll be using special metadata in meta.json to indicate a shard.
> **Review comment:** Could this be another external label?
>
> **Reply:** Yes, I think
-TBD
+### Adaptive Binary Sharding
The idea behind this approach is to leverage the structural benefits of a binary tree to adaptively shard blocks that hit the maximum limit, either in terms of index size or number of series (set by the user).

For this approach, we'd need to add two extra external labels: `hash_number`, initialized to 0, and `shard_level`, initialized to 0, denoting the number of times a block has been sharded during the compaction process, as well as a set of 16 `hash_function`s. The reason for choosing 16 is that 2 raised to the power 16 is 65536, which is a logical upper bound for the number of shards a user might want to have.
> **Review comment:** Are
>
> **Reply:** @hitanshu-mehta Yes. Apologies if it was unclear from my end :)
>
> **Reply:** No worries :) I think we should use the same term everywhere in the proposal because it can lead up to confusion sometimes. WDYT?
>
> **Reply:** @hitanshu-mehta Yes, definitely. I'll change it and add more pictures ASAP
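To make the two proposed labels and the fixed family of 16 hash functions more concrete, here is a minimal Go sketch. Everything below (the `ShardMeta` struct, seeding FNV-1a with the level to stand in for a hash-function family) is an illustrative assumption, not existing Thanos code.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ShardMeta sketches the extra fields that would live in meta.json
// (as external labels): hash_number and shard_level both start at 0.
type ShardMeta struct {
	HashNumber int `json:"hash_number"` // which shard at the current level
	ShardLevel int `json:"shard_level"` // how many times this block has been sharded
}

const maxShardLevelLimit = 16 // 2^16 = 65536, the proposal's upper bound on shards

// hashSeries stands in for hash_function[level]: a family of 16 hash
// functions, faked here by mixing the level into FNV-1a. Any set of
// independent hash functions could be used instead.
func hashSeries(level int, seriesLabels string) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d/%s", level, seriesLabels)
	return h.Sum64()
}

// childShard decides which of the two child shards a series goes to when a
// block at the given level is split.
func childShard(level int, seriesLabels string) int {
	return int(hashSeries(level, seriesLabels) % 2)
}

func main() {
	m := ShardMeta{HashNumber: 0, ShardLevel: 0}
	fmt.Println(m, childShard(m.ShardLevel+1, `{__name__="up",job="node"}`))
}
```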
-## Alternatives

We'll be using `hash_level` and `hash_number` to group and plan blocks to compact together. The way it'd work is: the grouper would only group blocks together whose `hash_level`s are the same, and the planner would only compact blocks together that share the same `hash_number`. With this, the compactor would run as it does today until a compacted block hits the limit for the first time.
> **Review comment:** Does this mean blocks can get merged even if they're not part of the same stream (have the same external labels)?
>
> **Reply:** No, I think it is not mentioned explicitly but all external labels should be the same during the grouping.
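As a rough sketch of the grouping and planning rule described above (including the point from the review thread that all other external labels must still match), the grouper/planner keys might look as follows. The type and function names are assumptions for illustration only, not the actual Thanos grouper code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// blockMeta is a simplified stand-in for the meta.json fields relevant here.
type blockMeta struct {
	Resolution     int64
	ExternalLabels map[string]string
	HashLevel      int // the proposal also calls this shard_level
	HashNumber     int
}

// groupKey: the grouper only groups blocks whose external labels, resolution,
// and hash_level all match.
func groupKey(m blockMeta) string {
	lbls := make([]string, 0, len(m.ExternalLabels))
	for k, v := range m.ExternalLabels {
		lbls = append(lbls, k+"="+v)
	}
	sort.Strings(lbls) // deterministic key regardless of map iteration order
	return fmt.Sprintf("res=%d/%s/hash_level=%d", m.Resolution, strings.Join(lbls, ","), m.HashLevel)
}

// samePlanGroup: within a group, the planner only compacts blocks that also
// share the same hash_number.
func samePlanGroup(a, b blockMeta) bool {
	return a.HashNumber == b.HashNumber
}

func main() {
	a := blockMeta{0, map[string]string{"cluster": "eu1"}, 1, 0}
	b := blockMeta{0, map[string]string{"cluster": "eu1"}, 1, 1}
	fmt.Println(groupKey(a) == groupKey(b)) // true: same group (same labels, resolution, hash_level)
	fmt.Println(samePlanGroup(a, b))        // false: different hash_number, so not compacted together
}
```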
-* Shard number stored in external label.

![vbs-image1](../img/vertical_block_sharding_image1.png)
We allow a compacted block to hit the limit a maximum of 16 times (if not specified otherwise by the user); at the 16th level, if/when it compacts further and hits the limit again, we mark it for no further compactions. The sharding algorithm comes into play once a compacted block is declared invalid, or its size hits the limit. If the compacted block is still valid but its index size is too large (or the number of series it holds exceeds the limit set by the user), we look at the current `hash_level` (say `hl`) of the compacted block (if valid), or of the blocks participating in the compaction (if the compacted block is invalid). If `hl` is not equal to `max_shard_level_limit` (16 by default), we shard either the compacted block or the participating blocks (depending on the situation) using `hash_function[hl+1]`, set the `hash_level` of the newly created blocks to `hl+1`, and upload them for further compaction. If `hl` is equal to `max_shard_level_limit`, we take one of the two decisions below, depending on whether the compacted block is valid or invalid (a sketch of this decision flow follows the list below).
> **Review comment:** What does it mean for an index size to be too large? Will this be configurable?
>
> **Reply:** Yeah, I think it should be configurable. We have supported planning considering index size from #3410 and the index limit is configurable.
* If the compacted block is valid, then continue the usual compaction process until it results in an invalid block.
* If the compacted block is invalid, then add the no-compact filter to all the participating blocks.
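Below is a pseudo-Go sketch of this decision flow. The helper names (`exceedsLimit`, `shardWithHashFunction`, `markNoCompact`) and the child-numbering scheme are assumptions; only the control flow mirrors the text above, and none of this is the actual compactor implementation.

```go
package main

// All types and helper names below are illustrative assumptions; only the
// control flow mirrors the proposal text.

type block struct {
	hashLevel  int
	hashNumber int
	valid      bool // false if the compaction produced an invalid block
}

const maxShardLevelLimit = 16 // default, user-configurable between 0 and 16

// exceedsLimit would compare the block's index size or number of series
// against the user-configured cap (placeholder here).
func exceedsLimit(b block) bool { return false }

// shardWithHashFunction splits each given block into two children using
// hash_function[level] and sets the children's hash_level to level.
// Child hash_number assignment (2n, 2n+1) is an assumption for illustration.
func shardWithHashFunction(level int, blocks ...block) []block {
	out := make([]block, 0, 2*len(blocks))
	for _, b := range blocks {
		for child := 0; child < 2; child++ {
			out = append(out, block{hashLevel: level, hashNumber: 2*b.hashNumber + child, valid: true})
		}
	}
	return out
}

// markNoCompact stands in for attaching the no-compact filter to blocks.
func markNoCompact(blocks ...block) {}

// afterCompaction applies the decision flow once a compaction has produced
// `compacted` out of `participants`.
func afterCompaction(compacted block, participants []block) []block {
	if compacted.valid && !exceedsLimit(compacted) {
		return []block{compacted} // nothing to do, keep compacting as today
	}

	hl := compacted.hashLevel
	if !compacted.valid && len(participants) > 0 {
		hl = participants[0].hashLevel // participants share the same hash_level
	}

	if hl != maxShardLevelLimit {
		if compacted.valid {
			// Valid but too large: shard the compacted block itself.
			return shardWithHashFunction(hl+1, compacted)
		}
		// Compaction produced an invalid block: shard the participating blocks.
		return shardWithHashFunction(hl+1, participants...)
	}

	// Already at the maximum shard level.
	if compacted.valid {
		return []block{compacted} // continue the usual compaction until it turns invalid
	}
	markNoCompact(participants...) // invalid at max level: stop compacting these blocks
	return participants
}

func main() {
	_ = afterCompaction(block{valid: true}, nil)
}
```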
We can keep growing the binary tree by recursively applying the same logic. The reason for calling this adaptive is that at any point in time, the total number of shards is equal to the number of nodes in the tree (say x). If any of the leaves overloads (a block in that particular shard hits the limit), that particular leaf splits into 2, effectively increasing the number of shards to x+2, hence providing just the right number of shards to get the job done.
> **Review comment:** I think the total number of shards should be equal to the number of leaf nodes, and every vertical sharding will increase the number of shards to x + 1.
>
> **Reply:** @hitanshu-mehta Say we have 10 blocks in shard (0,0), and we're compacting 3 blocks out of those together. Suppose the resultant block hits the limit; then we'll shard the 3 participating blocks, leaving 7 blocks at shard (0,0), 3 at (1,0) and 3 at (1,1), increasing the number of shards from 1 to 3.
>
> **Reply:** Thank you for clearing things up! I was calculating shards a different (maybe wrong) way. To give a rough idea of how I calculated: block_number(hash_number, hash_level) — the total number of starred (*) blocks is the number of shards.
>
> **Reply:** @hitanshu-mehta Oh, I see. But this is different from what I had in mind. The proposal would need more pictures and re-wording to make it more clear for everyone, I suppose :)
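The 10-block example from the reply above can be written out as a tiny sketch; the child-numbering scheme (2n, 2n+1) is only an assumption used to make the binary tree concrete.

```go
package main

import "fmt"

// shardAddr identifies a shard by the two proposed external labels.
type shardAddr struct{ level, number int }

func main() {
	// Worked example from the review thread: 10 blocks live in shard
	// (hash_level=0, hash_number=0). 3 of them are compacted together, the
	// result hits the limit, so those 3 blocks are sharded one level down.
	blocksPerShard := map[shardAddr]int{{0, 0}: 10}

	parent := shardAddr{0, 0}
	moved := 3
	blocksPerShard[parent] -= moved
	blocksPerShard[shardAddr{parent.level + 1, 2 * parent.number}] = moved   // shard (1,0)
	blocksPerShard[shardAddr{parent.level + 1, 2*parent.number + 1}] = moved // shard (1,1)

	// 7 blocks remain at (0,0), 3 at (1,0), 3 at (1,1): 3 shards in total.
	fmt.Println(blocksPerShard)
}
```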
![vbs-image2](../img/vertical_block_sharding_image2.png)
![vbs-image3](../img/vertical_block_sharding_image3.png)
To give the user control over selecting the number of shards (if needed), we can add a flag `max_shard_level_limit`; the user can change the value from the default of 16 to anything between 0 and 16 (inclusive). Let the current value of `max_shard_level_limit` be `max_l`. Then, instead of the compactor's grouper grouping blocks with the same `shard_level` together, it'd group blocks on the basis of the same `max(shard_level, max_l)`, hence reverting the sharding process in a controlled manner. Similar logic would work if we decide to increase the value of `max_shard_level_limit`.
### Pros and cons

#### Pros

* This layered sharding approach would statistically give more evenly distributed shards compared to using only one hash function to split the blocks into some x shards. For layered binary sharding, even if we assume that all 16 hash functions are bad and each gives a skewed distribution, we'll still get a fairly even distribution after sharding through all the layers. This is pictorially explained below. ![vbs-image4](../img/vertical_block_sharding_image4.png)
* The process is adaptive, hence any user can opt for the default configuration and not worry about setting the number of shards.
* If for some reason the user decides to decrease the number of shards, it can be done by setting `max_shard_level_limit` to any value between 0 and 16.
#### Cons

* We didn't solve the problem completely, but we delayed it as much as we could and gained some benefits from the delay.
* Cases where a large number of blocks participate in a compaction and result in an invalid block would trigger the sharding algorithm, even though sharding at that point might not be necessary.
### Design Decisions

#### Decision 1: Using size of index file vs number of metrics in a block as an indicator for sharding

The pros and cons of selecting one design choice over the other are yet to be discovered, and will become clearer after implementation and testing.
> **Review comment:** I wonder how the proposed approach compares to the current workaround of setting a maximum index size per block.
## Future Work
> **Review comment:** Do we still want to wait for the block reaching the cap limit to do sharding? With vertical sharding, I think we want to shard the blocks eagerly to get better query performance.