Enable auto minor compaction #9712
Comments
Hi @yuanlihan, thank you for trying out minor compaction! The thing is that you can enable auto minor compaction now by setting Also, please note that the segment lock used in minor compaction is still experimental and there are a couple of known issues such as #9571. I could probably work on fixing those segment lock bugs and improving auto compaction to support both major and minor compactions at some point this year, but I am not sure yet when that would be.
Thanks @jihoonson for the explanation. Now technically, the auto minor compaction is available by setting Here is a typical arrangement of segments within an hour in my case:
It would be better to do minor compaction only for the first M small segments and the tail N small segments. I also found that minor compaction tasks always fail if the partitionIds are not consecutive.
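One possible reading of the suggestion above, as a minimal sketch: take the leading run and the trailing run of small segments in a time chunk, and check that each run has consecutive partition IDs before handing it to a minor compaction task. The function names, the `(partition_id, size_bytes)` tuples, and the `small_bytes` threshold are all hypothetical and are not part of Druid.

```python
def has_consecutive_partition_ids(partition_ids):
    """Return True if the IDs form a gap-free run such as 4, 5, 6."""
    ids = sorted(partition_ids)
    return all(b - a == 1 for a, b in zip(ids, ids[1:]))


def head_and_tail_small_runs(segments, small_bytes):
    """Split off the leading and trailing runs of small segments.

    `segments` is a list of (partition_id, size_bytes) tuples sorted by
    partition_id. A segment counts as small when size_bytes < small_bytes
    (a hypothetical threshold chosen for illustration).
    """
    head = []
    for seg in segments:
        if seg[1] >= small_bytes:
            break
        head.append(seg)

    tail = []
    # Walk backwards over whatever the head run did not consume.
    for seg in reversed(segments[len(head):]):
        if seg[1] >= small_bytes:
            break
        tail.append(seg)
    tail.reverse()
    return head, tail
```

Each run can then be checked separately with `has_consecutive_partition_ids`, since merging the head and tail runs into one task would generally produce non-consecutive IDs.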
@yuanlihan your assessment is very correct! Yes, the coordinator should be able to run both minor compaction and major compactions; minor compaction for recent data and major compaction for old data. As you mentioned, minor compaction should be able to run on a subset of segments in a time chunk instead of grabbing all of them.
This sounds nice, but I'm not sure how we can do it. The auto compaction algorithm used to use segment size as a trigger for compaction, but this caused a bunch of bugs, since the segment size after compaction can still be small depending on your configuration, such as maxRowsPerSegment. Also, a parallel task will create at least one small segment in most cases, since the last task will likely be assigned a small number of segments. As a result, we changed the algorithm to be stateful in #8573. Do you have a good idea?
Oh yeah, this is a known issue. Just opened #9768.
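To illustrate the concern about a size-based trigger, here is a toy model, not Druid's implementation. It assumes a fixed `bytes_per_row` and uses a plain boolean `compacted` flag as a simplified stand-in for the stateful check; every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    rows: int
    size_bytes: int
    compacted: bool = False


def compact(segments, max_rows_per_segment, bytes_per_row=100):
    """Toy compaction: repack all rows into segments capped by row count."""
    total_rows = sum(s.rows for s in segments)
    out = []
    while total_rows > 0:
        rows = min(total_rows, max_rows_per_segment)
        out.append(Segment(rows, rows * bytes_per_row, compacted=True))
        total_rows -= rows
    return out


def size_trigger(segments, target_bytes):
    """Size-only trigger: fires whenever any segment is below the target."""
    return any(s.size_bytes < target_bytes for s in segments)


def stateful_trigger(segments):
    """Stateful trigger: fires only for segments not yet compacted."""
    return any(not s.compacted for s in segments)
```

With five 300-row inputs, a row cap of 1000, and a 500,000-byte target, the compacted output is still below the target, so the size-only trigger keeps firing on already compacted data while the stateful trigger stops.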
Hi, @jihoonson I like the stateful algorithm for auto compaction.
I would like to introduce two extra properties to
First filter out big segments from candidates( |
@yuanlihan thanks for the suggestion. I agree that the auto compaction should be able to skip compaction for segments which are already in a good size. Maybe my previous concern is not a problem anymore if we check both the segment size and whether the segment is already compacted. For example we can skip compaction even for small segments if they are created by compaction. However, the first property For the second property, is there a use case where you don't want to compact segments using minor compaction? Or can we always compact if there are 2 or more segments?
This is true for now, but I think it should support all partitionsSpec types. Hash and range partitioning will be a primary use case when they are supported. But there are also needs for supporting |
In these cases, I would like to increase
Personally, I prefer specifying by size in bytes per segment because it may be more obvious for users.
No special use case here. In my opinion, it's acceptable to skip the occasional small segments mixed in with regular-size segments, especially when using minor compaction. This property may make it more flexible for users to configure. And I agree that we should consider the use case of splitting a big segment, which may be caused by data skew.
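The selection rule discussed above could be sketched roughly as follows. This is only an illustration: `SegmentInfo`, `select_candidates`, `target_size_bytes`, and `min_candidates` are hypothetical names, not the properties proposed in this thread and not Druid configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentInfo:
    segment_id: str
    size_bytes: int
    created_by_compaction: bool


def select_candidates(segments, target_size_bytes, min_candidates=2):
    """Pick the segments worth compacting in one time chunk.

    Skips segments that already reach the target size, and small segments
    that a previous compaction produced. Returns an empty list when fewer
    than `min_candidates` segments remain, so occasional stray small
    segments are left alone.
    """
    candidates = [
        s for s in segments
        if s.size_bytes < target_size_bytes and not s.created_by_compaction
    ]
    return candidates if len(candidates) >= min_candidates else []
```

Expressing the threshold in bytes per segment follows the preference stated above; a row-based threshold would slot into the same filter.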
This issue has been marked as stale due to 280 days of inactivity. |
This issue has been closed due to lack of activity. If you think that |
Description
Note that @jihoonson introduced the minor compaction feature to Druid in version 0.16.x, which helps to compact small segments effectively. As we know, a cluster may suffer from the too-many-small-segments problem if there are big pipelines that have a long trickle of late data.
But after upgrading to version 0.16.1, I found that auto minor compaction was not implemented yet. Still, I rushed to use the feature by making an adaptation based on the implementation of minor compaction, and it has worked well until now. Thanks to @jihoonson and all others involved with this great feature.
Now we have decided to upgrade to a newer version, but find that auto minor compaction is still not available. May I know if @jihoonson or anyone else is working on this feature now? If not, maybe I can try to raise a PR with the adaptation.
PS: there was a detailed discussion in #8489 between @jihoonson and @himanshug about CompactionTask and the auto compaction.
Motivation
auto minor compaction.