Disable ZK-based segment loading, remove usage of skife from DruidCoordinatorConfig #15705
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java
server/src/test/java/org/apache/druid/server/coordinator/duty/KillSupervisorsTest.java
Looks great! Checkpointing an initial review with some questions and suggestions.
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
.withCoordinatorKillMaxSegments(10)
.withCoordinatorKillIgnoreDurationToRetain(false)
.build()
// 100ms is a great price to pay if it removes the flakeyness,
;)
Suggested change:
- // 100ms is a great price to pay if it removes the flakeyness,
+ // 100ms is a great price to pay if it removes the flakiness,
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
server/src/main/java/org/apache/druid/server/coordinator/duty/KillSupervisorsCustomDuty.java
); | ||
this.metadataSupervisorManager = metadataSupervisorManager; | ||
log.warn("This is only an example implementation of a custom duty and" |
I suppose if the delegate is `KillSupervisors`, this log is not accurate anymore?
Also, out of curiosity: if this is meant to be an example implementation of a custom duty, should this code just reside in a separate tree in the repo?
Yeah, I actually wanted to get rid of this duty altogether and have a simpler `DummyCustomCoordinatorDuty` which just logs something in every run rather than doing something meaningful. It doesn't make sense to have a full-fledged implementation and then advise users against using it.
What do you think?
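A minimal sketch of the kind of no-op duty suggested above, written against a hypothetical one-method `CustomDuty` interface. Druid's actual custom-duty interface takes coordinator runtime parameters and is not reproduced here; the `run(int)` signature and the log message are illustrative assumptions.

```java
import java.util.logging.Logger;

/** Hypothetical stand-in for a coordinator duty contract; not Druid's actual interface. */
interface CustomDuty
{
  /** Called once per coordinator run; returns the logged message for illustration. */
  String run(int runNumber);
}

/** Does nothing meaningful: it only logs that it was invoked, as proposed in the review. */
class DummyCustomCoordinatorDuty implements CustomDuty
{
  private static final Logger LOG = Logger.getLogger(DummyCustomCoordinatorDuty.class.getName());

  @Override
  public String run(int runNumber)
  {
    String message = "DummyCustomCoordinatorDuty invoked in run #" + runNumber;
    LOG.info(message);
    return message;
  }
}
```

Because such a duty has no side effects beyond logging, it can demonstrate how custom duties are wired in without shipping a cleanup implementation that users are then told not to use.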
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java
server/src/main/java/org/apache/druid/server/coordinator/config/MetadataCleanupConfig.java
Thanks a lot for the review, @abhishekrb19! I have replied to your comments.
Thanks for the refactor @kfaraz! The configuration is so much easier to follow now. I've left a few non-blocking comments.
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
public class MetadataCleanupConfig
{
  public static final MetadataCleanupConfig DEFAULT = new MetadataCleanupConfig(null, null, null);
Now that I look into this again, it looks like this `DEFAULT` conflicts with the actual defaults applied in the constructor. Do we need this static null config? Won't the JSON creator below actually handle the default values when it's invoked from the caller?
It doesn't actually conflict with the constructor. Since it uses the same constructor, it has the same default values applied, i.e. `true`, `1 day`, `90 days`.
This is just used as shorthand in tests and in the class `CoordinatorKillConfigs`.
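To illustrate why a `DEFAULT` built from `null` arguments cannot drift from the constructor defaults, here is a simplified, hypothetical `MetadataCleanupConfigSketch`. It assumes the three arguments correspond to `on`, `period` and `durationToRetain` with the defaults quoted above (`true`, 1 day, 90 days), and it uses `java.time.Duration` in place of the Joda-Time types and Jackson `@JsonCreator` annotations of the real class.

```java
import java.time.Duration;
import java.util.Objects;

/** Simplified sketch; the real MetadataCleanupConfig is Jackson-deserialized and uses Joda-Time. */
final class MetadataCleanupConfigSketch
{
  // Built through the same constructor that deserialization would call,
  // so DEFAULT picks up exactly the same fallback values.
  static final MetadataCleanupConfigSketch DEFAULT = new MetadataCleanupConfigSketch(null, null, null);

  private final boolean cleanupEnabled;     // JSON field "on"
  private final Duration cleanupPeriod;     // JSON field "period"
  private final Duration durationToRetain;  // JSON field "durationToRetain"

  MetadataCleanupConfigSketch(Boolean cleanupEnabled, Duration cleanupPeriod, Duration durationToRetain)
  {
    // A null argument means "not specified", so the default is applied here and only here.
    this.cleanupEnabled = cleanupEnabled == null || cleanupEnabled;
    this.cleanupPeriod = cleanupPeriod == null ? Duration.ofDays(1) : cleanupPeriod;
    this.durationToRetain = durationToRetain == null ? Duration.ofDays(90) : durationToRetain;
  }

  boolean isCleanupEnabled()
  {
    return cleanupEnabled;
  }

  Duration getCleanupPeriod()
  {
    return cleanupPeriod;
  }

  Duration getDurationToRetain()
  {
    return durationToRetain;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MetadataCleanupConfigSketch)) {
      return false;
    }
    MetadataCleanupConfigSketch that = (MetadataCleanupConfigSketch) o;
    return cleanupEnabled == that.cleanupEnabled
           && cleanupPeriod.equals(that.cleanupPeriod)
           && durationToRetain.equals(that.durationToRetain);
  }

  @Override
  public int hashCode()
  {
    return Objects.hash(cleanupEnabled, cleanupPeriod, durationToRetain);
  }
}
```

Since both paths go through one constructor, a config with every field omitted and `DEFAULT` compare equal; the constant is only a shorthand.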
server/src/main/java/org/apache/druid/server/coordinator/config/KillUnusedSegmentsConfig.java
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
server/src/main/java/org/apache/druid/server/coordinator/config/CoordinatorKillConfigs.java
server/src/main/java/org/apache/druid/server/coordinator/loading/CuratorLoadQueuePeon.java
Follow-up to #15705
Changes:
- Remove references to ZK-based segment loading in the docs
- Fix doc for existing config `druid.coordinator.loadqueuepeon.http.repeatDelay`
* Remove usage of skife from DruidCoordinatorConfig
* Remove old config class
* Address static checks
* Fix tests
* Remove unnecessary mocks
* Fix config typos
* Fix config condition
* Fix test, spotbug check
* Move validation to DruidCoordinatorConfig
* Move DruidCoordinatorConfig to different package
* Fix validation of killunusedconfig
* Simplify and fix KillSupervisorsCustomDuty
* Address review comments
* Fix new tests
* Add KillUnusedSchemasConfig
* Remove KillUnusedSchemasConfig
* Minor renames
Background:
ZK-based segment loading has been completely disabled in #15705.
ZK `servedSegmentsPath` has been deprecated since Druid 0.7.1, #1182. This legacy path has been replaced by `liveSegmentsPath` and is no longer used in the code.
Changes:
- Never create ZK `loadQueuePath`, as it is never used.
- Never create ZK `servedSegmentsPath`, as it is never used.
- Do not create ZK `liveSegmentsPath` if announcement on ZK is disabled.
- Fix up tests
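The three path-creation rules in that commit message can be sketched as a small decision function. The class and method names, and the use of bare path names instead of full ZK paths built from a base path, are assumptions for illustration; other ZK paths that Druid creates are out of scope here.

```java
/** Hypothetical sketch of which segment-related ZK paths are still created. */
final class ZkSegmentPathPlanner
{
  static final String LOAD_QUEUE_PATH = "loadQueuePath";
  static final String SERVED_SEGMENTS_PATH = "servedSegmentsPath";
  static final String LIVE_SEGMENTS_PATH = "liveSegmentsPath";

  static boolean shouldCreate(String pathName, boolean announceSegmentsOnZk)
  {
    switch (pathName) {
      case LOAD_QUEUE_PATH:      // never used: ZK-based segment loading is disabled
      case SERVED_SEGMENTS_PATH: // never used: replaced by liveSegmentsPath
        return false;
      case LIVE_SEGMENTS_PATH:   // only needed when segments are announced on ZK
        return announceSegmentsOnZk;
      default:
        throw new IllegalArgumentException("Path not covered by this sketch: " + pathName);
    }
  }
}
```

Only `liveSegmentsPath` depends on configuration; the other two are unconditionally skipped.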
Description
This is a follow-up to #14695 to move configs from using skife to regular `JsonConfigurator`.
Advantages:
- … `DruidCoordinatorConfig` object
Change summary
- Add `CoordinatorRunConfig`, `CoordinatorKillConfigs`, `CoordinatorPeriodConfig`
- Remove usage of skife from `DruidCoordinatorConfig`
- Make `DruidCoordinatorConfig` act as a container for the above configs
- Add `MetadataCleanupConfig` to serve as a common config container for all metadata cleanups
- Move validation from `MetadataCleanupDuty` to `DruidCoordinatorConfig` for cleaner code and to allow the coordinator to fail fast on startup in case of invalid config values
Added configs
1. `CoordinatorRunConfig`
   - Path: `druid.coordinator`
   - Fields: `period`, `startDelay`
   - Description: Configs related to the running of the coordinator
2. `CoordinatorPeriodConfig`
   - Path: `druid.coordinator.period`
   - Fields: `indexingPeriod`, `metadataStoreManagementPeriod`
   - Description: Additional periods related to the operation of the coordinator
3. `CoordinatorKillConfigs`
   - Path: `druid.coordinator.kill`
   - Fields 1: `audit`, `compaction`, `datasource`, `rules`, `supervisors`, `pendingSegments`
   - Description: Each of the above fields is deserialized as a `MetadataCleanupConfig`, which in turn contains the following fields:
     - `on`: Whether cleanup is enabled
     - `period`: Cleanup period
     - `durationToRetain`: Duration of metadata to retain
   - Fields 2: `on`, `period`, `durationToRetain`, `bufferPeriod`, `ignoreDurationToRetain`, `maxSegments`
   - Description: These fields are related to cleanup of unused segments.
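A rough sketch of how properties under `druid.coordinator.kill.<name>.*` map onto one cleanup config per field, with validation performed while binding so that bad values fail at startup. The binder class, the use of `java.util.Properties` and `java.time.Duration.parse`, and both validation rules are hypothetical. Every cleanup here simply falls back to the `MetadataCleanupConfig` defaults quoted in the review (`true`, 1 day, 90 days), and the unused-segment fields (`bufferPeriod`, `maxSegments`, etc.) are not modeled.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Per-cleanup settings read from druid.coordinator.kill.<name>.* (simplified sketch). */
final class CleanupSettings
{
  final boolean on;
  final Duration period;
  final Duration durationToRetain;

  CleanupSettings(boolean on, Duration period, Duration durationToRetain)
  {
    this.on = on;
    this.period = period;
    this.durationToRetain = durationToRetain;
  }
}

/** Hypothetical binder; not Druid's JsonConfigurator. */
final class KillConfigBinder
{
  static final String PREFIX = "druid.coordinator.kill.";
  static final List<String> CLEANUP_NAMES =
      Arrays.asList("audit", "compaction", "datasource", "rules", "supervisors", "pendingSegments");

  /** Reads one CleanupSettings per name, falling back to true / P1D / P90D. */
  static Map<String, CleanupSettings> bind(Properties props)
  {
    Map<String, CleanupSettings> result = new LinkedHashMap<>();
    for (String name : CLEANUP_NAMES) {
      String base = PREFIX + name + ".";
      boolean on = Boolean.parseBoolean(props.getProperty(base + "on", "true"));
      Duration period = Duration.parse(props.getProperty(base + "period", "P1D"));
      Duration retain = Duration.parse(props.getProperty(base + "durationToRetain", "P90D"));
      validate(name, period, retain);
      result.put(name, new CleanupSettings(on, period, retain));
    }
    return result;
  }

  /** Illustrative checks: reject bad values while binding rather than inside a duty run. */
  static void validate(String name, Duration period, Duration retain)
  {
    if (period.isZero() || period.isNegative()) {
      throw new IllegalArgumentException(PREFIX + name + ".period must be positive");
    }
    if (retain.isNegative()) {
      throw new IllegalArgumentException(PREFIX + name + ".durationToRetain must not be negative");
    }
  }
}
```

Putting the checks in the binding step mirrors the stated goal of letting the coordinator fail fast on startup instead of discovering an invalid value when a cleanup duty first runs.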
ZK-based segment loading disabled
- Remove config `druid.coordinator.loadqueuepeon.type`, thus always using `HttpLoadQueuePeon`
Release note
ZooKeeper-based segment loading is being removed, as it is known to have issues and has been deprecated for several releases. The recent improvements made to the Druid coordinator are also known to work much better with HTTP-based segment loading.
The following configs are being removed as they are not needed anymore:
- `druid.coordinator.load.timeout`: Not needed, as the default value of this parameter (15 minutes) is known to work well for all clusters
- `druid.coordinator.loadqueuepeon.type`: Not needed, as this value will always be `http`
- `druid.coordinator.curator.loadqueuepeon.numCallbackThreads`: Not needed, as ZooKeeper (Curator)-based segment loading is no longer an option
Auto-cleanup of compaction configs of inactive datasources is now enabled by default.
This PR has: