deprecate config-magic in favor of json configuration stuff #14695

Merged
merged 9 commits on Aug 17, 2023

Conversation

clintropolis
Member

@clintropolis clintropolis commented Jul 29, 2023

Description

This PR aims to deprecate usage of config-magic (ConfigProvider) in favor of using JsonConfigProvider everywhere. In the process I have made the config paths a bit more sane (one of the reasons to replace ConfigProvider is that its free-form path definitions make it easy for devs to create paths that don't make much structural sense).

The main users are DruidProcessingConfig and DruidCoordinatorConfig, the former of which has been reworked to be simple jackson annotated objects which are injected with a fallback to support legacy config-magic paths as needed. After these changes are released, we can consider finally deleting config-magic in a future release.
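
As a rough illustration of that pattern (the class, field, and legacy getter names here are hypothetical, not the exact shapes in this PR), the new-style config is a plain Jackson-annotated object bound through JsonConfigProvider, with the legacy config-magic object injected so its values can serve as fallback defaults:

import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import javax.annotation.Nullable;

// hypothetical sketch: a jackson annotated config with a legacy config-magic fallback
public class ExampleProcessingConfig
{
  private final int numThreads;

  @JsonCreator
  public ExampleProcessingConfig(
      @JsonProperty("numThreads") @Nullable Integer numThreads,
      @JacksonInject LegacyExampleProcessingConfig legacyConfig // old config-magic values, if any (hypothetical class)
  )
  {
    // prefer the new JSON path, then the legacy config-magic path, then a computed default
    if (numThreads != null) {
      this.numThreads = numThreads;
    } else if (legacyConfig != null && legacyConfig.getNumThreadsConfigured() != null) {
      this.numThreads = legacyConfig.getNumThreadsConfigured();
    } else {
      this.numThreads = Math.max(2, Runtime.getRuntime().availableProcessors() - 1);
    }
  }

  public int getNumThreads()
  {
    return numThreads;
  }
}

// wired up in a Guice module, roughly:
//   JsonConfigProvider.bind(binder, "druid.processing", ExampleProcessingConfig.class);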

For DruidProcessingConfig, the structure was mostly OK, but I have flattened the broker parallel merge config to live directly at druid.processing.merge (I guess we could change this, I don't feel very strongly about it), and split it into a totally separate config from the rest of DruidProcessingConfig since it is only used by brokers for CachingClusteredClient.

For DruidCoordinatorConfig: WIP (I haven't done anything to it yet, but it definitely needs to be reworked...)

MonitorSchedulerConfig was also using config-magic annotations but was no longer using ConfigProvider, so it seemed unnecessary; I have flattened it into DruidMonitorSchedulerConfig, which has moved to druid-processing because it was already wired up to JsonConfigProvider.

Release note

Broker parallel merge config options at the paths druid.processing.merge.pool.* and druid.processing.merge.task.* have been flattened to use the druid.processing.merge prefix. The legacy paths still function, but they will log a warning noting that they are deprecated and will be removed in a future release, and pointing to the new paths that should be used going forward.
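
For example (assuming parallelism and targetRunTimeMillis were among the existing pool/task options; the exact option names may differ), the same settings before and after the flattening:

# legacy paths: still honored, but log a deprecation warning at startup
druid.processing.merge.pool.parallelism=4
druid.processing.merge.task.targetRunTimeMillis=3000

# new flattened paths
druid.processing.merge.parallelism=4
druid.processing.merge.targetRunTimeMillis=3000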


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@clintropolis clintropolis changed the title json config based processing and broker merge configs to deprecate config-magic deprecate config-magic in favor of json configuration stuff Jul 29, 2023
);
}
@Rule
public ExpectedException expectedException = ExpectedException.none();

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation (Note, test)

Invoking ExpectedException.none should be avoided because it has been deprecated.
Comment on lines +132 to +152
Injector injector = Initialization.makeInjectorWithModules(
    GuiceInjectors.makeStartupInjector(),
    ImmutableList.of(
        Modules.override(
            (binder) -> {
              binder.bindConstant().annotatedWith(Names.named("serviceName")).to("test");
              binder.bindConstant().annotatedWith(Names.named("servicePort")).to(0);
              binder.bindConstant().annotatedWith(Names.named("tlsServicePort")).to(-1);
              binder.bind(Properties.class).toInstance(props);
              ConfigurationObjectFactory factory = Config.createFactory(props);
              LegacyBrokerParallelMergeConfig legacyConfig = factory.build(LegacyBrokerParallelMergeConfig.class);
              binder.bind(ConfigurationObjectFactory.class).toInstance(factory);
              binder.bind(LegacyBrokerParallelMergeConfig.class).toInstance(legacyConfig);
            },
            target
        ).with((binder) -> {
          binder.bind(CachePopulatorStats.class).toInstance(cachePopulatorStats);
          binder.bind(CacheConfig.class).toInstance(cacheConfig);
        })
    )
);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation (Note, test)

Invoking Initialization.makeInjectorWithModules should be avoided because it has been deprecated.
Contributor

@kfaraz kfaraz left a comment

@clintropolis, the changes look good to me overall. I have left some minor comments.

As for DruidCoordinatorConfig, I am looking into how we can clean up the structure. I will share my thoughts here soon.

But for now, I think we should merge the current PR (as it has already touched several files) and tackle DruidCoordinatorConfig in a separate PR of its own.

: numThreads;
this.numMergeBuffers = numMergeBuffers == null ? Math.max(2, this.numThreads / 4) : numMergeBuffers;
this.fifo = fifo == null || fifo;
this.tmpDir = tmpDir == null ? System.getProperty("java.io.tmpdir") : tmpDir;
Contributor

Nit: You could consider using the Configs.valueOrDefault() utility method for better readability.

Suggested change
this.tmpDir = tmpDir == null ? System.getProperty("java.io.tmpdir") : tmpDir;
this.tmpDir = Configs.valueOrDefault(tmpDir, System.getProperty("java.io.tmpdir"));

Member Author

Cool, didn't know that existed; changed to use it where appropriate.
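
For reference, the defaults above rewritten with Configs.valueOrDefault() look roughly like this (defaultNumThreads is a stand-in for whatever default the real constructor computes):

// sketch of the constructor body after adopting Configs.valueOrDefault()
this.numThreads = Configs.valueOrDefault(numThreads, defaultNumThreads);
this.numMergeBuffers = Configs.valueOrDefault(numMergeBuffers, Math.max(2, this.numThreads / 4));
this.fifo = Configs.valueOrDefault(fifo, true); // previously: fifo == null || fifo
this.tmpDir = Configs.valueOrDefault(tmpDir, System.getProperty("java.io.tmpdir"));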

{
@JsonProperty
private String schedulerClassName = BasicMonitorScheduler.class.getName();

@JsonProperty
private Period emissionPeriod = new Period("PT1M");

@JsonProperty
public String getSchedulerClassName()
Contributor

Nit: Do methods need to be annotated too? Would this object need to be serialized?

Member Author

They don't currently get serialized so it's not really necessary; not sure why some of the getters here had them. Removed them all to make it consistent.
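
So the class ends up shaped roughly like this (a sketch, not the verbatim file): @JsonProperty stays on the fields, which is all JsonConfigProvider needs to populate them, and the getters are plain accessors:

import com.fasterxml.jackson.annotation.JsonProperty;
import org.joda.time.Period;

public class DruidMonitorSchedulerConfig
{
  @JsonProperty
  private String schedulerClassName = BasicMonitorScheduler.class.getName();

  @JsonProperty
  private Period emissionPeriod = new Period("PT1M");

  public String getSchedulerClassName()
  {
    return schedulerClassName;
  }

  public Period getEmissionPeriod()
  {
    return emissionPeriod;
  }
}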

@@ -46,7 +45,6 @@ public Period getEmissionPeriod()
return emissionPeriod;
}

@Override
public Duration getEmitterPeriod()
Contributor

Should probably be renamed to getEmissionDuration to avoid confusion with the other method.

Member Author

Renamed, though the other method wasn't actually used, which became apparent after removing the JSON annotation, so I removed it too.


@Config({"druid.computation.buffer.size", "${base_path}.buffer.sizeBytes"})
public HumanReadableBytes intermediateComputeSizeBytesConfigured()
public static long computeMaxMemoryFromMaxHeapSize()
Contributor

This method is probably not needed; we should just inline it in initializeBufferSize.

Member Author

inlined


public class DruidProcessingConfigModule implements Module
@Deprecated
public class LegacyBrokerParallelMergeConfigModule implements Module
Contributor

Please add a comment about why this is still needed and what needs to be done to remove it completely.

Member Author

added javadoc here and to the associated config
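
For instance, the added javadoc could read along these lines (a sketch of the intent, not the exact wording in the PR):

/**
 * Binds {@link LegacyBrokerParallelMergeConfig} via config-magic so that the deprecated
 * druid.processing.merge.pool.* and druid.processing.merge.task.* properties continue to
 * work as fallbacks for the flattened druid.processing.merge.* properties. Once the legacy
 * paths are removed in a future release, this module and the legacy config class can be deleted.
 */
@Deprecated
public class LegacyBrokerParallelMergeConfigModule implements Module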

Contributor

Thanks!

- if (config.getNumThreadsConfigured() != ExecutorServiceConfig.DEFAULT_NUM_THREADS) {
-   log.error("numThreads[%d] configured, that is ignored on Router", config.getNumThreadsConfigured());
+ if (config.isNumThreadsConfigured()) {
+   log.error("numThreads[%d] configured, that is ignored on Router", config.getNumThreads());
Contributor

I wonder if we even need to log this. Definitely doesn't need to be an error.

Suggested change
log.error("numThreads[%d] configured, that is ignored on Router", config.getNumThreads());
log.warn("Ignoring configured value for 'numThreads'[%d] as it is not used by the Router.", config.getNumThreads());

Member Author

Changed to warn. IIRC the idea was that these configs have no effect on the Router, so logging about it makes it obvious that a config change didn't actually change anything. I don't feel strongly either way; I do see some merit in giving feedback when a config is specified that has no effect, though we aren't doing that very consistently.

public class DruidProcessingBufferConfig
{
  public static final HumanReadableBytes DEFAULT_PROCESSING_BUFFER_SIZE_BYTES = HumanReadableBytes.valueOf(-1);
  public static final int MAX_DEFAULT_PROCESSING_BUFFER_SIZE_BYTES = 1024 * 1024 * 1024;
Contributor

Nit:

Suggested change
public static final int MAX_DEFAULT_PROCESSING_BUFFER_SIZE_BYTES = 1024 * 1024 * 1024;
public static final int MAX_PROCESSING_BUFFER_SIZE_BYTES = 1024 * 1024 * 1024;

Member Author

Hmm, it is only a default though: if the user specifies a buffer size we use that instead. This just specifies the max size we use if the user didn't define the buffer size explicitly and we are on a JVM that supports auto-computing the size.
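
In other words, the resolution is roughly the following (the method and parameter names are illustrative; the constants are the ones shown above):

// sketch of the sizing logic described above: an explicitly configured
// druid.processing.buffer.sizeBytes always wins; the 1GiB constant only caps the
// auto-computed size used when nothing was configured.
static long resolveBufferSizeBytes(HumanReadableBytes configured, long autoComputedSizeBytes)
{
  if (configured.getBytes() > 0) {
    // the -1 sentinel (DEFAULT_PROCESSING_BUFFER_SIZE_BYTES) means "not configured"
    return configured.getBytes(); // user-specified, used as-is
  }
  return Math.min(autoComputedSizeBytes, MAX_DEFAULT_PROCESSING_BUFFER_SIZE_BYTES);
}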

Member Author

IOW, I didn't change this for now, but I'm happy to do so if you feel strongly about it.

Contributor

No, it is fine. We can leave it as it is.

@kfaraz
Contributor

kfaraz commented Aug 14, 2023

For DruidCoordinatorConfig, I think the following structure should work (an example of the resulting runtime properties is sketched after this list):

Existing class DruidCoordinatorConfig bound to druid.coordinator

  • startDelay
  • period
  • indexingPeriod
  • metadataStoreManagementPeriod
  • kill: add new class CoordinatorKillConfigs
    • unused: add new class KillUnusedSegmentsConfig
      • period
      • durationToRetain
      • ignoreDurationToRetain
      • maxSegments
    • supervisor: add new class MetadataCleanupConfig for this and other cleanups below
      • period
      • durationToRetain
    • audit
      • period
      • durationToRetain
    • datasource
      • period
      • durationToRetain
    • rule
      • period
      • durationToRetain
    • compaction
      • period
  • loadqueuepeon: add new class LoadQueuePeonConfig
    • type
    • loadTimeout (to replace druid.coordinator.load.timeout)
    • http: add new class HttpLoadQueuePeonConfig
      • batchSize
      • repeatDelay
      • hostTimeout
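
As a concrete illustration, runtime properties under the structure proposed above would look something like this (values and exact property names are illustrative and still subject to the follow-up PR):

druid.coordinator.period=PT60S
druid.coordinator.kill.unused.period=P1D
druid.coordinator.kill.unused.durationToRetain=P90D
druid.coordinator.kill.audit.period=P1D
druid.coordinator.kill.audit.durationToRetain=P7D
druid.coordinator.loadqueuepeon.type=http
druid.coordinator.loadqueuepeon.loadTimeout=PT15M
druid.coordinator.loadqueuepeon.http.batchSize=1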

Paths to deprecate

Add new class LegacyKillUnusedSegmentsConfig which has the following deprecated paths

  • druid.coordinator.kill.period
  • druid.coordinator.kill.durationToRetain
  • druid.coordinator.kill.ignoreDurationToRetain
  • druid.coordinator.kill.maxSegments

Add new class LegacyCoordinatorPeriodConfig which has the following deprecated paths

  • druid.coordinator.period.indexingPeriod
  • druid.coordinator.period.metadataStoreManagementPeriod

Paths to remove

  • druid.coordinator.curator.loadqueuepeon.numCallbackThreads
    Curator loading is already deprecated and will be removed before the next release. We can remove this config and hardcode the only usage in CliCoordinator to 2 for the time being.
  • druid.coordinator.load.timeout
    This can be replaced by druid.coordinator.loadqueuepeon.loadTimeout. The older config can just be dropped as I have never seen any cluster configure this value to anything other than the default (15 minutes). Also, I have other changes planned for this area which are going to further refine this config.

@clintropolis , let me know what you think.

@clintropolis
Member Author

For DruidCoordinatorConfig, I think the following structure should work:
...

That seems pretty reasonable to me. Since we are switching to JSON config, and a ton of those configs have a period and durationToRetain, it feels like there should be some common class that can be re-used for most of those configs.

druid.coordinator.curator.loadqueuepeon.numCallbackThreads
Curator loading is already deprecated and will be removed before the next release. We can remove this config and hardcode the only usage in CliCoordinator to 2 for the time being.

If for some reason we don't get this removed prior to the next release, we should probably fix it to be consistent with the HTTP load queue peon config and make its path druid.coordinator.loadqueuepeon.curator.numCallbackThreads.
