Add API to update cluster-level compaction configs #16803

Merged (5 commits) on Jul 27, 2024

@kfaraz kfaraz commented Jul 25, 2024

Changes

  • Add API /druid/coordinator/v1/config/compaction/cluster to update the cluster-level compaction config
  • Add class CompactionConfigUpdateRequest
  • Fix bug in CoordinatorCompactionConfig that caused the compaction engine not to be persisted.
    Use JSON field name engine instead of compactionEngine because the JSON field name must match
    the getter name.
  • Update MSQ validation error messages
  • Overhaul CoordinatorCompactionConfigResourceTest to remove unnecessary mocking
    and add more meaningful tests.
  • Add TuningConfigBuilder to easily build tuning configs for tests.
  • Add DataSourceCompactionConfigBuilder
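
The field-name fix in the list above can be illustrated without any JSON library. What follows is a minimal simulation, not Druid or Jackson code: it assumes the value is written under a key derived from the getter (`getEngine` gives `engine`) while the constructor reads a differently named key (`compactionEngine`), so the value is lost on a round trip. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldNameMismatchSketch
{
  // Simulates serialization: the key is derived from the getter name getEngine().
  static Map<String, String> serialize(String engine)
  {
    Map<String, String> json = new HashMap<>();
    json.put("engine", engine);
    return json;
  }

  // Simulates deserialization: the constructor looks up the field name it declares.
  static String deserialize(Map<String, String> json, String creatorFieldName)
  {
    return json.get(creatorFieldName);
  }

  public static void main(String[] args)
  {
    Map<String, String> stored = serialize("msq");
    // Mismatched name: the stored value is never found, so the engine comes back null.
    System.out.println(deserialize(stored, "compactionEngine"));
    // Aligned name: the engine survives the round trip.
    System.out.println(deserialize(stored, "engine"));
  }
}
```

With both sides using `engine`, the stored value is found again on read, which is the behavior the fix aims for.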

Release notes

(Some of the details mentioned below are in a follow-up PR, #16810.)

Add API to update cluster-level compaction dynamic config

Path: /druid/coordinator/v1/config/compaction/cluster
Method: POST
Sample Payload:

{
     "compactionTaskSlotRatio": 0.5,
     "maxCompactionTaskSlots": 10,
     "engine": "msq"
}

This API deprecates the older API /druid/coordinator/v1/config/compaction/taskslots.
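
For illustration, the request described above could be built as in the sketch below. It uses the JDK's `java.net.http` client (Java 11+). The Coordinator address `http://localhost:8081`, the `Content-Type` header, and the class and method names are assumptions for a local test cluster, not part of this PR; authentication is not covered.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ClusterCompactionConfigRequestSketch
{
  // Assumption: Coordinator of a local test cluster, no authentication.
  static final String COORDINATOR = "http://localhost:8081";

  // Sample payload from the release notes above.
  static final String PAYLOAD = "{\n"
      + "  \"compactionTaskSlotRatio\": 0.5,\n"
      + "  \"maxCompactionTaskSlots\": 10,\n"
      + "  \"engine\": \"msq\"\n"
      + "}";

  // Builds (but does not send) the POST request for the cluster-level config API.
  static HttpRequest buildRequest(String coordinatorUrl, String jsonPayload)
  {
    return HttpRequest.newBuilder()
        .uri(URI.create(coordinatorUrl + "/druid/coordinator/v1/config/compaction/cluster"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
        .build();
  }

  public static void main(String[] args)
  {
    HttpRequest request = buildRequest(COORDINATOR, PAYLOAD);
    System.out.println(request.method() + " " + request.uri());
    // To send it against a running cluster:
    // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
  }
}
```

The sketch only constructs the request, so it can be run without a cluster; sending it is left as the commented-out call.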



This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Comment on lines +79 to +83
final DataSourceCompactionConfig config = DataSourceCompactionConfig
.builder()
.forDataSource("dataSource")
.withInputSegmentSizeBytes(500L)
.withMaxRowsPerSegment(30)

Check notice: Code scanning / CodeQL

Deprecated method or constructor invocation (Note, test)

Invoking Builder.withMaxRowsPerSegment should be avoided because it has been deprecated.

Comment on lines +128 to +132
final DataSourceCompactionConfig config = DataSourceCompactionConfig
.builder()
.forDataSource("dataSource")
.withInputSegmentSizeBytes(500L)
.withMaxRowsPerSegment(10000)

Check notice: Code scanning / CodeQL

Deprecated method or constructor invocation (Note, test)

Invoking Builder.withMaxRowsPerSegment should be avoided because it has been deprecated.

@github-actions github-actions bot added the labels Area - Batch Ingestion and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) Jul 25, 2024

final List<DataSourceCompactionConfig> datasourceConfigs = newConfig.getCompactionConfigs();
if (CollectionUtils.isNullOrEmpty(datasourceConfigs)
|| current.getEngine() == newConfig.getEngine()) {
Contributor

Is it possible for the payload to contain the current engine but incompatible configs?
For example, the current engine is MSQ and the update payload also specifies MSQ, but with Native-only configs.

Contributor Author

No, the update payload is only to update the cluster level configs. It does not contain any datasource level configs right now.
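
The guard under discussion can be sketched as below. This is an illustration under assumptions, not the code from this PR: the `Engine` enum, the `needsEngineValidation` helper, and the use of plain strings for datasource configs are hypothetical. Only the condition itself (no datasource configs, or an unchanged engine) comes from the snippet being reviewed, and the sketch assumes that condition means validation is skipped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EngineChangeGuardSketch
{
  // Hypothetical stand-in for the compaction engine type.
  enum Engine { NATIVE, MSQ }

  // Returns true only when datasource configs exist and the engine actually changes.
  // Assumption: in every other case there is nothing to re-validate.
  static boolean needsEngineValidation(Engine current, Engine updated, List<String> datasourceConfigs)
  {
    boolean noConfigs = datasourceConfigs == null || datasourceConfigs.isEmpty();
    return !(noConfigs || current == updated);
  }

  public static void main(String[] args)
  {
    List<String> configs = Arrays.asList("wiki", "koalas");
    System.out.println(needsEngineValidation(Engine.MSQ, Engine.MSQ, configs));
    System.out.println(needsEngineValidation(Engine.NATIVE, Engine.MSQ, configs));
    System.out.println(needsEngineValidation(Engine.NATIVE, Engine.MSQ, Collections.emptyList()));
  }
}
```

Under these assumptions, an update that keeps the same engine never triggers re-validation, which matches the answer above: the cluster-level payload carries no datasource-level configs that could become incompatible.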

Contributor

@gargvishesh gargvishesh left a comment

Thanks for the PR @kfaraz. Overall changes LGTM. A few minor comments.


import java.util.Map;

public class DataSourceCompactionConfigBuilder
Contributor

Nit: Any particular reason to separate out the builder in a file of its own instead of adding to the DataSourceCompactionConfig file itself?

Contributor Author

@kfaraz kfaraz Jul 26, 2024

Not particularly, just didn't want to bloat up the DataSourceCompactionConfig class as it is currently a neat bean.
Since the builder is only 100 lines of code, I can put it in the same file too.
Let me know what you think.

Contributor

I think changes to both would typically be made in tandem, so I would prefer to keep them in the same file.

private final CompactionEngine compactionEngine;

@JsonCreator
public CompactionConfigUpdateRequest(
Contributor

Would ClusterCompactionConfigUpdateRequest be a better name here?

Contributor Author

It would be. I think we should also rename CoordinatorCompactionConfig to ClusterCompactionConfig or GlobalCompactionConfig. What do you think?

Also, would it be okay to do the rename changes and the API deprecation in a follow up PR?
(I have some more follow-up changes lined up).

Contributor

Yes, particularly since it would no longer reside with the coordinator. Of the two, I would prefer ClusterCompactionConfig.

Contributor Author

Cool, I guess we should rename the API to /cluster as well then.

UnaryOperator<CoordinatorCompactionConfig> operator =
current -> CoordinatorCompactionConfig.from(
current,
return updateClusterCompactionConfig(
Contributor

Should we mark the above API as deprecated, now that there is also the /global endpoint?

Contributor

Else engine should be supported here as well.

Contributor Author

Hmm, fair point. Let me mark this API as deprecated.
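
The `UnaryOperator` fragment this thread is attached to follows a common pattern: an update is expressed as a function from the current immutable config to a new one. The sketch below shows that pattern only. `ClusterConfig`, its fields, and the rule that a null argument keeps the current value are all assumptions for illustration, not the actual `CoordinatorCompactionConfig` API.

```java
import java.util.function.UnaryOperator;

public class ConfigUpdateSketch
{
  // Hypothetical immutable stand-in for the cluster-level compaction config.
  static final class ClusterConfig
  {
    final double taskSlotRatio;
    final int maxTaskSlots;
    final String engine;

    ClusterConfig(double taskSlotRatio, int maxTaskSlots, String engine)
    {
      this.taskSlotRatio = taskSlotRatio;
      this.maxTaskSlots = maxTaskSlots;
      this.engine = engine;
    }

    // Builds a new config; a null argument keeps the current value (an assumption).
    static ClusterConfig from(ClusterConfig current, Double ratio, Integer maxSlots, String engine)
    {
      return new ClusterConfig(
          ratio == null ? current.taskSlotRatio : ratio,
          maxSlots == null ? current.maxTaskSlots : maxSlots,
          engine == null ? current.engine : engine
      );
    }
  }

  public static void main(String[] args)
  {
    ClusterConfig current = new ClusterConfig(0.1, 100, "native");
    // An update that sets only the task slot fields and leaves the engine untouched.
    UnaryOperator<ClusterConfig> operator = c -> ClusterConfig.from(c, 0.5, 10, null);
    ClusterConfig updated = operator.apply(current);
    System.out.println(updated.taskSlotRatio + " " + updated.maxTaskSlots + " " + updated.engine);
  }
}
```

An endpoint that accepts only the task slot fields corresponds to always passing null for the engine here, which is the gap the comments above point out.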

Contributor

@gargvishesh gargvishesh left a comment

Approving as I'm okay with the discussed modifications being taken up in the follow-up PR as well.

Contributor

@AmatyaAvadhanula AmatyaAvadhanula left a comment

Thanks for the clarification, @kfaraz. LGTM

@kfaraz kfaraz merged commit caedeb6 into apache:master Jul 27, 2024
88 checks passed
Contributor Author

kfaraz commented Jul 27, 2024

Thanks for the reviews, @gargvishesh , @AmatyaAvadhanula !

@kfaraz kfaraz deleted the compaction_config_api branch July 27, 2024 03:45
@kfaraz kfaraz changed the title from "Add API to update compaction engine" to "Add API to update cluster-level compaction configs" Jul 29, 2024
sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Aug 6, 2024
Changes:
- Add API `/druid/coordinator/v1/config/compaction/global` to update cluster level compaction config
- Add class `CompactionConfigUpdateRequest`
- Fix bug in `CoordinatorCompactionConfig` which caused compaction engine to not be persisted.
Use json field name `engine` instead of `compactionEngine` because JSON field names must align
with the getter name.
- Update MSQ validation error messages
- Complete overhaul of `CoordinatorCompactionConfigResourceTest` to remove unnecessary mocking
and add more meaningful tests.
- Add `TuningConfigBuilder` to easily build tuning configs for tests.
- Add `DatasourceCompactionConfigBuilder`
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Labels: Area - Batch Ingestion, Area - Ingestion, Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262)

3 participants