
Enable smartSegmentLoading on the Coordinator #13197

Merged
merged 109 commits into apache:master on Jun 19, 2023

Conversation

kfaraz (Contributor) commented Oct 10, 2022

Fixes #12881

Description

This PR lays the groundwork to allow the load queue to safely hold an unlimited number of items, and thus
eventually phase out maxSegmentsInNodeLoadingQueue and replicationThrottleLimit.

The load queue is already allowed to have unlimited items (by setting maxSegmentsInNodeLoadingQueue = 0),
but this leads to:

  • each coordinator run taking a very long time
  • poor assignments which take several more runs to be rectified

Changes

Classes to review

  • StrategicSegmentAssigner
  • LoadRule, BroadcastDistributionRule, DropRule, Rule.SegmentActionHandler
  • SegmentLoadQueueManager
  • LoadQueuePeon: http and curator
  • SegmentHolder
  • ServerHolder

Behavioral changes

| Change | Motivation |
| --- | --- |
| Both loaded and loading items count towards replication. | Allows the coordinator to take the corrective action of removing superfluous replicas without waiting for them to be fully loaded. |
| Load, drop or move operations can be cancelled. | Allows moving a loading item from the queue of one server to another, and lets the coordinator take corrective actions quickly. |
| During a tier shift, the currently configured level of replication is always maintained, no matter which tier it happens to be on. | Queues drops of unneeded segments as soon as possible, allowing faster decommissioning of servers and freeing up disk space to load new segments, while always maintaining the target level of replication so that segment read concurrency does not suffer. |
| LoadQueuePeon can distinguish and prioritize segment actions as DROP > LOAD > REPLICATE > MOVE (i.e. balancing); see the sketch after this table. | Allows prioritization of items, which becomes important if the load queue size is unlimited, and avoids counting balancing items in the load queue as over-replication. |
| replicationThrottleLimit does not act on a tier if the segment is not loaded on that tier at all. | Throttling the first replica on a tier undermines the purpose of tiering: tiering is not meant for fault tolerance but for serving different query needs, so segments should be available on target tiers as soon as possible. |
| maxNonPrimaryReplicantsToLoad does not act on the first replica in any tier. | Keeps in line with the changes to replicationThrottleLimit. |
| (Items in the load queue at the start of a run) + (items assigned during the run) must be <= maxSegmentsInNodeLoadingQueue. | Currently, if the configured load queue size is large enough to allow loading some segments while a coordinator run is in progress, the load queue limit is violated, as there is always some room in the queue. This causes coordinator runs to get stuck cycling through all the segments in spite of a limited load queue. |
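
To illustrate the DROP > LOAD > REPLICATE > MOVE ordering described above, here is a minimal sketch of a priority-ordered queue. The QueueAction enum and comparator are illustrative assumptions, not this PR's actual SegmentAction or peon queue implementation.

import java.util.Comparator;
import java.util.PriorityQueue;

public class ActionPrioritySketch
{
  // Illustrative only: declaration order encodes priority, so DROP sorts
  // before LOAD, which sorts before REPLICATE and MOVE (balancing).
  enum QueueAction { DROP, LOAD, REPLICATE, MOVE }

  public static void main(String[] args)
  {
    final PriorityQueue<QueueAction> queue =
        new PriorityQueue<>(Comparator.comparingInt(QueueAction::ordinal));
    queue.add(QueueAction.MOVE);
    queue.add(QueueAction.REPLICATE);
    queue.add(QueueAction.DROP);
    queue.add(QueueAction.LOAD);

    // Polls in priority order: DROP, LOAD, REPLICATE, MOVE
    while (!queue.isEmpty()) {
      System.out.println(queue.poll());
    }
  }
}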

Structural changes

| Change | Motivation |
| --- | --- |
| Add StrategicSegmentAssigner, which handles all segment assignments. The lifecycle of the assigner is tied to a single coordinator run. | Allows reuse of logic for loading, balancing and broadcasting, and provides a single place to maintain the state of a run, allowing better metrics and logging. |
| Load rules just specify their desired state and leave the actual decision-making to the StrategicSegmentAssigner. | Simpler logic for load rules. |
| Add SegmentLoadQueueManager, which interacts with the load queues. | Provides a single place to interact with load queues, allows reporting of metrics from queue callbacks, and prevents callbacks from holding references to items from the previous coordinator run. |

New metrics

  • segment/loadQueue/assigned
  • segment/loadQueue/success
  • segment/loadQueue/cancelled
  • segment/loadQueue/failed
  • Added dimension "datasource" to most segment-level metrics

Release notes

The Druid coordinator has been completely revamped to make it much more stable and user-friendly. This is accompanied by several bug fixes, logging and metric improvements, and a whole new range of capabilities.

Features:

  • Coordinator now supports a smartSegmentLoading mode, which is enabled by default. When enabled, users need not specify any of the following dynamic configs, as they are ignored by the coordinator. Instead, the coordinator computes optimal values for them at runtime to make the best use of each coordinator run (see the sketch after this list).
    • maxSegmentsInNodeLoadingQueue
    • maxSegmentsToMove
    • replicationThrottleLimit
    • useRoundRobinSegmentAssignment
    • useBatchedSegmentSampler
    • emitBalancingStats
      These configs are now deprecated and will be removed in subsequent releases.
  • Coordinator is now capable of prioritization and cancellation of items in segment load queues. Simply put, this means that the coordinator now reacts faster to changes in the cluster and makes better segment assignment decisions.
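
As a minimal sketch of toggling this mode, using the CoordinatorDynamicConfig builder methods that appear in this PR's test diffs (the build() call and surrounding usage are assumptions):

// Sketch only: withSmartSegmentLoading appears in this PR's tests;
// build() is assumed to finalize the config.
CoordinatorDynamicConfig config =
    CoordinatorDynamicConfig.builder()
                            .withSmartSegmentLoading(true) // the new default
                            .build();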

Monitoring:


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

kfaraz (Author) left a comment:

Added change summaries to important classes.
The tests have been updated to use the new flow, but the same behaviour is still being verified.

@@ -63,7 +56,7 @@
  * of the same or different methods.
  */
 @Deprecated
-public class CuratorLoadQueuePeon extends LoadQueuePeon
+public class CuratorLoadQueuePeon implements LoadQueuePeon

kfaraz (Author):

Change summary:
Extracted the common class SegmentHolder (earlier QueuedSegment) to represent an item in a load queue, used by both HttpLoadQueuePeon and CuratorLoadQueuePeon.
Implemented the new methods of LoadQueuePeon.

Contributor:

I think the change is trying to simplify the management of the queues. In that case, I think the abstraction you want is a LoadQueue that is given segment load/drop requests and can then be read from "in order". That could then be used by either Peon to do what it needs to do.

Additionally, the CuratorLoadQueuePeon is effectively broken at this point anyway, because it puts all of the znodes on ZK as quickly as possible, and it's probably too expensive to really fix (we should just use HTTP and ignore ZK for this), so I don't see a reason to fix it. So another approach is to consider the ZK-based stuff dead and only improve the HTTP stuff.

We should likely queue up the death of the zk-based stuff too.
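
As a hypothetical sketch of the suggested abstraction (names and signatures here are illustrative, not part of this PR):

public interface LoadQueue
{
  // Enqueue a load/drop request for the given segment.
  void add(DataSegment segment, SegmentAction action, LoadPeonCallback callback);

  // Cancel a queued request if it has not yet started executing.
  boolean cancel(DataSegment segment);

  // Next request "in order", e.g. DROP > LOAD > REPLICATE > MOVE; null if empty.
  SegmentHolder poll();
}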

private static final Comparator<Pair<Double, ServerHolder>> CHEAPEST_SERVERS_FIRST
    = Comparator.<Pair<Double, ServerHolder>, Double>comparing(pair -> pair.lhs)
                .thenComparing(pair -> ThreadLocalRandom.current().nextInt());

Contributor:

Would it help to use ServerHolder.getSizeUsed instead of a random integer for the second comparison?

kfaraz (Author) replied Oct 12, 2022:

Sure, that would work too (though it would make more sense to use free size or free percentage rather than size used).

This PR does not modify the balancer strategies themselves, so that change has not been included here. The only strategy-related change is a reduction in the number of calls made to the strategy.
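
A sketch of the tie-breaker alternative discussed here, preferring servers with more free space; the getAvailableSize() accessor is an assumption:

// Sketch only: break cost ties by free space instead of a random integer.
// ServerHolder.getAvailableSize() is assumed to return free disk space.
private static final Comparator<Pair<Double, ServerHolder>> CHEAPEST_SERVERS_FIRST
    = Comparator.<Pair<Double, ServerHolder>, Double>comparing(pair -> pair.lhs)
                .thenComparingLong(pair -> -pair.rhs.getAvailableSize());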

imply-cheddar (Contributor) left a comment:

I'm not done, but I need to do something else and don't want the comments to be stuck in draft, so I'm submitting for now to make them visible.


    && stateManager.loadSegment(segment, server, true)) {
  return true;
} else {
  log.makeAlert("Failed to broadcast segment for [%s]", segment.getDataSource())

Contributor:

Be careful with messages in this code. "Failed to broadcast segment" is ambiguous about whether there was an issue with the server actually downloading the segment (definitely not the case given the code here) versus an issue with the coordinator believing that it is safe to assign the segment (much more likely). We should be very explicit about what it is that is happening and, also, what we might expect the end user to do about it.

kfaraz (Author) replied Oct 13, 2022:

Yes, I am going through all the log/alert messages to make sure they capture the right information.
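
For illustration, a more explicit version of the alert above might read as follows; the exact message text and accessors are illustrative, not this PR's final wording:

// Sketch only: state the coordinator-side decision explicitly and hint at
// what the operator should check. Accessors are assumed.
log.makeAlert(
    "Could not assign broadcast segment[%s] to server[%s]."
    + " Check if the server is already serving the segment, is at capacity,"
    + " or is being decommissioned.",
    segment.getId(), server.getServer().getName()
).emit();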

kfaraz (Author) commented Oct 17, 2022:

@imply-cheddar , thanks a lot for your review! I have incorporated your feedback.
More improvements to logging, metrics and docs will be included in a quick follow-up PR.

-segmentsToLoad.put(segment, new LoadSegmentHolder(segment, callback));
-queuedSize.addAndGet(segment.getSize());
+holder = new SegmentHolder(segment, action, callback);
+segmentsToLoad.put(segment, holder);

Contributor:

The old code separated segmentsToLoad and segmentsToDrop so that it could prioritize drops over loads. If I'm understanding correctly, we are doing that prioritization through the queuedSegments prioritization now, which makes me wonder whether we need to keep the old segmentsToLoad and segmentsToDrop around anymore. Are those data structures still used for some meaningful purpose?

kfaraz (Author):

We might still need segmentsToLoad and segmentsToDrop, at least for now, because the balancer strategies use them to compute cost. This could be done with queuedSegments itself, but we would have to filter out the relevant entries on every cost computation (see the sketch below).

I do have a follow-up PR which deals with the fixes in the strategy. I will try to clean up this part there.
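
A sketch of that filtering approach, deriving the load view from queuedSegments (a segment -> action map, following the getQueuedSegments() usage elsewhere in this PR) on each cost computation:

// Sketch only: derive segmentsToLoad by filtering queuedSegments rather
// than maintaining a separate structure; done on every cost computation.
Set<DataSegment> segmentsToLoad =
    queuedSegments.entrySet()
                  .stream()
                  .filter(entry -> entry.getValue().isLoad())
                  .map(Map.Entry::getKey)
                  .collect(Collectors.toSet());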

}
}

// Drop as many replicas as possible from decommissioning servers

Contributor:

Why drop things from a decommissioning server? As long as the server is up and the data is available, it can be used. If you don't want the server to be used for anything, just kill -9 the process. If it's up and working, keep using it until it's kill -9'd. If we are going to support decommissioning, it shouldn't be "I need to remove things from this server" but rather "I'm going to pretend that server doesn't exist anymore".

That said, decommissioning is not a really good model for historicals. Instead, we need the ability to start up as a replica.

kfaraz (Author):

Thanks for the suggestion. Will include this change along with the changes for full-node replication in a follow-up PR.

imply-cheddar (Contributor) left a comment:

Overall, I don't like the introduction of so many magical boolean parameters on methods. It makes it more difficult to understand what is happening.

Additionally, there is commentary in here, but I'm not sure any of it necessarily needs to block this PR; it's mostly just hygiene stuff that could be done later too (especially given that this PR is already so large and has existed for so long).

Given that we've run this in some legitimate production clusters as well as some performance environments and it's done what we expect, I'm approving this.

Comment on lines 446 to 514
 }
-if (useBatchedSegmentSampler != that.useBatchedSegmentSampler) {
-  return false;
-}
-if (replicantLifetime != that.replicantLifetime) {
-  return false;
-}
-if (replicationThrottleLimit != that.replicationThrottleLimit) {
-  return false;
-}
-if (balancerComputeThreads != that.balancerComputeThreads) {
-  return false;
-}
-if (emitBalancingStats != that.emitBalancingStats) {
-  return false;
-}
-if (maxSegmentsInNodeLoadingQueue != that.maxSegmentsInNodeLoadingQueue) {
-  return false;
-}
-if (!Objects.equals(specificDataSourcesToKillUnusedSegmentsIn, that.specificDataSourcesToKillUnusedSegmentsIn)) {
-  return false;
-}
-if (!Objects.equals(dataSourcesToNotKillStalePendingSegmentsIn, that.dataSourcesToNotKillStalePendingSegmentsIn)) {
-  return false;
-}
-if (!Objects.equals(decommissioningNodes, that.decommissioningNodes)) {
-  return false;
-}
-if (pauseCoordination != that.pauseCoordination) {
-  return false;
-}
-if (replicateAfterLoadTimeout != that.replicateAfterLoadTimeout) {
-  return false;
-}
-if (maxNonPrimaryReplicantsToLoad != that.maxNonPrimaryReplicantsToLoad) {
-  return false;
-}
-return decommissioningMaxPercentOfMaxSegmentsToMove == that.decommissioningMaxPercentOfMaxSegmentsToMove;
+return markSegmentAsUnusedDelayMillis == that.markSegmentAsUnusedDelayMillis
+       && mergeBytesLimit == that.mergeBytesLimit
+       && mergeSegmentsLimit == that.mergeSegmentsLimit
+       && maxSegmentsToMove == that.maxSegmentsToMove
+       && percentOfSegmentsToConsiderPerMove == that.percentOfSegmentsToConsiderPerMove
+       && decommissioningMaxPercentOfMaxSegmentsToMove == that.decommissioningMaxPercentOfMaxSegmentsToMove
+       && useBatchedSegmentSampler == that.useBatchedSegmentSampler
+       && balancerComputeThreads == that.balancerComputeThreads
+       && emitBalancingStats == that.emitBalancingStats
+       && replicantLifetime == that.replicantLifetime
+       && replicationThrottleLimit == that.replicationThrottleLimit
+       && replicateAfterLoadTimeout == that.replicateAfterLoadTimeout
+       && maxSegmentsInNodeLoadingQueue == that.maxSegmentsInNodeLoadingQueue
+       && maxNonPrimaryReplicantsToLoad == that.maxNonPrimaryReplicantsToLoad
+       && useRoundRobinSegmentAssignment == that.useRoundRobinSegmentAssignment
+       && pauseCoordination == that.pauseCoordination
+       && Objects.equals(
+           specificDataSourcesToKillUnusedSegmentsIn,
+           that.specificDataSourcesToKillUnusedSegmentsIn)
+       && Objects.equals(
+           dataSourcesToNotKillStalePendingSegmentsIn,
+           that.dataSourcesToNotKillStalePendingSegmentsIn)
+       && Objects.equals(decommissioningNodes, that.decommissioningNodes);

Contributor:

This (and hashCode) appear to be ignoring the various debugDimensions fields; is that intentional?

kfaraz (Author):

Thanks for catching this, I must have missed adding it.

kfaraz (Author) replied Jun 16, 2023:

I was even thinking of getting rid of the equals and hashCode in this class. I don't see them serving any purpose (unless JsonConfigProvider does something with them; I need to double-check). Even the tests do an item-by-item comparison. But I will do that later.

Comment on lines 872 to 897
private void cancelLoadsOnDecommissioningServers(DruidCluster cluster)
{
  final AtomicInteger cancelledCount = new AtomicInteger(0);
  final List<ServerHolder> decommissioningServers
      = cluster.getAllServers().stream()
               .filter(ServerHolder::isDecommissioning)
               .collect(Collectors.toList());

  for (ServerHolder server : decommissioningServers) {
    server.getQueuedSegments().forEach(
        (segment, action) -> {
          // Cancel the operation if it is a type of load
          if (action.isLoad() && server.cancelOperation(action, segment)) {
            cancelledCount.incrementAndGet();
          }
        }
    );
  }

  if (cancelledCount.get() > 0) {
    log.info(
        "Cancelled [%d] load/move operations on [%d] decommissioning servers.",
        cancelledCount.get(), decommissioningServers.size()
    );
  }
}

Contributor:

I would've expected that the act of decommissioning would put the queue into a state where it only accepts DROP requests, and would drop the load queue immediately. It seems weird to me that this logic is handled in this class instead. The only thing I'd expect a CoordinatorDuty to do about it is to look for decommissioning servers and move the segments away.

kfaraz (Author):

Cancelling all loads on decommissioning servers is faster than moving the segments away.

Balancing moves are subject to the cost computation performance in BalancerStrategy and thus are typically limited (maxSegmentsToMove <= 1000). With unlimited load queues, there can potentially be many more segments in the load queue of decommissioning servers.

With cancellation of loads on decommissioning servers, these segments would immediately be (round-robin) assigned to active servers in this run itself.

kfaraz (Author):

We are doing this cancellation in this duty (as opposed to RunRules or BalanceSegments) because the SegmentReplicantLookup is constructed right after this method, and thus the coordinator knows, in this run itself, that some segments are under-replicated and it needs to queue up loads on active servers.

Comment on lines +368 to +372
CoordinatorDynamicConfig.builder()
.withSmartSegmentLoading(false)
.withMaxSegmentsToMove(1)
.withUseBatchedSegmentSampler(true)
.withPercentOfSegmentsToConsiderPerMove(40)

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation. Invoking Builder.withPercentOfSegmentsToConsiderPerMove should be avoided because it has been deprecated.
Comment on lines +368 to +371
CoordinatorDynamicConfig.builder()
.withSmartSegmentLoading(false)
.withMaxSegmentsToMove(1)
.withUseBatchedSegmentSampler(true)

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation. Invoking Builder.withUseBatchedSegmentSampler should be avoided because it has been deprecated.
Comment on lines +397 to +400
CoordinatorDynamicConfig.builder()
.withSmartSegmentLoading(false)
.withMaxSegmentsToMove(2)
.withUseBatchedSegmentSampler(true)

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation. Invoking Builder.withUseBatchedSegmentSampler should be avoided because it has been deprecated.
Comment on lines +334 to +335
ServerTestHelper.MAPPER
.writerWithType(HttpLoadQueuePeon.RESPONSE_ENTITY_TYPE_REF)

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation. Invoking ObjectMapper.writerWithType should be avoided because it has been deprecated.
Comment on lines 355 to 369
 if (action.isLoad()) {
   projectedSegments.add(segment);
 } else {
   queuedSegments.remove(segment);
   projectedSegments.remove(segment);
 }

-final long sizeDelta = addToQueue ? segment.getSize() : -segment.getSize();
 if (action.isLoad()) {
-  sizeOfLoadingSegments += sizeDelta;
+  sizeOfLoadingSegments += segment.getSize();
 } else if (action == SegmentAction.DROP) {
-  sizeOfDroppingSegments += sizeDelta;
+  sizeOfDroppingSegments += segment.getSize();
 } else {
   // MOVE_FROM actions graduate to DROP after the corresponding MOVE_TO has finished
   // Do not consider size delta until then, otherwise we might over-assign the server
 }
}

Contributor:

After the change to remove the addToQueue boolean, you now have the exact same if clause at the beginning of two subsequent if statements. You should be able to make it:

if (action.isLoad()) {
  projectedSegments.add(segment);
  sizeOfLoadingSegments += segment.getSize();
} else {
  projectedSegments.remove(segment);
  // The current else block could be from a DROP or a MOVE_FROM.  The MOVE_FROM will eventually graduate to
  // DROP after ...
  if (action == SegmentAction.DROP) {
    sizeOfDroppingSegments += segment.getSize();
  }
}

Comment on lines 375 to 386
 if (action.isLoad()) {
   projectedSegments.remove(segment);
 } else {
   projectedSegments.add(segment);
 }

-return true;
+if (action.isLoad()) {
+  sizeOfLoadingSegments -= segment.getSize();
+} else if (action == SegmentAction.DROP) {
+  sizeOfDroppingSegments -= segment.getSize();
+}
 }

Contributor:

Same comment here: you basically have the same if statement twice.
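
Applying the merge suggested above for the addition path to this removal path would give (sketch):

if (action.isLoad()) {
  projectedSegments.remove(segment);
  sizeOfLoadingSegments -= segment.getSize();
} else {
  projectedSegments.add(segment);
  // Only DROP affects the dropping size; MOVE_FROM is handled when it
  // graduates to DROP, as in the addition path above.
  if (action == SegmentAction.DROP) {
    sizeOfDroppingSegments -= segment.getSize();
  }
}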

* @return Iterator over {@link BalancerSegmentHolder}s, each of which contains
* a segment picked for moving and the server currently loading it.
*/
public static List<BalancerSegmentHolder> pickMovableLoadingSegmentsFrom(

Contributor:

nit: the difference in the names is so minute that it took me a long time to notice: one is -ing and the other is -ed.

Is there a reason not to expose the 4-argument method as public and have the call sites pass in one of the two lambdas? It looks like there's only a single call site for each of them except for tests (see the sketch below).

Alternatively, maybe try to make the names a bit more different from each other.
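
A sketch of the single-public-method alternative; since the actual 4-argument signature is not shown in this excerpt, the parameter shapes below (including the BalancerSegmentHolder constructor) are assumptions:

// Sketch only: one public picker, with the per-server candidate extractor
// supplied by the call site (loading vs. loaded segments).
public static List<BalancerSegmentHolder> pickMovableSegments(
    List<ServerHolder> servers,
    int maxSegmentsToPick,
    Function<ServerHolder, Collection<DataSegment>> getCandidateSegments
)
{
  final List<BalancerSegmentHolder> picked = new ArrayList<>();
  for (ServerHolder server : servers) {
    for (DataSegment segment : getCandidateSegments.apply(server)) {
      if (picked.size() >= maxSegmentsToPick) {
        return picked;
      }
      picked.add(new BalancerSegmentHolder(server, segment));
    }
  }
  return picked;
}

The two existing methods would then be thin wrappers passing a loading-segments or loaded-segments extractor, respectively.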

private final boolean useRoundRobinSegmentAssignment;
private final boolean emitBalancingStats;

public SegmentLoadingConfig(CoordinatorDynamicConfig dynamicConfig, int numUsedSegments)

Contributor:

design nit: a constructor should generally define the "dependency relationship" of a class, i.e. the things passed into the constructor are the things that the class is dependent upon. This constructor is doing a bunch of work; that work depends on the CoordinatorDynamicConfig object, but SegmentLoadingConfig itself is not dependent on that object.

In this case, it would be preferable for this to be a static method, with the constructor just taking all of the various values.
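
A minimal sketch of the suggested shape; the field names and the example derivation are assumptions, not the PR's actual logic:

public class SegmentLoadingConfig
{
  private final int replicationThrottleLimit;
  private final boolean useRoundRobinSegmentAssignment;

  // Constructor only stores final values; no derivation logic here.
  private SegmentLoadingConfig(int replicationThrottleLimit, boolean useRoundRobinSegmentAssignment)
  {
    this.replicationThrottleLimit = replicationThrottleLimit;
    this.useRoundRobinSegmentAssignment = useRoundRobinSegmentAssignment;
  }

  // Static factory owns the work of deriving values from the dynamic config.
  public static SegmentLoadingConfig create(CoordinatorDynamicConfig dynamicConfig, int numUsedSegments)
  {
    // Example derivation only; the actual computation may differ.
    final int throttleLimit = Math.max(100, numUsedSegments / 50);
    return new SegmentLoadingConfig(throttleLimit, true);
  }
}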

cheddar (Contributor) left a comment:

making my gray check green.

{
  this.replicaCountsInTier = ImmutableMap.copyOf(replicaCountsInTier);

  final Map<SegmentId, SegmentReplicaCount> totalReplicaCounts = new HashMap<>();

Check notice (Code scanning / CodeQL): Possible confusion of local and field. Potentially confusing name: SegmentReplicationStatus also refers to field totalReplicaCounts (as this.totalReplicaCounts).
return new SegmentLoadingConfig(
    dynamicConfig.getMaxSegmentsInNodeLoadingQueue(),
    dynamicConfig.getReplicationThrottleLimit(),
    dynamicConfig.getMaxNonPrimaryReplicantsToLoad(),

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation. Invoking CoordinatorDynamicConfig.getMaxNonPrimaryReplicantsToLoad should be avoided because it has been deprecated.
    dynamicConfig.getMaxNonPrimaryReplicantsToLoad(),
    dynamicConfig.getReplicantLifetime(),
    dynamicConfig.getMaxSegmentsToMove(),
    dynamicConfig.getDecommissioningMaxPercentOfMaxSegmentsToMove(),

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation. Invoking CoordinatorDynamicConfig.getDecommissioningMaxPercentOfMaxSegmentsToMove should be avoided because it has been deprecated.
kfaraz merged commit 50461c3 into apache:master on Jun 19, 2023
kfaraz added a commit that referenced this pull request Jul 6, 2023
After #13197, several coordinator configs are now redundant, as they are no longer
used, either with `smartSegmentLoading` or otherwise.

Changes:
- Remove dynamic config `emitBalancingStats`: balancer error stats are always
emitted; debug stats can be logged by using `debugDimensions`
- Remove `useBatchedSegmentSampler` and `percentOfSegmentsToConsiderPerMove`:
batched segment sampling is always used
- Add test to verify deserialization with unknown properties
- Update `CoordinatorRunStats` to always track stats; this can be optimized later.
kfaraz added a commit to kfaraz/druid that referenced this pull request Jul 18, 2023
abhishekagarwal87 added this to the 27.0 milestone on Jul 19, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
abhishekagarwal87 changed the title from "Segment loading: Allow cancellation and prioritization of load queue items" to "Enable smartSegmentLoading on the Coordinator" on Jul 31, 2023
kfaraz deleted the improve_load_queue branch on August 1, 2023
Development

Successfully merging this pull request may close these issues.

Segment loading and balancing improvements
5 participants