Enable smartSegmentLoading on the Coordinator #13197
Conversation
Added change summary to important classes.
The tests have been updated to use the new flow, but the same behaviour is still being verified.
```diff
@@ -63,7 +56,7 @@
  * of the same or different methods.
  */
 @Deprecated
-public class CuratorLoadQueuePeon extends LoadQueuePeon
+public class CuratorLoadQueuePeon implements LoadQueuePeon
```
Change summary:
Factored out class SegmentHolder as QueuedSegment to represent an item in a load queue, to be used by both HttpLoadQueuePeon and CuratorLoadQueuePeon. Implemented new methods in LoadQueuePeon.
I think the change is trying to simplify the management of the queues. In which case, I think the abstraction you want is a LoadQueue that is given segment load/drop requests and can then be read from "in order". That could then be used by either Peon to do what it needs to do.
Additionally, the CuratorLoadQueuePeon is effectively broken at this point anyway because it puts all of the znodes on ZK as quickly as possible, and it's probably too expensive to really fix (we should just use http and ignore zk for this), so I don't see a reason to fix it. So, another approach is to consider the ZK-based stuff dead and only improve on the http stuff.
We should likely queue up the death of the zk-based stuff too.
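To make the suggested shape concrete, here is a minimal sketch of such a LoadQueue: it accepts load/drop requests and hands them back in a defined order (drops before loads, FIFO within each kind). All names and the ordering policy are illustrative assumptions, not code from this PR.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical request type; the PR's QueuedSegment carries a segment and an action.
final class QueueEntry
{
  final String segmentId;
  final boolean isDrop;   // drops should be served before loads
  final long sequence;    // preserves FIFO order within each kind

  QueueEntry(String segmentId, boolean isDrop, long sequence)
  {
    this.segmentId = segmentId;
    this.isDrop = isDrop;
    this.sequence = sequence;
  }
}

// Accepts load/drop requests and hands them back "in order".
final class LoadQueue
{
  private long counter = 0;
  private final PriorityQueue<QueueEntry> queue = new PriorityQueue<>(
      Comparator.comparing((QueueEntry e) -> !e.isDrop)  // drops sort first
                .thenComparingLong(e -> e.sequence)      // then insertion order
  );

  public synchronized void add(String segmentId, boolean isDrop)
  {
    queue.offer(new QueueEntry(segmentId, isDrop, counter++));
  }

  public synchronized QueueEntry pollNext()
  {
    return queue.poll();  // null when empty
  }
}
```

Either peon could then drain pollNext() at whatever rate its transport (HTTP or ZK) allows.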
```java
 */
private static final Comparator<Pair<Double, ServerHolder>> CHEAPEST_SERVERS_FIRST
    = Comparator.<Pair<Double, ServerHolder>, Double>comparing(pair -> pair.lhs)
                .thenComparing(pair -> ThreadLocalRandom.current().nextInt());
```
Would it help to use ServerHolder.getSizeUsed instead of a random integer for the second comparison?
Sure, that would work too (though it would make more sense to use free size or free percentage rather than size used).
This PR does not make any modifications to the strategies, so that change has not been included here. The only modification made here is a reduction in the number of calls to the strategy.
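For illustration, a sketch of the tiebreaker discussed above, preferring the server with the most free space over a random integer. The stand-in types below replace Pair and ServerHolder so the snippet is self-contained; the field and accessor names are assumptions.

```java
import java.util.Comparator;

// Illustrative stand-in for ServerHolder with the accessor discussed above.
final class Server
{
  private final long maxSize;
  private final long sizeUsed;

  Server(long maxSize, long sizeUsed)
  {
    this.maxSize = maxSize;
    this.sizeUsed = sizeUsed;
  }

  long getAvailableSize()
  {
    return maxSize - sizeUsed;
  }
}

// Illustrative stand-in for Pair<Double, ServerHolder>.
final class CostedServer
{
  final double cost;
  final Server server;

  CostedServer(double cost, Server server)
  {
    this.cost = cost;
    this.server = server;
  }
}

final class ServerComparators
{
  // Cheapest server first; ties broken by most free space instead of a random int.
  static final Comparator<CostedServer> CHEAPEST_THEN_MOST_FREE =
      Comparator.comparingDouble((CostedServer p) -> p.cost)
                .thenComparingLong(p -> -p.server.getAvailableSize());
}
```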
I'm not done, but I need to do something else and don't want the comments to be stuck in draft, so I'm submitting for now to make the comments visible.
```java
    && stateManager.loadSegment(segment, server, true)) {
  return true;
} else {
  log.makeAlert("Failed to broadcast segment for [%s]", segment.getDataSource())
```
Be careful with messages in this code. "Failed to broadcast segment" is ambiguous about whether there was an issue with the server actually downloading the segment (definitely not the case given the code here) versus an issue with the coordinator believing that it is safe to assign the segment (much more likely). We should be very explicit about what it is that is happening and, also, what we might expect the end user to do about it.
Yes, I am going through all the log/alert messages to make sure they capture the right information.
@imply-cheddar, thanks a lot for your review! I have incorporated your feedback.
```diff
-segmentsToLoad.put(segment, new LoadSegmentHolder(segment, callback));
-queuedSize.addAndGet(segment.getSize());
+holder = new SegmentHolder(segment, action, callback);
+segmentsToLoad.put(segment, holder);
```
The old code separated segmentsToLoad and segmentsToDrop so that it could prioritize drops over loads. If I'm understanding correctly, we are doing that prioritization through the queuedSegments prioritization now, which makes me wonder if we need to keep the old segmentsToLoad and segmentsToDrop around anymore? Are those data structures still used for some meaningful purpose?
We might still need the segmentsToLoad and segmentsToDrop at least for now, because the balancer strategies use these to compute cost. This could be done with queuedSegments itself, but we would have to filter out the relevant entries on every cost computation.
I do have a follow-up PR which deals with the fixes in the strategy. I will try to clean up this part there.
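As a rough illustration of the filtering mentioned above (not the PR's actual code; the Action enum and its isLoad() grouping are assumptions), deriving the load-only view from a combined queue would look something like this:

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class QueueViews
{
  enum Action
  {
    LOAD, MOVE_TO, MOVE_FROM, DROP;

    boolean isLoad()
    {
      return this == LOAD || this == MOVE_TO;
    }
  }

  // Derives the "segments to load" view from a combined queuedSegments map.
  // Running this filter on every cost computation is the overhead mentioned
  // above as the reason for keeping the separate maps for now.
  static Set<String> segmentsToLoad(Map<String, Action> queuedSegments)
  {
    return queuedSegments.entrySet()
                         .stream()
                         .filter(entry -> entry.getValue().isLoad())
                         .map(Map.Entry::getKey)
                         .collect(Collectors.toSet());
  }
}
```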
```java
  }
}

// Drop as many replicas as possible from decommissioning servers
```
Why drop things from a decommissioning server? As long as the server is up, the data is available and it can be used. If you don't want the server to be used for anything, just kill -9 the process. If it's up and working, keep using it until it's kill -9'd. If we are going to support decommissioning, it shouldn't be a case of "I need to remove things from this server" but rather "I'm going to pretend as if that server doesn't exist anymore".
That said, decommissioning for historicals is not a really good model. Instead, we need the ability to start up as a replica.
Thanks for the suggestion. Will include these changes along with the changes for full-node replication in a follow-up PR.
Overall, I don't like the introduction of so many magical boolean parameters on methods. It makes it more difficult to understand what is happening.
Additionally, there is commentary in here, but I'm not sure any of it necessarily needs to block this PR; it's all mostly just hygiene stuff that could be done later too (especially given that this PR is already so large and has existed for so long).
Given that we've run this in some legitimate production clusters as well as some performance environments and it's done what we expect, I'm approving this.
```diff
 }
-if (useBatchedSegmentSampler != that.useBatchedSegmentSampler) {
-  return false;
-}
-if (replicantLifetime != that.replicantLifetime) {
-  return false;
-}
-if (replicationThrottleLimit != that.replicationThrottleLimit) {
-  return false;
-}
-if (balancerComputeThreads != that.balancerComputeThreads) {
-  return false;
-}
-if (emitBalancingStats != that.emitBalancingStats) {
-  return false;
-}
-if (maxSegmentsInNodeLoadingQueue != that.maxSegmentsInNodeLoadingQueue) {
-  return false;
-}
-if (!Objects.equals(specificDataSourcesToKillUnusedSegmentsIn, that.specificDataSourcesToKillUnusedSegmentsIn)) {
-  return false;
-}
-if (!Objects.equals(dataSourcesToNotKillStalePendingSegmentsIn, that.dataSourcesToNotKillStalePendingSegmentsIn)) {
-  return false;
-}
-if (!Objects.equals(decommissioningNodes, that.decommissioningNodes)) {
-  return false;
-}
-if (pauseCoordination != that.pauseCoordination) {
-  return false;
-}
-if (replicateAfterLoadTimeout != that.replicateAfterLoadTimeout) {
-  return false;
-}
-if (maxNonPrimaryReplicantsToLoad != that.maxNonPrimaryReplicantsToLoad) {
-  return false;
-}
-return decommissioningMaxPercentOfMaxSegmentsToMove == that.decommissioningMaxPercentOfMaxSegmentsToMove;
+return markSegmentAsUnusedDelayMillis == that.markSegmentAsUnusedDelayMillis
+       && mergeBytesLimit == that.mergeBytesLimit
+       && mergeSegmentsLimit == that.mergeSegmentsLimit
+       && maxSegmentsToMove == that.maxSegmentsToMove
+       && percentOfSegmentsToConsiderPerMove == that.percentOfSegmentsToConsiderPerMove
+       && decommissioningMaxPercentOfMaxSegmentsToMove == that.decommissioningMaxPercentOfMaxSegmentsToMove
+       && useBatchedSegmentSampler == that.useBatchedSegmentSampler
+       && balancerComputeThreads == that.balancerComputeThreads
+       && emitBalancingStats == that.emitBalancingStats
+       && replicantLifetime == that.replicantLifetime
+       && replicationThrottleLimit == that.replicationThrottleLimit
+       && replicateAfterLoadTimeout == that.replicateAfterLoadTimeout
+       && maxSegmentsInNodeLoadingQueue == that.maxSegmentsInNodeLoadingQueue
+       && maxNonPrimaryReplicantsToLoad == that.maxNonPrimaryReplicantsToLoad
+       && useRoundRobinSegmentAssignment == that.useRoundRobinSegmentAssignment
+       && pauseCoordination == that.pauseCoordination
+       && Objects.equals(
+           specificDataSourcesToKillUnusedSegmentsIn,
+           that.specificDataSourcesToKillUnusedSegmentsIn)
+       && Objects.equals(
+           dataSourcesToNotKillStalePendingSegmentsIn,
+           that.dataSourcesToNotKillStalePendingSegmentsIn)
+       && Objects.equals(decommissioningNodes, that.decommissioningNodes);
```
This (and hashCode) appear to be ignoring the various debugDimensions things, is that intentional?
Thanks for catching this, must have missed adding it.
I was even thinking of getting rid of the equals and hashCode in this class. I don't see them serving any purpose (unless the JsonConfigProvider does something with them, need to double check). Even the tests do an item-by-item comparison. But will do it later.
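For the record, a minimal sketch of the immediate fix agreed on here: folding the missed debugDimensions field into both equals and hashCode. The surrounding class is heavily simplified for illustration and is not the actual CoordinatorDynamicConfig.

```java
import java.util.Map;
import java.util.Objects;

final class ConfigExample
{
  private final int replicantLifetime;
  private final Map<String, String> debugDimensions;

  ConfigExample(int replicantLifetime, Map<String, String> debugDimensions)
  {
    this.replicantLifetime = replicantLifetime;
    this.debugDimensions = debugDimensions;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    ConfigExample that = (ConfigExample) o;
    // debugDimensions must participate here, or two configs differing only
    // in debug dimensions would wrongly compare equal.
    return replicantLifetime == that.replicantLifetime
           && Objects.equals(debugDimensions, that.debugDimensions);
  }

  @Override
  public int hashCode()
  {
    return Objects.hash(replicantLifetime, debugDimensions);
  }
}
```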
```java
private void cancelLoadsOnDecommissioningServers(DruidCluster cluster)
{
  final AtomicInteger cancelledCount = new AtomicInteger(0);
  final List<ServerHolder> decommissioningServers
      = cluster.getAllServers().stream()
               .filter(ServerHolder::isDecommissioning)
               .collect(Collectors.toList());

  for (ServerHolder server : decommissioningServers) {
    server.getQueuedSegments().forEach(
        (segment, action) -> {
          // Cancel the operation if it is a type of load
          if (action.isLoad() && server.cancelOperation(action, segment)) {
            cancelledCount.incrementAndGet();
          }
        }
    );
  }

  if (cancelledCount.get() > 0) {
    log.info(
        "Cancelled [%d] load/move operations on [%d] decommissioning servers.",
        cancelledCount.get(), decommissioningServers.size()
    );
  }
}
```
I would've expected that the act of decommissioning would put the queue into a state where it will only accept DROP requests, and then drop the load queue immediately. It seems weird to me that this logic is handled in this class instead. The only thing I'd expect a CoordinatorDuty to do about it is to look for decommissioning servers and move the segments away.
Cancelling all loads on decommissioning servers is faster than moving the segments away.
Balancing moves are subject to the cost computation performance in BalancerStrategy and thus are typically limited (maxSegmentsToMove <= 1000). With unlimited load queues, there can potentially be many more segments in the load queue of decommissioning servers.
With cancellation of loads on decommissioning servers, these segments would immediately be (round-robin) assigned to active servers in this run itself.
We are doing this cancellation in this duty (as opposed to RunRules or BalanceSegments) since the SegmentReplicantLookup is constructed right after this method, and thus the Coordinator knows in this run itself that some segments are under-replicated and it needs to queue up some loads on active servers.
```java
CoordinatorDynamicConfig.builder()
                        .withSmartSegmentLoading(false)
                        .withMaxSegmentsToMove(1)
                        .withUseBatchedSegmentSampler(true)
                        .withPercentOfSegmentsToConsiderPerMove(40)
```
CodeQL notice (Code scanning): Deprecated method or constructor invocation.
```java
CoordinatorDynamicConfig.builder()
                        .withSmartSegmentLoading(false)
                        .withMaxSegmentsToMove(1)
                        .withUseBatchedSegmentSampler(true)
```
CodeQL notice (Code scanning): Deprecated method or constructor invocation.
```java
CoordinatorDynamicConfig.builder()
                        .withSmartSegmentLoading(false)
                        .withMaxSegmentsToMove(2)
                        .withUseBatchedSegmentSampler(true)
```
CodeQL notice (Code scanning): Deprecated method or constructor invocation.
```java
ServerTestHelper.MAPPER
    .writerWithType(HttpLoadQueuePeon.RESPONSE_ENTITY_TYPE_REF)
```
CodeQL notice (Code scanning): Deprecated method or constructor invocation.
…mentReplicaCountMap
```diff
 if (action.isLoad()) {
   projectedSegments.add(segment);
 } else {
-  queuedSegments.remove(segment);
   projectedSegments.remove(segment);
 }

-final long sizeDelta = addToQueue ? segment.getSize() : -segment.getSize();
 if (action.isLoad()) {
-  sizeOfLoadingSegments += sizeDelta;
+  sizeOfLoadingSegments += segment.getSize();
 } else if (action == SegmentAction.DROP) {
-  sizeOfDroppingSegments += sizeDelta;
+  sizeOfDroppingSegments += segment.getSize();
 } else {
   // MOVE_FROM actions graduate to DROP after the corresponding MOVE_TO has finished
   // Do not consider size delta until then, otherwise we might over-assign the server
 }
 }
```
After the change to remove the addToQueue boolean, you now have the exact same if clause at the beginning of two subsequent if statements. You should be able to make it:
```java
if (action.isLoad()) {
  projectedSegments.add(segment);
  sizeOfLoadingSegments += segment.getSize();
} else {
  projectedSegments.remove(segment);
  // The current else block could be from a DROP or a MOVE_FROM. The MOVE_FROM will eventually graduate to
  // DROP after ...
  if (action == SegmentAction.DROP) {
    sizeOfDroppingSegments += segment.getSize();
  }
}
```
```diff
 if (action.isLoad()) {
   projectedSegments.remove(segment);
 } else {
   projectedSegments.add(segment);
 }

-return true;
 if (action.isLoad()) {
   sizeOfLoadingSegments -= segment.getSize();
 } else if (action == SegmentAction.DROP) {
   sizeOfDroppingSegments -= segment.getSize();
 }
 }
```
Same comment here: you basically have the same if statement twice.
```java
 * @return Iterator over {@link BalancerSegmentHolder}s, each of which contains
 *         a segment picked for moving and the server currently loading it.
 */
public static List<BalancerSegmentHolder> pickMovableLoadingSegmentsFrom(
```
nit: the difference in name is so minute it took me a long time to notice. One ends in "ing" and the other in "ed".
Is there a reason not to expose the 4-argument method as public and have the call sites pass in one of the two lambdas? It looks like there's only a single call site for each of them except for tests.
That, or maybe try to make the names a bit more different from each other.
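A sketch of the shape being suggested: a single public method that takes the segment filter, with each call site supplying its own lambda. Method, type, and parameter names below are illustrative assumptions, not the PR's actual API.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class SegmentPicker
{
  // Single public entry point; callers choose which segments qualify.
  static List<String> pickMovableSegments(
      List<String> segments,
      int limit,
      Predicate<String> isMovable
  )
  {
    return segments.stream()
                   .filter(isMovable)
                   .limit(limit)
                   .collect(Collectors.toList());
  }
}

// Call sites would then differ only in the lambda they pass, e.g.:
//   SegmentPicker.pickMovableSegments(segments, 100, loadingSegments::contains);
//   SegmentPicker.pickMovableSegments(segments, 100, loadedSegments::contains);
```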
```java
private final boolean useRoundRobinSegmentAssignment;
private final boolean emitBalancingStats;

public SegmentLoadingConfig(CoordinatorDynamicConfig dynamicConfig, int numUsedSegments)
```
design nit: a constructor should generally define the "dependency relationship" of a class, i.e. the things passed in on the constructor are the things that the current class is dependent upon. This constructor is doing a bunch of work; that work is dependent on the CoordinatorDynamicConfig object, but SegmentLoadingConfig is not dependent on the object.
In this case, it would be preferable for this to be a static method and the constructor to just take all of the various values.
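A minimal sketch of that refactor, with heavily simplified stand-in fields (the real SegmentLoadingConfig derives many more values, and the derivation shown is invented for illustration): the constructor takes plain values, while a static factory does the work of deriving them from the dynamic config.

```java
// Simplified stand-in for the dynamic config; only the fields used below.
final class DynamicConfig
{
  final int maxSegmentsInNodeLoadingQueue;
  final int replicationThrottleLimit;

  DynamicConfig(int queueSize, int throttle)
  {
    this.maxSegmentsInNodeLoadingQueue = queueSize;
    this.replicationThrottleLimit = throttle;
  }
}

final class LoadingConfig
{
  private final int maxSegmentsInQueue;
  private final int replicationThrottleLimit;

  // Constructor only states what this class depends on: plain values.
  private LoadingConfig(int maxSegmentsInQueue, int replicationThrottleLimit)
  {
    this.maxSegmentsInQueue = maxSegmentsInQueue;
    this.replicationThrottleLimit = replicationThrottleLimit;
  }

  // The derivation work lives in a factory, keeping the dependency on the
  // dynamic config out of the constructor.
  static LoadingConfig create(DynamicConfig dynamicConfig, int numUsedSegments)
  {
    // Hypothetical derivation, just to show work moving into the factory.
    int throttle = Math.max(dynamicConfig.replicationThrottleLimit, numUsedSegments / 100);
    return new LoadingConfig(dynamicConfig.maxSegmentsInNodeLoadingQueue, throttle);
  }
}
```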
making my gray check green.
```java
{
  this.replicaCountsInTier = ImmutableMap.copyOf(replicaCountsInTier);

  final Map<SegmentId, SegmentReplicaCount> totalReplicaCounts = new HashMap<>();
```
CodeQL notice (Code scanning): Possible confusion of local and field.
```java
return new SegmentLoadingConfig(
    dynamicConfig.getMaxSegmentsInNodeLoadingQueue(),
    dynamicConfig.getReplicationThrottleLimit(),
    dynamicConfig.getMaxNonPrimaryReplicantsToLoad(),
```
CodeQL notice (Code scanning): Deprecated method or constructor invocation.
```java
    dynamicConfig.getMaxNonPrimaryReplicantsToLoad(),
    dynamicConfig.getReplicantLifetime(),
    dynamicConfig.getMaxSegmentsToMove(),
    dynamicConfig.getDecommissioningMaxPercentOfMaxSegmentsToMove(),
```
CodeQL notice (Code scanning): Deprecated method or constructor invocation.
After #13197, several coordinator configs are now redundant as they are not being used anymore, neither with `smartSegmentLoading` nor otherwise. Changes:
- Remove dynamic configs:
  - `emitBalancingStats`: balancer error stats are always emitted, debug stats can be logged by using `debugDimensions`
  - `useBatchedSegmentSampler`, `percentOfSegmentsToConsiderPerMove`: batched segment sampling is always used
- Add test to verify deserialization with unknown properties
- Update `CoordinatorRunStats` to always track stats; this can be optimized later
Fixes #12881
Description
This PR lays the groundwork to allow the load queue to safely have an unlimited number of items, and thus eventually phase out `maxSegmentsInNodeLoadingQueue` and `replicationThrottleLimit`. The load queue is already allowed to have unlimited items (by setting `maxSegmentsInNodeLoadingQueue = 0`), but this leads to
Changes
Classes to review:
- `StrategicSegmentAssigner`
- `LoadRule`, `BroadcastDistributionRule`, `DropRule`, `Rule.SegmentActionHandler`
- `SegmentLoadQueueManager`
- `LoadQueuePeon`: http and curator
- `SegmentHolder`
- `ServerHolder`
Behavioral changes:
- Allow the coordinator to take corrective actions quickly.
- Always maintain the target level of replication, thus ensuring that segment read concurrency does not suffer.
- Avoid considering balancing items in the load queue as over-replicated.
- `replicationThrottleLimit` does not act on a tier if the segment is not loaded on that tier at all (see the sketch after this list).
- `maxNonPrimaryReplicantsToLoad` does not act on the first replica in any tier.
- `replicationThrottleLimit`
- `maxSegmentsInNodeLoadingQueue`
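A rough sketch of the two throttling exceptions described above, written only to illustrate the stated behavior; all method and parameter names are assumptions, not the PR's actual code.

```java
final class ThrottleRules
{
  // replicationThrottleLimit does not apply when the tier has no copy of the
  // segment yet: the first replica on a tier must always be assignable.
  static boolean isThrottled(int loadedReplicasInTier, int assignedThisRun, int replicationThrottleLimit)
  {
    if (loadedReplicasInTier == 0) {
      return false;  // first replica on this tier is never throttled
    }
    return assignedThisRun >= replicationThrottleLimit;
  }

  // maxNonPrimaryReplicantsToLoad similarly never blocks the first replica
  // in any tier, only subsequent (non-primary) replicas.
  static boolean exceedsNonPrimaryLimit(int replicasAlreadyInTier, int nonPrimaryAssigned, int maxNonPrimary)
  {
    if (replicasAlreadyInTier == 0) {
      return false;
    }
    return nonPrimaryAssigned >= maxNonPrimary;
  }
}
```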
Structural changes:
- Added `StrategicSegmentAssigner`, which handles all segment assignments. The lifecycle of the assigner is tied to a single coordinator run.
  - Single place to maintain the state of a single run, thus allowing better metrics and logging.
- Added `SegmentLoadQueueManager`, which interacts with the load queues.
  - Allows reporting of metrics from queue callbacks.
  - Prevents callbacks from holding references to items from the previous coordinator run.
New metrics:
- `segment/loadQueue/assigned`
- `segment/loadQueue/success`
- `segment/loadQueue/cancelled`
- `segment/loadQueue/failed`
- Dimension `"datasource"` added to most segment-level metrics

Release notes
The Druid coordinator has been completely revamped to make it much more stable and user-friendly. This is accompanied by several bug fixes, logging and metric improvements, and a whole new range of capabilities.
Features:
- New `smartSegmentLoading` mode, which is enabled by default. When enabled, users need not specify any of the following dynamic configs, as they would be ignored by the coordinator. Instead, the coordinator computes the optimal values of these configs at run time to best utilize coordinator runs (see the usage sketch below).
  - `maxSegmentsInNodeLoadingQueue`
  - `maxSegmentsToMove`
  - `replicationThrottleLimit`
  - `useRoundRobinSegmentAssignment`
  - `useBatchedSegmentSampler`
  - `emitBalancingStats`
- These configs are now deprecated and will be removed in subsequent releases.
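For reference, a hedged usage sketch of toggling the mode through the dynamic-config builder exercised in this PR's tests (`builder()` and `withSmartSegmentLoading` appear above; the `build()` call and import path are assumed from the file layout shown in this PR):

```java
import org.apache.druid.server.coordinator.CoordinatorDynamicConfig;

public class SmartLoadingExample
{
  public static void main(String[] args)
  {
    // With smartSegmentLoading enabled (the default), the deprecated configs
    // listed above are ignored and computed by the coordinator at run time.
    CoordinatorDynamicConfig config = CoordinatorDynamicConfig
        .builder()
        .withSmartSegmentLoading(true)
        .build();
    System.out.println(config);
  }
}
```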
Monitoring:
This PR has: