-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize isOvershadowed when there is a unique minor version for an interval #15952
Optimize isOvershadowed when there is a unique minor version for an interval #15952
Conversation
server/src/main/java/org/apache/druid/metadata/SegmentsMetadataManagerConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AmatyaAvadhanula , the updated logic looks pretty neat! left some minor comments regarding logs.
server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java
Show resolved
Hide resolved
@AmatyaAvadhanula , looking at the benchmarking, is it correct to conclude that the new logic is faster even when So the performance would be worst only when we have multiple minor versions? Another point, is the default minor version of -1 safe? What is the default value of minor version for any segment when it actually has no minor version? I should assume that the value would be |
processing/src/main/java/org/apache/druid/timeline/partition/PartitionHolder.java
Show resolved
Hide resolved
} | ||
|
||
public boolean add(PartitionChunk<T> chunk) | ||
{ | ||
return overshadowableManager.addChunk(chunk); | ||
boolean added = overshadowableManager.addChunk(chunk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to do anything on remove()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do, we would have to maintain counts and complicate the logic.
This patch is intended to help large clusters without segment locking i.e distinct minor versions.
Would it be better if a more elaborate solution which updates the state when remove is called is implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we would have to maintain counts for each minor version.
So if we skip the impl on remove
, if a minor version was ever added to the timeline, we will continue to assume that that minor version is still present for the purposes of computing maxMinorVersion. So we might sometimes enter into the slow code flow even though it could have been avoided. I guess that's not so bad.
For the timeline in the DataSourcesSnapshot
, the impl in remove
doesn't matter (at least in the current implementation of the SqlSegmentsMetadataManager
) since objects are never removed, only added to the timeline.
I guess we can keep the impl simple for now, as you suggest, but we should atleast call it out in a comment.
Also, since the logic of the maxMinorVersion
is very adhoc and confusing, we should add comments as appropriate and keep the changes limited to the VersionedIntervalTimeline
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some final comments.
@@ -34,16 +34,26 @@ public class PartitionHolder<T extends Overshadowable<T>> implements Iterable<Pa | |||
{ | |||
private final OvershadowableManager<T> overshadowableManager; | |||
|
|||
// Maintains the maximum minor version across all the added segments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should instead be a javadoc in the getMinorVersion
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/** | ||
* 0) Determine the TimelineEntry with the given interval | ||
* 1) If the major versions are equal, the segment is overshadowed iff any of the segments in the TimelineEntry overshadows it. | ||
* 2) If the major versions are different, simply return true if the TimelineEntry has a greater major version. False otherwise. | ||
* 3) If there is no TimelineEntry with the same interval, find one which contains the input interval. | ||
* 4) If a TimelineEntry is found in step 3, repeat steps 1, 2 with it. Return false otherwise. | ||
* | ||
* @param interval The interval of the overshadowable object (segment) | ||
* @param version The version of the overshadowable object (segment) | ||
* @param object The overshadowable object to be determined as overshadowed or not | ||
* @return true iff the object is overshadowed in this timeline | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This javadoc seems more like an implementation detail rather than the overall behaviour of this method. I don't think this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
@@ -815,6 +832,11 @@ public PartitionHolder<ObjectType> getPartitionHolder() | |||
return partitionHolder; | |||
} | |||
|
|||
public short getMaxMinorVersion() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No
processing/src/main/java/org/apache/druid/timeline/partition/PartitionHolder.java
Show resolved
Hide resolved
@@ -1063,12 +1063,20 @@ public DataSegment map(int index, ResultSet r, StatementContext ctx) throws SQLE | |||
if (segments.isEmpty()) { | |||
log.info("No segments found in the database!"); | |||
} else { | |||
log.info("Polled and found %,d segments in the database", segments.size()); | |||
log.info("Polled and found %,d segments in the database.", segments.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.info("Polled and found %,d segments in the database.", segments.size()); | |
log.info("Polled and found [%,d] segments in the database.", segments.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dataSourcesSnapshot = DataSourcesSnapshot.fromUsedSegments( | ||
Iterables.filter(segments, Objects::nonNull), // Filter corrupted entries (see above in this method). | ||
dataSourceProperties | ||
); | ||
final long snapshotCreationEndTime = System.currentTimeMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, initialize a Druid Stopwatch
at the start of this method. And in every log, print the result of .millisElapsed()
to get a clear idea of the total time spent after performing each step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
* Returns the maximum minor version across all the added segments. | ||
* We do not handle updates of this variable when segments are removed for the sake of simplicity. | ||
*/ | ||
short getMaxMinorVersion() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This method should be private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
processing/src/main/java/org/apache/druid/timeline/partition/PartitionHolder.java
Show resolved
Hide resolved
@@ -1014,7 +1015,8 @@ public void poll() | |||
@GuardedBy("pollLock") | |||
private void doPoll() | |||
{ | |||
log.debug("Starting polling of segment table"); | |||
Stopwatch stopwatch = Stopwatch.createStarted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stopwatch stopwatch = Stopwatch.createStarted(); | |
final Stopwatch stopwatch = Stopwatch.createStarted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
} | ||
stopwatch.restart(); | ||
|
||
log.debug("Building datasources snapshot from polled segments."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log line is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor comments, otherwise changes look good.
Thank you for the review @kfaraz! |
On a cluster with 600k active segments, this patch reduce the time to build timeline from 160,000ms to 2,000ms. |
The coordinator tries to poll segments every minute (or based on the configured periodicPollDelay)
In each cycle, the DataSourcesSnapshot
The third step to compute overshadowed segments is currently based on calling
isOvershadowed
on each segment individually. The complexity for this operation is O(n ^ 2) for each interval with n segments.This can lead to coordinator polls happening infrequently, which causes a delay in segments being discovered after having been committed, which leads to stale data and slow handoffs etc.
Optimize the isOvershadowed method when there there is a single value of minorVersion across all segments in an interval (When segment locking is not used, for example) to reduce the complexity to O(n) in such cases.
This is a simple optimization that may be useful for most large clusters since segment locking is seldom used.
BENCHMARKS:
VersionedIntervalTimelineBenchmark#benchIsOvershadowedTotal
Benchmarks were run with 1000 / 10000 segments per interval with Monthly granularity over a year.
Existing code:
With this patch:
TODO:
This PR has: