Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize isOvershadowed when there is a unique minor version for an interval #15952

Merged

Conversation

AmatyaAvadhanula
Copy link
Contributor

@AmatyaAvadhanula AmatyaAvadhanula commented Feb 23, 2024

The coordinator tries to poll segments every minute (or based on the configured periodicPollDelay)
In each cycle, the DataSourcesSnapshot

  1. Polls the database for all used segments
  2. Create a segment timeline for each of the datasources
  3. Determine all the overshadowed segments

The third step to compute overshadowed segments is currently based on calling isOvershadowed on each segment individually. The complexity for this operation is O(n ^ 2) for each interval with n segments.
This can lead to coordinator polls happening infrequently, which causes a delay in segments being discovered after having been committed, which leads to stale data and slow handoffs etc.

Optimize the isOvershadowed method when there there is a single value of minorVersion across all segments in an interval (When segment locking is not used, for example) to reduce the complexity to O(n) in such cases.
This is a simple optimization that may be useful for most large clusters since segment locking is seldom used.

BENCHMARKS:
VersionedIntervalTimelineBenchmark#benchIsOvershadowedTotal
Benchmarks were run with 1000 / 10000 segments per interval with Monthly granularity over a year.

Existing code:

Benchmark                                                    (numInitialRootGenSegmentsPerInterval)  (numNonRootGenerations)  (segmentGranularity)  (useSegmentLock)   Mode  Cnt  Score    Error  Units
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        1                 MONTH             false  thrpt    3  0.347 ±  0.030  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        1                 MONTH              true  thrpt    3  0.366 ±  0.033  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        2                 MONTH             false  thrpt    3  0.343 ±  0.015  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        2                 MONTH              true  thrpt    3  0.365 ±  0.033  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        1                 MONTH             false  thrpt    3  0.004 ±  0.001  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        1                 MONTH              true  thrpt    3  0.004 ±  0.001  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        2                 MONTH             false  thrpt    3  0.003 ±  0.001  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        2                 MONTH              true  thrpt    3  0.004 ±  0.001  ops/s

With this patch:

Benchmark                                                    (numInitialRootGenSegmentsPerInterval)  (numNonRootGenerations)  (segmentGranularity)  (useSegmentLock)   Mode  Cnt   Score    Error  Units
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        1                 MONTH             false  thrpt    3  82.062 ±  5.082  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        1                 MONTH              true  thrpt    3  80.937 ± 33.370  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        2                 MONTH             false  thrpt    3  59.582 ±  6.060  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                    1000                        2                 MONTH              true  thrpt    3  59.314 ±  9.134  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        1                 MONTH             false  thrpt    3   8.165 ±  0.675  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        1                 MONTH              true  thrpt    3   8.193 ±  0.431  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        2                 MONTH             false  thrpt    3   5.831 ±  1.910  ops/s
VersionedIntervalTimelineBenchmark.benchIsOvershadowedTotal                                   10000                        2                 MONTH              true  thrpt    3   5.927 ±  0.421  ops/s

TODO:

  • Add tests to ensure correctness of the new method

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula AmatyaAvadhanula changed the title Replace individual isOvershadowed with findFullyOvershadowed Optimize isOvershadowed when there is a unique minor version for an interval Feb 27, 2024
@kfaraz kfaraz marked this pull request as ready for review March 1, 2024 06:38
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AmatyaAvadhanula , the updated logic looks pretty neat! left some minor comments regarding logs.

@kfaraz
Copy link
Contributor

kfaraz commented Mar 1, 2024

@AmatyaAvadhanula , looking at the benchmarking, is it correct to conclude that the new logic is faster even when useSegmentLock = true (i.e. minor versions exist)? But how can that be the case?
Is it because the object.getMinorVersion turns out to be equal to the maxMinorVersion?

So the performance would be worst only when we have multiple minor versions?

Another point, is the default minor version of -1 safe?

What is the default value of minor version for any segment when it actually has no minor version? I should assume that the value would be 0.

}

public boolean add(PartitionChunk<T> chunk)
{
return overshadowableManager.addChunk(chunk);
boolean added = overshadowableManager.addChunk(chunk);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to do anything on remove()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do, we would have to maintain counts and complicate the logic.
This patch is intended to help large clusters without segment locking i.e distinct minor versions.
Would it be better if a more elaborate solution which updates the state when remove is called is implemented?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we would have to maintain counts for each minor version.

So if we skip the impl on remove, if a minor version was ever added to the timeline, we will continue to assume that that minor version is still present for the purposes of computing maxMinorVersion. So we might sometimes enter into the slow code flow even though it could have been avoided. I guess that's not so bad.

For the timeline in the DataSourcesSnapshot, the impl in remove doesn't matter (at least in the current implementation of the SqlSegmentsMetadataManager) since objects are never removed, only added to the timeline.

I guess we can keep the impl simple for now, as you suggest, but we should atleast call it out in a comment.

Also, since the logic of the maxMinorVersion is very adhoc and confusing, we should add comments as appropriate and keep the changes limited to the VersionedIntervalTimeline class.

@kfaraz kfaraz self-requested a review March 1, 2024 09:40
Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some final comments.

@@ -34,16 +34,26 @@ public class PartitionHolder<T extends Overshadowable<T>> implements Iterable<Pa
{
private final OvershadowableManager<T> overshadowableManager;

// Maintains the maximum minor version across all the added segments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should instead be a javadoc in the getMinorVersion method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 453 to 464
/**
* 0) Determine the TimelineEntry with the given interval
* 1) If the major versions are equal, the segment is overshadowed iff any of the segments in the TimelineEntry overshadows it.
* 2) If the major versions are different, simply return true if the TimelineEntry has a greater major version. False otherwise.
* 3) If there is no TimelineEntry with the same interval, find one which contains the input interval.
* 4) If a TimelineEntry is found in step 3, repeat steps 1, 2 with it. Return false otherwise.
*
* @param interval The interval of the overshadowable object (segment)
* @param version The version of the overshadowable object (segment)
* @param object The overshadowable object to be determined as overshadowed or not
* @return true iff the object is overshadowed in this timeline
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This javadoc seems more like an implementation detail rather than the overall behaviour of this method. I don't think this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@@ -815,6 +832,11 @@ public PartitionHolder<ObjectType> getPartitionHolder()
return partitionHolder;
}

public short getMaxMinorVersion()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No

@@ -1063,12 +1063,20 @@ public DataSegment map(int index, ResultSet r, StatementContext ctx) throws SQLE
if (segments.isEmpty()) {
log.info("No segments found in the database!");
} else {
log.info("Polled and found %,d segments in the database", segments.size());
log.info("Polled and found %,d segments in the database.", segments.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.info("Polled and found %,d segments in the database.", segments.size());
log.info("Polled and found [%,d] segments in the database.", segments.size());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

dataSourcesSnapshot = DataSourcesSnapshot.fromUsedSegments(
Iterables.filter(segments, Objects::nonNull), // Filter corrupted entries (see above in this method).
dataSourceProperties
);
final long snapshotCreationEndTime = System.currentTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, initialize a Druid Stopwatch at the start of this method. And in every log, print the result of .millisElapsed() to get a clear idea of the total time spent after performing each step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

@kfaraz kfaraz self-requested a review March 20, 2024 11:22
* Returns the maximum minor version across all the added segments.
* We do not handle updates of this variable when segments are removed for the sake of simplicity.
*/
short getMaxMinorVersion()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This method should be private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1014,7 +1015,8 @@ public void poll()
@GuardedBy("pollLock")
private void doPoll()
{
log.debug("Starting polling of segment table");
Stopwatch stopwatch = Stopwatch.createStarted();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Stopwatch stopwatch = Stopwatch.createStarted();
final Stopwatch stopwatch = Stopwatch.createStarted();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

}
stopwatch.restart();

log.debug("Building datasources snapshot from polled segments.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log line is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments, otherwise changes look good.

@AmatyaAvadhanula AmatyaAvadhanula merged commit 488d376 into apache:master Mar 20, 2024
85 checks passed
@AmatyaAvadhanula
Copy link
Contributor Author

Thank you for the review @kfaraz!

@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
@maytasm
Copy link
Contributor

maytasm commented May 22, 2024

On a cluster with 600k active segments, this patch reduce the time to build timeline from 160,000ms to 2,000ms.
On a cluster with ~7 million active segments, this patch reduce the time to build timeline from 850,000ms to 35,000ms. We have some datasources with 100K++ segments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants