
Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager #7306

Merged: 67 commits, Jan 27, 2020
Commits
87556bc
Reconcile terminology and method naming to 'used/unused segments'; Do…
leventov Mar 19, 2019
e44e199
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Mar 20, 2019
25de70d
Fix brace
leventov Mar 20, 2019
fe13d9c
Import order
leventov Mar 21, 2019
050bd3b
Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill
leventov Mar 21, 2019
562f5b4
Fix tests
leventov Mar 21, 2019
a6138da
Fix tests by adding proper methods without interval parameters to Ind…
leventov Mar 25, 2019
cb6fe7f
More aligned names of DruidCoordinatorHelpers, rename several Coordin…
leventov Mar 28, 2019
164f783
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Mar 28, 2019
d959bb0
Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consis…
leventov Mar 28, 2019
01c4494
More variable and method renames
leventov Mar 28, 2019
06ec389
Rename MetadataSegments to SegmentsMetadata
leventov Apr 1, 2019
09fab95
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Apr 1, 2019
f4774e1
Javadoc update
leventov Apr 1, 2019
8d5200d
Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs
leventov Apr 1, 2019
255f320
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Apr 9, 2019
f34eef0
Update Javadoc of VersionedIntervalTimeline.iterateAllObjects()
leventov Apr 9, 2019
2ebb23c
Reorder imports
leventov Apr 9, 2019
ba8ed62
Rename SegmentsMetadata.tryMark... methods to mark... and make them t…
leventov Apr 19, 2019
64b0404
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Apr 19, 2019
231a529
Complete merge
leventov Apr 19, 2019
a113231
Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimePar…
leventov Apr 22, 2019
abad3f3
Remove MetadataSegmentManager
leventov Apr 22, 2019
cb1f3b0
Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOve…
leventov Apr 22, 2019
a5840b8
Fix tests, refactor DruidCluster creation in tests into DruidClusterB…
leventov Apr 26, 2019
f4283d2
Merge with master
leventov May 3, 2019
559f2c8
Merge remote-tracking branch 'upstream/master' into used-segments
leventov May 8, 2019
93c12cd
Merge remote-tracking branch 'upstream/master' into used-segments
leventov May 9, 2019
dfc0dbb
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Jul 19, 2019
553b804
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Jul 22, 2019
886e580
Fix inspections
leventov Jul 22, 2019
77fc5e8
Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMe…
leventov Jul 22, 2019
1258fa0
Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the…
leventov Jul 23, 2019
15cb9ae
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Jul 24, 2019
c2c7547
Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoord…
leventov Jul 25, 2019
9cd239e
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Jul 26, 2019
a723553
Unused import
leventov Jul 26, 2019
26a8381
Optimize imports
leventov Jul 29, 2019
afcd301
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Jul 29, 2019
1557acc
Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() t…
leventov Jul 29, 2019
96bcab7
Unused import
leventov Jul 29, 2019
d934853
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Jul 31, 2019
c3e488e
Update terminology in datasource-view.tsx
leventov Jul 31, 2019
f65e654
Fix label in datasource-view.spec.tsx.snap
leventov Jul 31, 2019
5e8f3e4
Fix lint errors in datasource-view.tsx
leventov Aug 1, 2019
697f0d5
Doc improvements
leventov Aug 1, 2019
66a23f9
Another attempt to please TSLint
leventov Aug 2, 2019
d3882c4
Another attempt to please TSLint
leventov Aug 2, 2019
4b9d992
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Aug 2, 2019
b77a327
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Aug 8, 2019
d355aa6
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Aug 12, 2019
718bb23
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Aug 28, 2019
19b7f3a
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Aug 28, 2019
2a6bcff
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Sep 11, 2019
22e71d1
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Nov 14, 2019
c575e6e
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Nov 14, 2019
c5d22a0
Style fixes
leventov Nov 14, 2019
285ebc9
Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryFo…
leventov Nov 14, 2019
53f572a
Try to fix docs build issue
leventov Nov 14, 2019
da7667c
Javadoc and spelling fixes
leventov Nov 15, 2019
91a5c8c
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Nov 20, 2019
a566b6d
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Nov 21, 2019
6da0b9d
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Dec 8, 2019
d0bf20a
Rename SegmentsMetadata to SegmentsMetadataManager, address other com…
leventov Jan 22, 2020
10587c1
Merge remote-tracking branch 'upstream/master' into used-segments
leventov Jan 22, 2020
2ad6dd5
Address more comments
leventov Jan 23, 2020
6745f4a
Fix a bug in DataSourceOptimizer
leventov Jan 26, 2020
17 changes: 15 additions & 2 deletions .idea/inspectionProfiles/Druid.xml

@@ -24,9 +24,9 @@
import org.apache.druid.client.DataSourcesSnapshot;
import org.apache.druid.jackson.DefaultObjectMapper;
import org.apache.druid.java.util.common.DateTimes;
-import org.apache.druid.server.coordinator.helper.CompactionSegmentIterator;
-import org.apache.druid.server.coordinator.helper.CompactionSegmentSearchPolicy;
-import org.apache.druid.server.coordinator.helper.NewestSegmentFirstPolicy;
+import org.apache.druid.server.coordinator.duty.CompactionSegmentIterator;
+import org.apache.druid.server.coordinator.duty.CompactionSegmentSearchPolicy;
+import org.apache.druid.server.coordinator.duty.NewestSegmentFirstPolicy;
import org.apache.druid.timeline.DataSegment;
import org.apache.druid.timeline.VersionedIntervalTimeline;
import org.apache.druid.timeline.partition.NumberedShardSpec;
@@ -22,7 +22,6 @@
import org.apache.druid.java.util.common.logger.Logger;

import javax.annotation.Nullable;

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.lang.reflect.Method;
@@ -19,6 +19,7 @@

package org.apache.druid.metadata;

+import javax.annotation.Nullable;
import java.util.List;

/**
@@ -36,7 +37,11 @@ Void insertOrUpdate(
byte[] value
);

-byte[] lookup(
+/**
+ * Returns the value of the valueColumn when there is only one row matched to the given key.
+ * This method returns null if there is no such row and throws an error if there are more than one rows.
+ */
+@Nullable byte[] lookup(
Contributor:
Would you add a javadoc saying when it can be null? Perhaps it would be

  /**
   * Returns the value of the valueColumn when there is only one row matched to the given key.
   * This method returns null if there is no such row and throws an error if there are more than one rows.
   */

Member Author:
Thanks for suggestion, applied

String tableName,
String keyColumn,
String valueColumn,
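The revised lookup() contract (a nullable return when no row matches, an error when more than one row matches) can be illustrated with a hypothetical in-memory stand-in. LookupContractDemo and its methods are invented for illustration and are not part of Druid's SQL-backed implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the metadata table, showing the
// lookup() contract from the diff: null for zero matching rows, an
// exception for more than one matching row.
public class LookupContractDemo
{
  private final Map<String, List<byte[]>> rows = new HashMap<>();

  public void insert(String key, byte[] value)
  {
    rows.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  public byte[] lookup(String key)
  {
    List<byte[]> matched = rows.getOrDefault(key, List.of());
    if (matched.isEmpty()) {
      return null; // no such row
    }
    if (matched.size() > 1) {
      throw new IllegalStateException("More than one row matched key: " + key);
    }
    return matched.get(0);
  }

  public static void main(String[] args)
  {
    LookupContractDemo table = new LookupContractDemo();
    table.insert("k1", new byte[]{1});
    System.out.println(Arrays.toString(table.lookup("k1"))); // prints [1]
    System.out.println(table.lookup("missing"));             // prints null
  }
}
```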
@@ -21,6 +21,7 @@

import com.fasterxml.jackson.annotation.JsonProperty;
import org.apache.druid.java.util.common.StringUtils;

import java.util.Properties;

/**
Expand Down
16 changes: 7 additions & 9 deletions core/src/main/java/org/apache/druid/segment/SegmentUtils.java
@@ -19,6 +19,7 @@

package org.apache.druid.segment;

+import com.google.common.collect.Collections2;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;
@@ -37,7 +38,6 @@
import java.util.Collection;
import java.util.Collections;
import java.util.List;
-import java.util.stream.Collectors;

/**
* Utility methods useful for implementing deep storage extensions.
@@ -78,16 +78,14 @@ public static int getVersionFromDir(File inDir) throws IOException
}

 /**
- * Returns a String with identifiers of "segments" comma-separated. Useful for log messages. Not useful for anything
- * else, because this doesn't take special effort to escape commas that occur in identifiers (not common, but could
- * potentially occur in a datasource name).
+ * Returns an object whose toString() returns a String with identifiers of the given segments, comma-separated. Useful
+ * for log messages. Not useful for anything else, because this doesn't take special effort to escape commas that
+ * occur in identifiers (not common, but could potentially occur in a datasource name).
  */
-public static String commaSeparateIdentifiers(final Collection<DataSegment> segments)
+public static Object commaSeparatedIdentifiers(final Collection<DataSegment> segments)
Contributor:
Hmm, the method name doesn't look valid anymore. Maybe lazyCollectionOfSegmentIds?
Also why not having Collection<SegmentId> as the return type instead of Object?

Member Author:
The purpose is to make this method unusable for anything except logging. To obtain a collection of SegmentIds, people should usually use Stream API to create a non-lazy collection.

 {
-  return segments
-      .stream()
-      .map(segment -> segment.getId().toString())
-      .collect(Collectors.joining(", "));
+  // Lazy, to avoid preliminary string creation if logging level is turned off
+  return Collections2.transform(segments, DataSegment::getId);
 }

private SegmentUtils()
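The review exchange above turns on lazy evaluation: Collections2.transform returns a live view whose elements, and hence the logged string, are only computed if toString() is actually invoked (for instance, when the log level is enabled). This JDK-only sketch shows the same idea without Guava; LazyIdsDemo and lazyTransform are hypothetical names:

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Minimal lazy-view collection: the mapping function runs only when the
// collection is iterated (e.g. by AbstractCollection.toString()), never
// eagerly at construction time.
public class LazyIdsDemo
{
  static <T, R> Collection<R> lazyTransform(Collection<T> source, Function<T, R> fn)
  {
    return new AbstractCollection<R>()
    {
      @Override
      public Iterator<R> iterator()
      {
        Iterator<T> it = source.iterator();
        return new Iterator<R>()
        {
          @Override public boolean hasNext() { return it.hasNext(); }
          @Override public R next() { return fn.apply(it.next()); }
        };
      }

      @Override public int size() { return source.size(); }
    };
  }

  public static void main(String[] args)
  {
    List<String> segmentIds = List.of("ds_2020-01-01_v1", "ds_2020-01-02_v1");
    // No transformation work happens on this line...
    Collection<String> lazy = lazyTransform(segmentIds, id -> id.toUpperCase());
    // ...it only happens here, when toString() iterates the view.
    System.out.println(lazy); // prints [DS_2020-01-01_V1, DS_2020-01-02_V1]
  }
}
```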
@@ -38,7 +38,6 @@
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
-import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
@@ -56,25 +55,38 @@
/**
* VersionedIntervalTimeline is a data structure that manages objects on a specific timeline.
*
- * It associates a jodatime Interval and a generically-typed version with the object that is being stored.
+ * It associates an {@link Interval} and a generically-typed version with the object that is being stored.
*
* In the event of overlapping timeline entries, timeline intervals may be chunked. The underlying data associated
* with a timeline entry remains unchanged when chunking occurs.
*
- * After loading objects via the add() method, the lookup(Interval) method can be used to get the list of the most
- * recent objects (according to the version) that match the given interval. The intent is that objects represent
- * a certain time period and when you do a lookup(), you are asking for all of the objects that you need to look
- * at in order to get a correct answer about that time period.
+ * After loading objects via the {@link #add} method, the {@link #lookup(Interval)} method can be used to get the list
+ * of the most recent objects (according to the version) that match the given interval. The intent is that objects
+ * represent a certain time period and when you do a {@link #lookup(Interval)}, you are asking for all of the objects
+ * that you need to look at in order to get a correct answer about that time period.
*
- * The {@link #findFullyOvershadowed} method returns a list of objects that will never be returned by a call to lookup()
- * because they are overshadowed by some other object. This can be used in conjunction with the add() and remove()
- * methods to achieve "atomic" updates. First add new items, then check if those items caused anything to be
- * overshadowed, if so, remove the overshadowed elements and you have effectively updated your data set without any user
- * impact.
+ * The {@link #findFullyOvershadowed} method returns a list of objects that will never be returned by a call to {@link
+ * #lookup} because they are overshadowed by some other object. This can be used in conjunction with the {@link #add}
+ * and {@link #remove} methods to achieve "atomic" updates. First add new items, then check if those items caused
+ * anything to be overshadowed, if so, remove the overshadowed elements and you have effectively updated your data set
+ * without any user impact.
*/
public class VersionedIntervalTimeline<VersionType, ObjectType extends Overshadowable<ObjectType>>
implements TimelineLookup<VersionType, ObjectType>
{
+public static VersionedIntervalTimeline<String, DataSegment> forSegments(Iterable<DataSegment> segments)
+{
+  return forSegments(segments.iterator());
+}
+
+public static VersionedIntervalTimeline<String, DataSegment> forSegments(Iterator<DataSegment> segments)
+{
+  final VersionedIntervalTimeline<String, DataSegment> timeline =
+      new VersionedIntervalTimeline<>(Comparator.naturalOrder());
+  addSegments(timeline, segments);
+  return timeline;
+}
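The class javadoc describes an "atomic update" idiom: add the new objects, find what they overshadow, then remove the overshadowed elements. A toy model of that idiom follows; TimelineSketch is invented for illustration, and the real VersionedIntervalTimeline additionally handles partial interval overlaps, partition chunks, and locking:

```java
import java.util.ArrayList;
import java.util.List;

// Toy timeline: entries are exact [start, end) intervals with a String
// version. An entry is fully overshadowed when another entry covers its
// interval with a strictly higher version.
public class TimelineSketch
{
  record Entry(long start, long end, String version) {}

  private final List<Entry> entries = new ArrayList<>();

  void add(Entry e)
  {
    entries.add(e);
  }

  List<Entry> findFullyOvershadowed()
  {
    List<Entry> result = new ArrayList<>();
    for (Entry e : entries) {
      for (Entry other : entries) {
        if (other != e
            && other.start() <= e.start()
            && other.end() >= e.end()
            && other.version().compareTo(e.version()) > 0) {
          result.add(e);
          break;
        }
      }
    }
    return result;
  }

  void removeAll(List<Entry> toRemove)
  {
    entries.removeAll(toRemove);
  }

  public static void main(String[] args)
  {
    TimelineSketch timeline = new TimelineSketch();
    timeline.add(new Entry(0, 10, "v1"));
    // "Atomic" update: add the replacement first, then drop what it overshadows.
    timeline.add(new Entry(0, 10, "v2"));
    List<Entry> overshadowed = timeline.findFullyOvershadowed();
    timeline.removeAll(overshadowed);
    System.out.println(overshadowed.size()); // prints 1
  }
}
```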

private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

// Below timelines stores only *visible* timelineEntries
@@ -99,19 +111,6 @@ public VersionedIntervalTimeline(Comparator<? super VersionType> versionComparat
this.versionComparator = versionComparator;
}

-public static VersionedIntervalTimeline<String, DataSegment> forSegments(Iterable<DataSegment> segments)
-{
-  return forSegments(segments.iterator());
-}
-
-public static VersionedIntervalTimeline<String, DataSegment> forSegments(Iterator<DataSegment> segments)
-{
-  final VersionedIntervalTimeline<String, DataSegment> timeline =
-      new VersionedIntervalTimeline<>(Comparator.naturalOrder());
-  addSegments(timeline, segments);
-  return timeline;
-}

public static void addSegments(
VersionedIntervalTimeline<String, DataSegment> timeline,
Iterator<DataSegment> segments
@@ -151,6 +150,11 @@ public Collection<ObjectType> iterateAllObjects()
);
egor-ryashin marked this conversation as resolved.
}

+public int getNumObjects()
+{
+  return numObjects.get();
+}

/**
* Computes a set with all objects falling within the specified interval which are at least partially "visible" in
* this interval (that is, are not fully overshadowed within this interval).
@@ -371,62 +375,69 @@ public Set<TimelineObjectHolder<VersionType, ObjectType>> findFullyOvershadowed(
lock.readLock().lock();
try {
// 1. Put all timelineEntries and remove all visible entries to find out only non-visible timelineEntries.
-final Map<Interval, Map<VersionType, TimelineEntry>> overShadowed = new HashMap<>();
-for (Map.Entry<Interval, TreeMap<VersionType, TimelineEntry>> versionEntry : allTimelineEntries.entrySet()) {
-  @SuppressWarnings("unchecked")
-  Map<VersionType, TimelineEntry> versionCopy = (TreeMap) versionEntry.getValue().clone();
-  overShadowed.put(versionEntry.getKey(), versionCopy);
-}

-for (Entry<Interval, TimelineEntry> entry : completePartitionsTimeline.entrySet()) {
-  Map<VersionType, TimelineEntry> versionEntry = overShadowed.get(entry.getValue().getTrueInterval());
-  if (versionEntry != null) {
-    versionEntry.remove(entry.getValue().getVersion());
-    if (versionEntry.isEmpty()) {
-      overShadowed.remove(entry.getValue().getTrueInterval());
-    }
-  }
-}

-for (Entry<Interval, TimelineEntry> entry : incompletePartitionsTimeline.entrySet()) {
-  Map<VersionType, TimelineEntry> versionEntry = overShadowed.get(entry.getValue().getTrueInterval());
-  if (versionEntry != null) {
-    versionEntry.remove(entry.getValue().getVersion());
-    if (versionEntry.isEmpty()) {
-      overShadowed.remove(entry.getValue().getTrueInterval());
-    }
-  }
-}

-final Set<TimelineObjectHolder<VersionType, ObjectType>> retVal = new HashSet<>();
-for (Entry<Interval, Map<VersionType, TimelineEntry>> versionEntry : overShadowed.entrySet()) {
-  for (Entry<VersionType, TimelineEntry> entry : versionEntry.getValue().entrySet()) {
-    final TimelineEntry timelineEntry = entry.getValue();
-    retVal.add(timelineEntryToObjectHolder(timelineEntry));
-  }
-}
+final Map<Interval, Map<VersionType, TimelineEntry>> overshadowedPartitionsTimeline =
+    computeOvershadowedPartitionsTimeline();
+
+final Set<TimelineObjectHolder<VersionType, ObjectType>> overshadowedObjects = overshadowedPartitionsTimeline
+    .values()
+    .stream()
+    .flatMap(
+        (Map<VersionType, TimelineEntry> entry) -> entry.values().stream().map(this::timelineEntryToObjectHolder)
+    )
+    .collect(Collectors.toSet());

-// 2. Visible timelineEntries can also have overshadowed segments. Add them to the result too.
+// 2. Visible timelineEntries can also have overshadowed objects. Add them to the result too.
for (TimelineEntry entry : incompletePartitionsTimeline.values()) {
-final List<PartitionChunk<ObjectType>> entryOvershadowed = entry.partitionHolder.getOvershadowed();
-if (!entryOvershadowed.isEmpty()) {
-  retVal.add(
+final List<PartitionChunk<ObjectType>> overshadowedEntries = entry.partitionHolder.getOvershadowed();
+if (!overshadowedEntries.isEmpty()) {
+  overshadowedObjects.add(
new TimelineObjectHolder<>(
entry.trueInterval,
entry.version,
-new PartitionHolder<>(entryOvershadowed)
+new PartitionHolder<>(overshadowedEntries)
)
);
}
}

-return retVal;
+return overshadowedObjects;
}
finally {
lock.readLock().unlock();
}
}

+private Map<Interval, Map<VersionType, TimelineEntry>> computeOvershadowedPartitionsTimeline()
+{
+  final Map<Interval, Map<VersionType, TimelineEntry>> overshadowedPartitionsTimeline = new HashMap<>();
+  allTimelineEntries.forEach((Interval interval, TreeMap<VersionType, TimelineEntry> versionEntry) -> {
+    @SuppressWarnings("unchecked")
+    Map<VersionType, TimelineEntry> versionEntryCopy = (TreeMap) versionEntry.clone();
+    overshadowedPartitionsTimeline.put(interval, versionEntryCopy);
+  });
+
+  for (TimelineEntry entry : completePartitionsTimeline.values()) {
+    overshadowedPartitionsTimeline.computeIfPresent(
+        entry.getTrueInterval(),
+        (Interval interval, Map<VersionType, TimelineEntry> versionEntry) -> {
+          versionEntry.remove(entry.getVersion());
+          return versionEntry.isEmpty() ? null : versionEntry;
+        }
+    );
+  }
+
+  for (TimelineEntry entry : incompletePartitionsTimeline.values()) {
+    overshadowedPartitionsTimeline.computeIfPresent(
+        entry.getTrueInterval(),
+        (Interval interval, Map<VersionType, TimelineEntry> versionEntry) -> {
+          versionEntry.remove(entry.getVersion());
+          return versionEntry.isEmpty() ? null : versionEntry;
+        }
+    );
+  }
+  return overshadowedPartitionsTimeline;
+}
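The extracted computeOvershadowedPartitionsTimeline() leans on a JDK detail: Map.computeIfPresent removes the mapping entirely when the remapping function returns null, which is how interval keys whose version maps become empty are dropped from the timeline. A minimal demonstration:

```java
import java.util.HashMap;
import java.util.Map;

// Shows that returning null from computeIfPresent's remapping function
// removes the key, rather than storing a null value.
public class ComputeIfPresentDemo
{
  public static void main(String[] args)
  {
    Map<String, Map<String, Integer>> byInterval = new HashMap<>();
    Map<String, Integer> versions = new HashMap<>();
    versions.put("v1", 1);
    byInterval.put("2020-01-01/2020-01-02", versions);

    byInterval.computeIfPresent(
        "2020-01-01/2020-01-02",
        (interval, vs) -> {
          vs.remove("v1");
          return vs.isEmpty() ? null : vs; // null drops the mapping
        }
    );
    System.out.println(byInterval.containsKey("2020-01-01/2020-01-02")); // prints false
  }
}
```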

public boolean isOvershadowed(Interval interval, VersionType version, ObjectType object)
{
lock.readLock().lock();