-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager #7306
Conversation
…n't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error
…exerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY
This looks related and somewhat conflicting with the terminology discussion on #7233, where initial confusion over the meaning of I think if we go with where 7233 is going, the term for "used segments" would be "published segments". Fwiw, one reason I like this term because it lends itself well to being used as verbs in the following way:
Yeah, I definitely also believe these terms are not great, and are confusing for the reasons you mention. Here the "publish" and "unpublish" terms are a bit awkward though. It doesn't sound right to say "publish all segments in a datasource" instead of "enable a datasource". "Mark used" or "activate" sounds better. I'm not sure how to reconcile this with the fact that I do think "publish" sounds good & intuitive as a verb in the case where a task is initially writing a segment record to the metadata store. Maybe this is how:
|
I think it makes sense to keep terminology between virtual Druid's SQL table Can |
I don't think we need to be dogmatic about it. But if possible, it benefits everyone (users & developers alike) to have consistent terminology across the entire system.
Do you mean in the metadata store? I don't think it could be, since we don't have a way to automatically migrate already-existing metadata stores that use the name |
I'm not sure that a column in |
I think it's best if they are the same thing, since the point of system tables like If it turns out that for some reason, they can't be the same thing, then it makes sense to use two different names. But if they could be the same thing then it's probably best to do that. |
@gianm I didn't question that "published in It doesn't seem to me that this PR is conflicting with the rename. I think it's supportive to the rename because in this PR I did some nasty work at finding and sweeping old names and variants of naming that were looking completely unrelated. Now for the rename, somebody needs to make just two full-text searches in the codebase on Github (I did many more when preparing this PR): Then visit all result files that look relevant manually and rename "used/unused segments" to "publish/published/unpublished". (Note that this search needs to be performed after this PR is merged, because Github indexes only master codebase for full-text search.) For this reason, I tag this PR
Oh no, I think we must go all-in with publish/unpublish/published/unpublished terminology. So we should call these actions "mark (as) published", however awkward it sounds, and "unpublish" (I think "unpublish all segments in a data source" sounds fine), or, alternatively, "mark (as) unpublished". |
…tency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig
For reviewers: in the latest commits I've made some more clarification renames:
I kept the old names of JSON properties because aliases (for backward compatibility) are only supported in Jackson 2.9, see #7152. DruidCoordinatorHelpers:
(Next steps will be: remove "DruidCoordinator" prefixes and make all class names noun phrases or verb phrases, but not a mix.) |
@@ -75,9 +76,9 @@ | |||
/** | |||
*/ | |||
@ManageLifecycle | |||
public class SQLMetadataSegmentManager implements MetadataSegmentManager | |||
public class SqlMetadataSegments implements MetadataSegments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SegmentsStorage
? ...Segments
make me think about a factory class, while this one also makes update and delete operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's more like segmentsMetadata
we work with here, not ...Segments
directly, meaning, can we name it SegmentsMetadataStorage
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain the factory association? I don't see a link.
To me, "Storage" doesn't contribute to meaning, as well as "Manager" which I removed.
SegmentsSqlMetadata
may be a better name than SqlMetadataSegments
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the interface to SegmentsMetadata
and the implementation class to SqlSegmentsMetadata
respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see classes that contain DB API calls - to identify themselves in naming the consistent way across the project. So do you propose to start replacing Manager
to Sql
or it will be a single stray class like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't replace Manager
with Sql
- SQL
was already in the name. I just removed Manager
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. But what about I would like to see classes that contain DB API calls - to identify themselves in naming the consistent way across the project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Sql
is that identifier.
* This method's name starts with "prepare" for the same reason as {@link | ||
* #prepareImmutableDataSourceWithUsedSegments}. | ||
*/ | ||
Collection<ImmutableDruidDataSource> prepareImmutableDataSourcesWithAllUsedSegments(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if All
is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think is unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All
makes sense in a specific use case, ie If there are two methods like:
getAll
getRandomSample
otherwise All
is just redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rename is already done and merged in #7653.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's confusing to see it here then, I spent time reading through those changes.
|
||
void stop(); | ||
|
||
boolean tryMarkAsUsedAllSegmentsInDataSource(String dataSource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if All
is necessary?
* This method is not removed because {@link org.apache.druid.server.http.DataSourcesResource#removeSegment} | ||
* uses it and if it migrates to {@link #tryMarkSegmentAsUnused(SegmentId)} the performance will be worse. | ||
*/ | ||
boolean tryMarkSegmentAsUnused(String dataSource, String segmentId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if As
is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This question is missed, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rename is already done and merged in #7653. This PR doesn't rename methods in MetadataSegmentManager
anymore, it only renames the class itself.
server/src/main/java/org/apache/druid/metadata/MetadataSegments.java
Outdated
Show resolved
Hide resolved
@@ -48,21 +49,21 @@ | |||
{ | |||
public static final String CONFIG_KEY = "coordinator.config"; | |||
|
|||
private final long millisToWaitBeforeDeleting; | |||
private final long millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it's much easier/naturally to find a description of the method/variable then to read so long name without spaces between words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's easier for everyone, for example, people who are not Druid developers themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that name is too long, suggest initialDropOvershadowedWaitMillis
instead, with additional clarification in docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or markOvershadowedAsUnusedInitialWaitMillis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the loss of meaning or even confusion in the "initial" adjective. Also, other configuration parameter names in this class don't drop the "Segments" part, except for replicantLifetime
and replicationThrottleLimit
which are bad names anyway. I also wanted to keep the "MarkAsUnused" part together, because it appears in different places in the code. What about leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments
?
server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
Outdated
Show resolved
Hide resolved
@@ -299,7 +314,8 @@ public boolean equals(Object o) | |||
|
|||
CoordinatorDynamicConfig that = (CoordinatorDynamicConfig) o; | |||
|
|||
if (millisToWaitBeforeDeleting != that.millisToWaitBeforeDeleting) { | |||
if (millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments != | |||
that.millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa, it takes me a deal of concentration just to check those two variables represent the same property. Now I know when the bigger is not better regarding the variable naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have some better names in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
millisLagBeforeMarkingUnusedOvershadowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This property is already renamed to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments
in #7653.
docs/content/configuration/index.md
Outdated
@@ -1023,7 +1023,7 @@ Worker select strategies control how Druid assigns tasks to middleManagers. | |||
|
|||
###### Equal Distribution | |||
|
|||
Tasks are assigned to the middleManager with the most available capacity at the time the task begins running. This is | |||
Tasks are assigned to the middleManager with the most spare capacity at the time the task begins running. This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"capacity" implies the maximum limit
with max free task slots
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do Tasks are assigned to the middleManager with the most free slots at the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let's write Middle Manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied the first suggestion. I didn't replace the term "middleManager" (one word) with "Middle Manager" because the first version is used throughout the document in many places. I'm also not sure that the separate writing is better for searching in the docs. If you feel that the replacement would be an improvement, please open an issue.
docs/content/operations/metrics.md
Outdated
@@ -205,8 +205,8 @@ These metrics are for the Druid Coordinator and are reset each time the Coordina | |||
|`segment/loadQueue/failed`|Number of segments that failed to load.|server.|0| | |||
|`segment/loadQueue/count`|Number of segments to load.|server.|Varies.| | |||
|`segment/dropQueue/count`|Number of segments to drop.|server.|Varies.| | |||
|`segment/size`|Size in bytes of available segments.|dataSource.|Varies.| | |||
|`segment/count`|Number of available segments.|dataSource.|< max| | |||
|`segment/size`|Sum of sizes of all used segments belonging to a data source. Emitted only for data sources to which at least one used segment belongs.|dataSource.|Varies.| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Total size of used segments
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what do you mean by this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do Sum of sizes of all used segments
-> Total size of used segments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
public Collection<ObjectType> iterateAllObjects() | ||
{ | ||
return CollectionUtils.createLazyCollectionFromStream( | ||
() -> allTimelineEntries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allTimelineEntries
makes me think there's another collection with a subset of "all", but it isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It follows the naming of "allTimelineEntries". I think the logic behind this naming is that it includes overshadowed objects. I've extended the Javadoc of iterateAllObjects()
to make it clearer.
core/src/main/java/org/apache/druid/timeline/VersionedIntervalTimeline.java
Show resolved
Hide resolved
This pull request introduces 1 alert when merging 6da0b9d into 441515c - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leventov, I apologize for the very delayed review and thank you for your patience. Also thank you for adding a bunch of missing javadocs and comments. I left a couple of comments.
It doesn't seem to me that this PR is conflicting with the rename. I think it's supportive to the rename because in this PR I did some nasty work at finding and sweeping old names and variants of naming that were looking completely unrelated. Now for the rename, somebody needs to make just two full-text searches in the codebase on Github (I did many more when preparing this PR):
"used segment"
"unused segment"Then visit all result files that look relevant manually and rename "used/unused segments" to "publish/published/unpublished". (Note that this search needs to be performed after this PR is merged, because Github indexes only master codebase for full-text search.)
I'm wondering you're planning to rename "used/unused segments" to "published/unpublished segments" in the near future. If not, I think this PR should change those names properly at least for the UI. If you do, I promise to review your follow-up PR as soon as possible.
@@ -36,7 +37,7 @@ Void insertOrUpdate( | |||
byte[] value | |||
); | |||
|
|||
byte[] lookup( | |||
@Nullable byte[] lookup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add a javadoc saying when it can be null? Perhaps it would be
/**
* Returns the value of the valueColumn when there is only one row matched to the given key.
* This method returns null if there is no such row and throws an error if there are more than one rows.
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for suggestion, applied
*/ | ||
public static String commaSeparateIdentifiers(final Collection<DataSegment> segments) | ||
public static Object commaSeparatedIdentifiers(final Collection<DataSegment> segments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the method name doesn't look valid anymore. Maybe lazyCollectionOfSegmentIds
?
Also why not having Collection<SegmentId>
as the return type instead of Object
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose is to make this method unusable for anything except logging. To obtain a collection of SegmentIds, people should usually use Stream API to create a non-lazy collection.
@@ -89,22 +89,22 @@ | |||
|
|||
// (start partitionId, end partitionId) -> minorVersion -> atomicUpdateGroup | |||
private final TreeMap<RootPartitionRange, Short2ObjectSortedMap<AtomicUpdateGroup<T>>> standbyGroups; | |||
private final TreeMap<RootPartitionRange, Short2ObjectSortedMap<AtomicUpdateGroup<T>>> visibleGroup; | |||
private final TreeMap<RootPartitionRange, Short2ObjectSortedMap<AtomicUpdateGroup<T>>> visibleGroups; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I intentionally used the singular form here to say that there is only one visibleGroup in any case. Perhaps the code should be refactored to be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to visibleGroupPerRange
, does it make more sense? Also added a comment as per #8788
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds good to me. Thanks for adding a comment.
"used" segments (that is, the segments that *should* be loaded in the cluster) and the loading rules. | ||
|
||
Before any unassigned segments are serviced by Historical processes, the Historical processes for each tier are first | ||
sorted in terms of capacity, with least capacity servers having the highest priority. Unassigned segments are always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before any unassigned segments are serviced by Historical processes, the Historical processes for each tier are first
sorted in terms of capacity, with least capacity servers having the highest priority.
It seems like this statement was written in 2013.. I don't think this is true anymore. Maybe better to mention druid.coordinator.balancer.strategy
and link https://github.com/apache/druid/blob/master/docs/configuration/index.md#coordinator-operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think coordinator documentation should better be updated at once, There is #7201 tracking this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
docs/design/coordinator.md
Outdated
@@ -51,7 +64,12 @@ Segments can be automatically loaded and dropped from the cluster based on a set | |||
|
|||
### Cleaning up segments | |||
|
|||
Each run, the Druid coordinator compares the list of available database segments in the database with the current segments in the cluster. Segments that are not in the database but are still being served in the cluster are flagged and appended to a removal list. Segments that are overshadowed (their versions are too old and their data has been replaced by newer segments) are also dropped. | |||
On each run, the Druid Coordinator compares the set of used segments in the database with the segments served by some | |||
nodes in the cluster. Coordinator sends requests to Historical nodes to unload unused segments or segments the are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest some nodes
-> historical processes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to Historical nodes
because this term is also used in the section twice.
* given data source from the metadata store. | ||
* | ||
* Unlike other similar methods in this interface, this method doesn't accept a {@link Segments} "visibility" | ||
* parameter. This methods returns overshadowed segments, as if {@link Segments#INCLUDING_OVERSHADOWED} was passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more clear to say "The returned collection may include overshadowed segments and their created_dates, as if {@link Segments#INCLUDING_OVERSHADOWED} was passed. It's the responsibility of the caller to filter out overshadowed ones if needed`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changed.
* Delete pending segments created in the given interval belonging to the given data source from the pending segments | ||
* table. The {@code created_date} field of the pending segments table is checked to find segments to be deleted. | ||
* | ||
* Note that the semantic of the interval (for `created_date`s) is different from the semantic of the interva |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: interva -> interval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed
|
||
/** | ||
* Delete all pending segments belonging to the given data source from the pending segments table. | ||
* The {@code created_date} field of the pending segments table is checked to find segments to be deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't seem to check the created_date field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Looks it does:
druid/server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java
Lines 894 to 910 in 6da0b9d
@Override | |
public int deletePendingSegmentsCreatedInInterval(String dataSource, Interval deleteInterval) | |
{ | |
return connector.getDBI().inTransaction( | |
(handle, status) -> handle | |
.createStatement( | |
StringUtils.format( | |
"DELETE FROM %s WHERE datasource = :dataSource AND created_date >= :start AND created_date < :end", | |
dbTables.getPendingSegmentsTable() | |
) | |
) | |
.bind("dataSource", dataSource) | |
.bind("start", deleteInterval.getStart().toString()) | |
.bind("end", deleteInterval.getEnd().toString()) | |
.execute() | |
); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This javadoc is for a different method: int deletePendingSegments(String dataSource);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now, fixed. Thanks.
@@ -908,7 +1009,7 @@ private boolean segmentExists(final Handle handle, final DataSegment segment) | |||
* Read dataSource metadata. Returns null if there is no metadata. | |||
*/ | |||
@Override | |||
public DataSourceMetadata getDataSourceMetadata(final String dataSource) | |||
public @Nullable DataSourceMetadata retrieveDataSourceMetadata(final String dataSource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more popular to put annotations before the line where the method access modifier is (just like @Override
). Is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer putting @Nullable
before the type because semantically it characterizes the returned type, not the method (there are also modifiers and other annotations in between).
import org.apache.druid.server.coordinator.DruidCoordinatorRuntimeParams; | ||
|
||
import javax.annotation.Nullable; | ||
|
||
/** | ||
* | ||
*/ | ||
public interface DruidCoordinatorHelper | ||
public interface CoordinatorDuty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think duty
is more clear to me than helper
.
This pull request introduces 1 alert when merging 10587c1 into d541cbe - view on LGTM.com new alerts:
|
@jihoonson thank you very much for taking time to review this PR.
I don't plan to make a follow-up rename. I don't want to make partial rename in the UI because it will create an inconsistency between UI and official documentation. Then, if docs are changed, there is an inconsistency between docs and code. And so on. So I think there should be a one-time rename. In the part of my comment which you cited, I explained why I think that this PR in its current state is at least not a move in the wrong direction, in terms of rename, because it will greatly simplify the future rename, whoever decided to do it. |
@leventov thanks, it makes sense to me. |
dropReloadDatasource?: string; | ||
dropReloadAction: 'drop' | 'reload'; | ||
dropReloadInterval: string; | ||
dataSourceToMarkSegmentsByIntervalIn?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great rename, it is good to have the UI aligned with the docs. Could you please make sure that we treat "datasource" as one word I have been consistently using "datasource" for the Druid table thing and "data source" for the source of data like S3 or HDFS (in the data loader).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this should be datasourceToMarkSegmentsByIntervalIn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
confirmButtonText="Drop data" | ||
successText="Data drop request acknowledged, next time the coordinator runs data will be dropped" | ||
failText="Could not drop data" | ||
confirmButtonText="Mark as unused all segments in data source" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is way too long to be a button. Maybe just call it "Mark as unused" and make sure to specify what is going on in the text of the dialog (which you are mostly doing already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I shortened confirm button texts but just removed "in data source" parts, IMO even shorter names are too ambiguous.
This pull request introduces 1 alert when merging 2ad6dd5 into 83ddc8d - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest changes LGTM. @vogievetsky do you have more comments? @leventov sorry to say, but would you mind fixing the conflict one more time?
Web console changes LGTM 👍 |
An interesting lucky occasion: the conflict was in the specific file and very close to the line where I actually inserted a bug, which was not caught by tests. (Raised #9257) |
@jihoonson @vogievetsky thanks for reviews! |
This pull request introduces 1 alert when merging 6745f4a into 19b427e - view on LGTM.com new alerts:
|
The line where LGTM complains about is to verify two static variables have the same value which should have been the same variable but stored separately because of the module dependency. So, the LGTM failure doesn't look legit in this case. I'm not sure whether we can skip the LGTM inspection for some particular lines. I'm merging this PR since it has been open for too long time. We can address this issue later in a follow-up PR. |
Description
There are several related sets of changes:
Reconcile terminology and method naming to 'used/unused segments'
This naming stems from the
used=0/1
column in the database table. Other names that used to be used in the codebase areYou can see how the third one, "available segments" has emerged, but the result is totally misleading and confusing with segments which are available aka served by some data nodes.
Don't use terms 'enable/disable data source'
"Enable data source" actually means "try to mark as 'used' all segments in the database that belong to a data source". "Disable data source" actually means "try to mark as unused all segments in the database that belong to a data source". Enable/disable terminology might make users think that Druid maintains a separate toggle with the notion of an individual segment's
used
flag, which is false. Disabling a data source and then enabling it again loses information about the subset of segments that may have been unused in a data source.Similarly, uses terms 'Used/unused data source' instead of 'enabled/disabled data source' to differentiate between data sources to which at least one used segment belongs and to which none used segments belong.
Rename
MetadataSegmentManager
toSegmentsMetadataManager
To make less aliasing and confusion with
SegmentsManager
, a class used on Historical nodes. Also renamed residual occurrences in variable names of "DatabaseSegmentManager", which is the former name ofMetadataSegmentManager
.Rename
DruidCoordinatorHelper
toCoordinatorDuty
Also, renamed the implementations:
DruidCoordinatorBalancer
->BalanceSegments
DruidCoordinatorSegmentCompactor
->CompactSegments
DruidCoordinatorLogger
->EmitClusterStatsAndMetrics
DruidCoordinatorSegmentKiller
->KillUnusedSegments
DruidCoordinatorCleanupPendingSegments
->KillStalePendingSegments
DruidCoordinatorSegmentInfoLoader
->LogUsedSegments
DruidCoordinatorCleanupOvershadowed
->MarkAsUnusedOvershadowedSegments
DruidCoordinatorCleanupUnneeded
->UnloadUnusedSegments
DruidCoordinatorRuleRunner
->RunRules
Also, renamed
CoordinatorIndexingServiceHelper
annotation toCoordinatorIndexingServiceDuty
.Rename
IndexerSQLMetadataStorageCoordinator
'sget*
methods toretrieve*
Because they all access the metadata store, i. e. not cheap.
getUsedSegmentsForInterval
->retrieveUsedSegmentsForInterval
getAllUsedSegments
->retrieveAllUsedSegments
getUsedSegmentsAndCreatedDates
->retrieveUsedSegmentsAndCreatedDates
getUsedSegmentsForIntervals
->retrieveUsedSegmentsForIntervals
getveUnusedSegmentsForInterval
->retrieveUnusedSegmentsForInterval
getDataSourceMetadata
->retrieveDataSourceMetadata
As well as a related method
UsedSegmentLister.getUsedSegmentsForIntervals
->retrieveUsedSegmentsForIntervals
.See also #8187 and #8188.
Rename
SegmentsAndMetadata
class toSegmentsAndCommitMetadata
This is done to increase the difference from
SegmentsMetadataManager
(formerMetadataSegmentManager
).Rename several indexing service task-related classes
KillTask
->KillUnusedSegmentsTask
ClientQuery
->ClientTaskQuery
ClientKillQuery
->ClientKillUnusedSegmentsTaskQuery
ClientCompactQuery
->ClientCompactionTaskQuery
Refactoring of
DruidCoordinator
Refactored
DruidCoordinator
, removed unnecessaryCoordinatorRunnable
hierarchy.Future work
MetadataRuleManager
toRulesMetadataManager
to align withSegmentsMetadataManager
./druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}
to mark as used only non-overshadowed segments?Key files changed in this PR are
datasource-view.tsx
andDruidCoordinator
.