SQL: Add is_active to sys.segments, update examples and docs. #11550

Merged
merged 6 commits into from
May 19, 2022
25 changes: 13 additions & 12 deletions docs/querying/sql-metadata-tables.md
@@ -127,20 +127,22 @@ Segments table provides details on all Druid segments, whether they are publishe
|version|STRING|Version string (generally an ISO8601 timestamp corresponding to when the segment set was first started). Higher version means the more recently created segment. Version comparing is based on string comparison.|
|partition_num|LONG|Partition number (an integer, unique within a datasource+interval+version; may not necessarily be contiguous)|
|num_replicas|LONG|Number of replicas of this segment currently being served|
|num_rows|LONG|Number of rows in current segment, this value could be null if unknown to Broker at query time|
|is_published|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 represents this segment has been published to the metadata store with `used=1`. See the [Architecture page](../design/architecture.md#segment-lifecycle) for more details.|
|is_available|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is currently being served by any process(Historical or realtime). See the [Architecture page](../design/architecture.md#segment-lifecycle) for more details.|
|is_realtime|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is _only_ served by realtime tasks, and 0 if any historical process is serving this segment.|
|is_overshadowed|LONG|Boolean is represented as long type where 1 = true, 0 = false. 1 if this segment is published and is _fully_ overshadowed by some other published segments. Currently, is_overshadowed is always false for unpublished segments, although this may change in the future. You can filter for segments that "should be published" by filtering for `is_published = 1 AND is_overshadowed = 0`. Segments can briefly be both published and overshadowed if they were recently replaced, but have not been unpublished yet. See the [Architecture page](../design/architecture.md#segment-lifecycle) for more details.|
|shard_spec|STRING|JSON-serialized form of the segment `ShardSpec`|
|num_rows|LONG|Number of rows in this segment, or zero if the number of rows is not known.<br /><br />This row count is gathered by the Broker in the background. It will be zero if the Broker has not gathered a row count for this segment yet. For segments ingested from streams, the reported row count may lag behind the result of a `count(*)` query because the cached `num_rows` on the Broker may be out of date. This will settle shortly after new rows stop being written to that particular segment.|
|is_active|LONG|True for segments that represent the latest state of a datasource.<br /><br />Equivalent to `(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1`. In steady state, when no ingestion or data management operations are happening, `is_active` will be equivalent to `is_available`. However, they may differ from each other when ingestion or data management operations have executed recently. In these cases, Druid will load and unload segments appropriately to bring actual availability in line with the expected state given by `is_active`.|

@loquisgon loquisgon May 19, 2022

Very minor nit: ... At the end of this great explanation, just to repeat it so it sticks: "given by is_active. In other words, a segment that is in the is_active state may not be available, not queryable, yet, but it will be in the near future".

"might be"? I guess it is possible that due to some other activities (segment was overshadowed before being available for instance) a segment in is_active may never make it to is_available....

Contributor Author

Yeah: there's a couple reasons a segment in is_active state won't eventually become is_available. Maybe it's dropped before that happens. Or maybe something is broken. In the interest of keeping the doc from getting too long I'm thinking to leave it as-is. But I invite follow-up patches that improve things 🙂

|is_published|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment has been published to the metadata store and is marked as used. See the [segment lifecycle documentation](../design/architecture.md#segment-lifecycle) for more details.|
|is_available|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment is currently being served by any data serving process, like a Historical or a realtime ingestion task. See the [segment lifecycle documentation](../design/architecture.md#segment-lifecycle) for more details.|
|is_realtime|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment is _only_ served by realtime tasks, and 0 if any Historical process is serving this segment.|
|is_overshadowed|LONG|Boolean represented as long type where 1 = true, 0 = false. 1 if this segment is published and is _fully_ overshadowed by some other published segments. Currently, `is_overshadowed` is always 0 for unpublished segments, although this may change in the future. You can filter for segments that "should be published" by filtering for `is_published = 1 AND is_overshadowed = 0`. Segments can briefly be both published and overshadowed if they were recently replaced, but have not been unpublished yet. See the [segment lifecycle documentation](../design/architecture.md#segment-lifecycle) for more details.|
|shard_spec|STRING|JSON-serialized form of the segment `ShardSpec`|
|dimensions|STRING|JSON-serialized form of the segment dimensions|
|metrics|STRING|JSON-serialized form of the segment metrics|
|last_compaction_state|STRING|JSON-serialized form of the compaction task's config (compaction task which created this segment). May be null if segment was not created by compaction task.|
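
The `is_active` definition in the table above can be read as a small boolean function over the other flag columns. The following is a minimal sketch of that reading, not code from the PR: the class `SegmentActivity` and both method names are hypothetical, and the flags follow the table's convention of 1 = true, 0 = false.

```java
// Hypothetical helper, for illustration only; not part of Druid.
final class SegmentActivity {
    private SegmentActivity() {}

    /**
     * Mirrors the documented definition:
     * (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1.
     * Returns 1 for active and 0 for inactive.
     */
    static long isActive(long isPublished, long isOvershadowed, long isRealtime) {
        boolean active = (isPublished == 1L && isOvershadowed == 0L) || isRealtime == 1L;
        return active ? 1L : 0L;
    }

    /**
     * True when the expected state (is_active) and the served state (is_available)
     * differ, which the docs describe as possible shortly after ingestion or
     * data management operations.
     */
    static boolean differsFromAvailability(long isActive, long isAvailable) {
        return isActive != isAvailable;
    }
}
```

For instance, a published segment that is not overshadowed is active even before any server has loaded it, so `differsFromAvailability(1, 0)` would be true for that segment until loading completes.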

For example to retrieve all segments for datasource "wikipedia", use the query:
For example, to retrieve all currently active segments for datasource "wikipedia", use the query:

```sql
SELECT * FROM sys.segments WHERE datasource = 'wikipedia'
SELECT * FROM sys.segments
WHERE datasource = 'wikipedia'
AND is_active = 1
```

Another example to retrieve segments total_size, avg_size, avg_num_rows and num_segments per datasource:
@@ -153,6 +155,7 @@ SELECT
CASE WHEN SUM(num_rows) = 0 THEN 0 ELSE SUM("num_rows") / (COUNT(*) FILTER(WHERE num_rows > 0)) END AS avg_num_rows,
COUNT(*) AS num_segments
FROM sys.segments
WHERE is_active = 1
GROUP BY 1
ORDER BY 2 DESC
```
@@ -180,17 +183,15 @@ ORDER BY 1
If you want to retrieve segments that were compacted (by ANY compaction):

```sql
SELECT * FROM sys.segments WHERE last_compaction_state is not null
SELECT * FROM sys.segments WHERE is_active = 1 AND last_compaction_state IS NOT NULL
```

or, if you want to retrieve segments that were compacted only by a particular compaction spec (such as that of auto compaction):

```sql
SELECT * FROM sys.segments WHERE last_compaction_state == 'SELECT * FROM sys.segments where last_compaction_state = 'CompactionState{partitionsSpec=DynamicPartitionsSpec{maxRowsPerSegment=5000000, maxTotalRows=9223372036854775807}, indexSpec={bitmap={type=roaring, compressRunOnSerialization=true}, dimensionCompression=lz4, metricCompression=lz4, longEncoding=longs, segmentLoader=null}}'
SELECT * FROM sys.segments WHERE is_active = 1 AND last_compaction_state = 'CompactionState{partitionsSpec=DynamicPartitionsSpec{maxRowsPerSegment=5000000, maxTotalRows=9223372036854775807}, indexSpec={bitmap={type=roaring, compressRunOnSerialization=true}, dimensionCompression=lz4, metricCompression=lz4, longEncoding=longs, segmentLoader=null}}'
```

*Caveat:* A segment can be served by more than one stream ingestion task or Historical process, in which case it has multiple replicas. While served by multiple ingestion tasks, these replicas are only weakly consistent with each other; once the segment is served by a Historical, it is immutable. The Broker prefers to query a segment from a Historical over an ingestion task. However, if a segment has multiple realtime replicas (for example, Kafka index tasks) and one task is slower than the others, `sys.segments` query results can vary for the duration of the tasks, because the Broker queries only one of the ingestion tasks and is not guaranteed to pick the same task every time. The `num_rows` column of the segments table can have inconsistent values during this period. There is an open [issue](https://github.com/apache/druid/issues/5915) about this inconsistency with stream ingestion tasks.

### SERVERS table

Servers table lists all discovered servers in the cluster.
@@ -9,6 +9,7 @@
"partition_num": 0,
"num_replicas": 1,
"num_rows": 4462111,
"is_active": 1,
"is_published": 1,
"is_available": 1,
"is_realtime": 0,
@@ -123,6 +123,8 @@ public class SystemSchema extends AbstractSchema
* where 1 = true and 0 = false to make it easy to count number of segments
* which are published, available etc.
*/
private static final long IS_ACTIVE_FALSE = 0L;
private static final long IS_ACTIVE_TRUE = 1L;
private static final long IS_PUBLISHED_FALSE = 0L;
private static final long IS_PUBLISHED_TRUE = 1L;
private static final long IS_AVAILABLE_TRUE = 1L;
@@ -140,6 +142,7 @@ public class SystemSchema extends AbstractSchema
.add("partition_num", ColumnType.LONG)
.add("num_replicas", ColumnType.LONG)
.add("num_rows", ColumnType.LONG)
.add("is_active", ColumnType.LONG)
.add("is_published", ColumnType.LONG)
.add("is_available", ColumnType.LONG)
.add("is_realtime", ColumnType.LONG)
@@ -313,7 +316,10 @@ public Enumerable<Object[]> scan(DataContext root)
(long) segment.getShardSpec().getPartitionNum(),
numReplicas,
numRows,
IS_PUBLISHED_TRUE, //is_published is true for published segments
//is_active is true for published segments that are not overshadowed
val.isOvershadowed() ? IS_ACTIVE_FALSE : IS_ACTIVE_TRUE,

@loquisgon loquisgon May 19, 2022

mmm.. isn't it a requirement for being active that is_overshadow and is_publish both be true? Oh...got it. We already know that it is published if we are here. So it is fine...never mind.

Contributor Author

Yes, that's the idea. The branch is for published segments only.

//is_published is true for published segments
IS_PUBLISHED_TRUE,
isAvailable,
isRealtime,
val.isOvershadowed() ? IS_OVERSHADOWED_TRUE : IS_OVERSHADOWED_FALSE,
@@ -350,8 +356,10 @@ public Enumerable<Object[]> scan(DataContext root)
(long) val.getValue().getSegment().getShardSpec().getPartitionNum(),
numReplicas,
val.getValue().getNumRows(),
IS_PUBLISHED_FALSE,
// is_active is true for unpublished segments iff they are realtime
val.getValue().isRealtime() /* is_active */,
// is_published is false for unpublished segments
IS_PUBLISHED_FALSE,
// is_available is assumed to be always true for segments announced by historicals or realtime tasks
IS_AVAILABLE_TRUE,
val.getValue().isRealtime(),
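
The two `scan` hunks above compute `is_active` per branch rather than from the documented formula: the published branch uses only the overshadowed flag, and the unpublished branch passes `isRealtime()` through. As a rough sketch of how the branch logic relates to the formula, the following enumerates all eight flag combinations. The class `IsActiveBranches` and its method names are hypothetical and written for this illustration; they are not Druid code.

```java
// Hypothetical comparison of the per-branch logic with the documented formula.
final class IsActiveBranches {
    private IsActiveBranches() {}

    // Published branch: is_published is known to be 1, so only the
    // overshadowed flag decides is_active.
    static long activeForPublished(boolean overshadowed) {
        return overshadowed ? 0L : 1L;
    }

    // Unpublished branch: is_published is 0, so is_active mirrors is_realtime.
    static long activeForUnpublished(long isRealtime) {
        return isRealtime;
    }

    // Documented formula: (is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1.
    static long documentedFormula(long isPublished, long isOvershadowed, long isRealtime) {
        return ((isPublished == 1L && isOvershadowed == 0L) || isRealtime == 1L) ? 1L : 0L;
    }

    // Counts the flag combinations (out of 8) where branch logic and formula disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (long pub = 0; pub <= 1; pub++) {
            for (long over = 0; over <= 1; over++) {
                for (long rt = 0; rt <= 1; rt++) {
                    long branch = (pub == 1L)
                        ? activeForPublished(over == 1L)
                        : activeForUnpublished(rt);
                    if (branch != documentedFormula(pub, over, rt)) {
                        mismatches++;
                    }
                }
            }
        }
        return mismatches;
    }
}
```

In this sketch the two agree everywhere except for a segment that is published, overshadowed, and flagged realtime at the same time. Whether that combination can arise in a real cluster is not something this page establishes.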
@@ -520,7 +520,7 @@ public void testGetTableMap()
final RelDataType rowType = segmentsTable.getRowType(new JavaTypeFactoryImpl());
final List<RelDataTypeField> fields = rowType.getFieldList();

Assert.assertEquals(17, fields.size());
Assert.assertEquals(18, fields.size());

final SystemSchema.TasksTable tasksTable = (SystemSchema.TasksTable) schema.getTableMap().get("tasks");
final RelDataType sysRowType = tasksTable.getRowType(new JavaTypeFactoryImpl());
@@ -708,14 +708,15 @@ private void verifyRow(
Assert.assertEquals(partitionNum, row[6]);
Assert.assertEquals(numReplicas, row[7]);
Assert.assertEquals(numRows, row[8]);
Assert.assertEquals(isPublished, row[9]);
Assert.assertEquals(isAvailable, row[10]);
Assert.assertEquals(isRealtime, row[11]);
Assert.assertEquals(isOvershadowed, row[12]);
Assert.assertEquals((((isPublished == 1) && (isOvershadowed == 0)) || (isRealtime == 1)) ? 1L : 0L, row[9]);
Assert.assertEquals(isPublished, row[10]);
Assert.assertEquals(isAvailable, row[11]);
Assert.assertEquals(isRealtime, row[12]);
Assert.assertEquals(isOvershadowed, row[13]);
if (compactionState == null) {
Assert.assertNull(row[16]);
Assert.assertNull(row[17]);
} else {
Assert.assertEquals(mapper.writeValueAsString(compactionState), row[16]);
Assert.assertEquals(mapper.writeValueAsString(compactionState), row[17]);
}
}
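
Inserting `is_active` at position 9 shifts every later column by one, which is why the hard-coded indices in `verifyRow` all change. One way to make that shift visible is to resolve indices by column name. The sketch below is illustrative only: the class `SegmentsColumns` is hypothetical, and the first five column names (`segment_id`, `datasource`, `start`, `end`, `size`) are assumptions, since they do not appear in the hunks shown here. The remaining names come from the documentation table and the row signature in this PR.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical name-to-index lookup for the 18-column sys.segments row.
final class SegmentsColumns {
    private SegmentsColumns() {}

    // Column order after this change. The first five names are assumed;
    // the rest appear in the diff or the docs table on this page.
    static final List<String> COLUMNS = Arrays.asList(
        "segment_id", "datasource", "start", "end", "size", "version",
        "partition_num", "num_replicas", "num_rows",
        "is_active", "is_published", "is_available", "is_realtime", "is_overshadowed",
        "shard_spec", "dimensions", "metrics", "last_compaction_state"
    );

    // Returns the zero-based position of a column, or throws if the name is unknown.
    static int indexOf(String column) {
        int index = COLUMNS.indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        return index;
    }
}
```

With a lookup like this, an assertion could be written against `row[SegmentsColumns.indexOf("is_published")]` instead of `row[10]`, so that adding a column would not require renumbering each assertion.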

1 change: 1 addition & 0 deletions website/.spelling
@@ -530,6 +530,7 @@ error_msg
exprs
group_id
interval_expr
is_active
is_available
is_leader
is_overshadowed