Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency #14985
Conversation
This reverts commit d1c0dca.
…actor on broker side
 */
public DruidServerMetadata pickOne()
{
  synchronized (this) {
Hmm, this still isn't quite right I think: everything needs to be synchronized if we are dropping the concurrent set. It seems simpler to add back the concurrent set; I don't think the synchronized was necessary. Sets.newConcurrentHashSet is backed by the keyset of a ConcurrentHashMap. Per the javadocs:
Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:
* they may proceed concurrently with other operations
* they will never throw ConcurrentModificationException
* they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
So I think we can just iterate to pick one...
I don't think we need to do anything to guard against the DruidServerMetadata being removed from the SegmentLoadInfo, but if we do, then something should probably be pushing a computation into this class to do the work, so that we can do the whole operation under lock...
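A minimal sketch of the iterate-and-pick approach described above, assuming a Guava-backed concurrent set; the element type is simplified to String here (the real field holds DruidServerMetadata entries), so this is illustrative rather than the actual SegmentLoadInfo code.

import com.google.common.collect.Sets;
import java.util.Iterator;
import java.util.Set;

class PickOneSketch
{
  // Keyset view of a ConcurrentHashMap: iterators are weakly consistent and
  // never throw ConcurrentModificationException, so no synchronized block is needed.
  private final Set<String> servers = Sets.newConcurrentHashSet();

  void add(String server)
  {
    servers.add(server);
  }

  // Iterate the weakly consistent view and return the first element seen,
  // or null if the set is empty at traversal time.
  String pickOne()
  {
    Iterator<String> it = servers.iterator();
    return it.hasNext() ? it.next() : null;
  }
}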
{
  final PhysicalDatasourceMetadata oldTable = tables.put(dataSource, physicalDatasourceMetadata);
  if (oldTable == null || !oldTable.getRowSignature().equals(physicalDatasourceMetadata.getRowSignature())) {
    log.info("[%s] has new signature: %s.", dataSource, physicalDatasourceMetadata.getRowSignature());
I haven't quite figured it out yet, or whether it is related to the refactor in this PR or some pre-existing problem, but it might be worth looking further into anyway...
While bouncing a historical repeatedly, so that datasources are going offline and online periodically, I keep finding log messages where it gets into a state where it is printing an empty row signature.
2023-10-28T03:33:46,084 INFO [DruidSchema-Cache-0] org.apache.druid.sql.calcite.schema.BrokerSegmentMetadataCache - [wikipedia-schemaless] has new signature: {}.
which I think must mean that it knows the table needs to be refreshed, but has not yet obtained any segment metadata. I'll try to see if I can determine how this happens.
When the historical went offline, the table would have been removed from the cache; when it came up again, the datasource schema is built and the message is logged since the condition oldTable == null is met.
RowSignature being empty implies that the SMQ queries succeeded but returned an empty rowSignature for the segments? Is that possible?
Not sure if this got introduced by the new changes, since we are just fetching the schema from the Coordinator (which has the empty rowSignature) and the flow on the Coordinator is the same as what used to be on the Broker.
Apart from the minor comments, the changes LGTM.
 */
public DruidServerMetadata pickOne()
{
  synchronized (this) {
Yes, I kind of agree that we might hit `IndexOutOfBoundsException`.
From the ConcurrentHashMap javadocs:
However, iterators are designed to be used by only one thread at a time.
Bear in mind that the results of aggregate status methods including {@code size}, {@code isEmpty}, and {@code containsValue} are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
But since you can iterate the elements safely, what you could do is:
- get the position of the element that you want using size checks,
- create an iterator,
- and then do something like the following pseudo code:
DruidServerMetadata prevElement = null;
int index = 0;
for (DruidServerMetadata val : servers) {
  if (index == pos) {
    return val;
  }
  prevElement = val;
  index++;
}
// if the set shrank while iterating, fall back to the last element seen
return prevElement;
Integer replicationFactor = isOvershadowed ? (Integer) 0
    : coordinator.getReplicationFactor(segment.getId());
// conditionally add realtime segments information
if (null != includeRealtimeSegments && null != coordinatorSegmentMetadataCache) {
Since the realtime flag only works if centralizedSchemaManagement is enabled, let's document this in user-facing docs as well as in the code.
In the code we should mention that we did not want to increase the payload of the broker-to-coordinator communication if this feature is disabled, and we did not want to introduce a new feature flag on the broker, so this was a trade-off. This API will be deprecated once we fully move over to the coordinator to power sys.segments queries from the broker.
👍
Thank you for the contribution @findingrish.
…datasources (#15355) In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information. This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.
…h Schema for Seamless Coordinator Updates (#15475) The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This task encompasses addressing both realtime and finalized segments. This modification specifically addresses the issue with realtime segments. Tasks will now routinely communicate the schema for realtime segments during the segment announcement process. The Coordinator will identify the schema alongside the segment announcement and subsequently update the schema for realtime segments in the metadata cache.
…rce Schema Building (#15817) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.
Description
Original proposal
Issue: #14989
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.
To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
Design
Coordinator changes
These changes enhance the coordinator's capabilities, enabling it to manage table schema and provide schema-related information through its APIs. As of now, the segment metadata cache structure will be maintained on both leader and follower processes; this helps in leader failover scenarios.
SegmentMetadataCache refactor: We introduce a new class, CoordinatorSegmentMetadataCache, extending AbstractSegmentMetadataCache, which manages the cache and forms the core of the schema management process. We also add an implementation of QuerySegmentWalker, which is specific to segment metadata queries executed during the segment refresh mechanism.
Changes to Coordinator timeline: CoordinatorServerView has minor changes to register a timeline callback. This is required for CoordinatorSegmentMetadataCache to update its cache using the timeline.
API Changes:
We introduce a new container class, DataSourceInformation, for handling RowSignature information. An API is exposed to retrieve datasource information, check MetadataResource#getDataSourceInformation (see the example request after this list).
-- Request Path: /druid/coordinator/v1/metadata/dataSourceInformation
-- HTTP Method: POST
-- Request Body: Set of dataSource names
-- Query params: No query params
-- Response: List of DataSourceInformation objects.
The existing API MetadataResource#getAllUsedSegments, responsible for returning used segments, is modified (in a backward-compatible way) to accept a new query parameter called includeRealtimeSegments. When set, this parameter will include realtime segments in the result. Additionally, we enhance the return object SegmentStatusInCluster to provide additional information regarding realtime and numRows.
-- Request Path: /druid/coordinator/v1/metadata/segments
-- HTTP Method: GET
-- Request Body: No changes
-- Query params: Added new query parameter includeRealtimeSegments; when set, realtime segments are returned as well
-- Response: SegmentStatusInCluster object has two new fields, realtime & numRows
Binding for Query Modules: We add essential bindings to run segment metadata queries in CliCoordinator.
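For illustration, here is a minimal sketch of how a client might call the new dataSourceInformation endpoint; the Coordinator address and the datasource names are placeholders, and this is not code from the PR itself.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DataSourceInformationRequestExample
{
  public static void main(String[] args) throws Exception
  {
    // Coordinator address and datasource names are assumptions for the example.
    String coordinator = "http://localhost:8081";
    String body = "[\"wikipedia\", \"koalas\"]"; // set of dataSource names

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(coordinator + "/druid/coordinator/v1/metadata/dataSourceInformation"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    // Response body: a JSON list of DataSourceInformation objects
    // (datasource name plus its RowSignature).
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}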
.Broker changes
Broker now defaults to querying the Coordinator for fetching table schema, eliminating the need for querying data nodes and tasks for segment schema. Additionally, there are changes in the logic for building the sys.segments table.
Broker-side SegmentMetadataCache: We introduce a new implementation of AbstractSegmentMetadataCache called BrokerSegmentMetadataCache. This implementation queries the coordinator for table schema and falls back to running a segment metadata query when needed (see the sketch below). It also assumes the responsibility for building PhysicalDataSourceMetadata, for which a new class is introduced.
System schema changes: In MetadataSegmentView, requests for realtime segments are now included when polling the coordinator. The segment table building logic is updated accordingly.
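A minimal sketch of that poll-then-fallback flow, with simplified method and type names assumed for illustration; the actual BrokerSegmentMetadataCache code differs.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

abstract class CoordinatorFirstSchemaRefreshSketch
{
  // Refresh the schema for a set of datasources: prefer the Coordinator's
  // answer, and fall back to a broker-issued segment metadata query for any
  // datasource the Coordinator did not return.
  Map<String, RowSignature> refresh(Set<String> dataSources)
  {
    Map<String, RowSignature> schemas = new HashMap<>(pollCoordinator(dataSources));

    Set<String> missing = new HashSet<>(dataSources);
    missing.removeAll(schemas.keySet());
    for (String dataSource : missing) {
      schemas.put(dataSource, buildViaSegmentMetadataQuery(dataSource));
    }
    return schemas;
  }

  // Fetch table schema from the Coordinator's dataSourceInformation API.
  abstract Map<String, RowSignature> pollCoordinator(Set<String> dataSources);

  // Fallback: run a segment metadata query and merge the segment schemas.
  abstract RowSignature buildViaSegmentMetadataQuery(String dataSource);

  // Stand-in for Druid's RowSignature type, to keep the sketch self-contained.
  interface RowSignature {}
}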
Testing
The changes have been tested locally with the wikipedia dataset.
Unit tests have been added.
All of the existing integration tests have been run with the feature enabled (39fb248 & eb3e3c1 & 30438f4).
An integration test with the group name centralized-table-schema has been added.
The changes have also been tested in a Druid cluster with,
Notable observations,
Potential side effects
With this change, brokers will now poll the datasource schema from the coordinator, ensuring that all brokers consistently use the same datasource schema.
Known issues
Upgrade considerations
The general upgrade order should be followed. The new code is behind a feature flag, so it is compatible with existing setups. Brokers with the new changes can communicate with an old Coordinator without issues.
Release note
This feature is experimental and addresses multiple challenges outlined in the description. To enable it, set druid.coordinator.centralizedTableSchema.enabled in the Coordinator configuration. As part of enabling queries on the coordinator, the following config classes are available with the specified base property names:
druid.coordinator.segmentMetadata -> SegmentMetadataQueryConfig
druid.coordinator.internal.query.config -> InternalQueryConfig
druid.coordinator.query.retryPolicy -> RetryQueryRunnerConfig
druid.coordinator.query.scheduler -> QuerySchedulerProvider
druid.coordinator.query.default -> DefaultQueryConfig
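A minimal sketch of the Coordinator runtime.properties change this implies; only the flag named above is shown, since the exact sub-properties under each base name depend on the corresponding config class.

# Coordinator runtime.properties (illustrative)
druid.coordinator.centralizedTableSchema.enabled=true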
This PR has: