Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement]: Support Streaming Service in Milvus #33285

Open
36 of 46 tasks
chyezh opened this issue May 22, 2024 · 6 comments
Open
36 of 46 tasks

[Enhancement]: Support Streaming Service in Milvus #33285

chyezh opened this issue May 22, 2024 · 6 comments
Assignees
Labels
feature/streaming node streaming node feature kind/enhancement Issues or changes related to enhancement
Milestone

Comments

@chyezh
Copy link
Contributor

chyezh commented May 22, 2024

Motivation

  • Enhance cloud-native pooling capabilities: decouple I/O and computation, separate batch and stream data processing into different components.
  • Enhance write path functionality: WAL pre-write data checking and deduplication, even post-write data secondary index synchronization. The flushing and syncing process will be simplified in the pipeline.
  • Optimize Timetick allocation: only ensure channel dimension monotonically increases.
  • Embedded pub-sub log service:
    • Merge and dispatch channel data managed unifiedly.
    • Streaming API only provides Vchannel granularity operations, pchannel becomes an internal concept.
    • Reduce dependency on third-party message queues, KV storage or distributed file systems can both be used for log storage, increasing channel upper limit possibility.
    • Internal message optimization: caching messages to reduce remote fetch operations, merging of empty window timetick messages reduces persistent data, improves log data catch up read speed, and reduces recovery time.

Architecture

image
The following changes will be made:

  • Write with pub API of stream node instead of MQ In proxy
  • Don’t process TimeTick in the proxy communicate with RootCoord
  • Allocate segment ID in stream node instead of request segment ID allocator in proxy with Datacoord
  • Subscribe stream node instead MQ in QueryNode
  • The flush function will be moved to the stream node, the Indexnode will be merge into Datanode

Components Responsibility

  • StreamCoord :
    • Meta manager for segments and pchannels, with the partial function being akin to the datacoord segment manager.
    • Pchannel holder is responsible for assigning pchannels to stream nodes or balancing pchannels on stream nodes.
    • Manage Sessions for all stream nodes
  • StreamNode
    • Provide a pub-sub API for producing or consuming a Write-Ahead Log (WAL) that includes consuming from the latest position or a specific position.
    • WAL persistence process flow with pre- and post-hooks interface.
    • Subscribe to WAL and flush data to object storage
    • Query stream data in the future
  • DataCoord
    • Coordinate and allocate tasks to computing nodes, such as indexing, importing, and compaction tasks
    • task meta manager
    • Manage sessions for all computing nodes•
  • DataNode
    • Execute Task
  • QueryCoord/ Querynode
    • Keep to the same with 2.4 version

Goals

  • Track1: pub-sub WAL service, support rocksmq/puslar/kafka as WAL data storage.
  • Track2: support flushing in stream node, merge indexnode into datanode.

RoadMap

Streaming Service Implementation

Use Streaming Service To Produce

Use Streaming Service To Consume In Query

Use Streaming Service To Consume In Flush

Utility

Rolling upgrade

Incoming

Limitation

  • Only one pchannel can be assigned to a single stream node, implying that pchannel supports only single point write.
@chyezh chyezh added the kind/enhancement Issues or changes related to enhancement label May 22, 2024
sre-ci-robot pushed a commit that referenced this issue Jun 11, 2024
@xiaofan-luan
Copy link
Collaborator

what about name it as streaming service?

@chyezh chyezh changed the title [Enhancement]: Support Log Service in Milvus [Enhancement]: Support Streaming Service in Milvus Jun 19, 2024
@jaime0815 jaime0815 added this to the 2.5.0 milestone Jun 21, 2024
sre-ci-robot pushed a commit that referenced this issue Jun 24, 2024
jaime0815 pushed a commit that referenced this issue Jun 27, 2024
issue: #33285

- use reader but not consumer for pulsar
- advanced test framework
- move some streaming related package into pkg

---------

Signed-off-by: chyezh <[email protected]>
@jaime0815
Copy link
Contributor

jaime0815 commented Jul 1, 2024

Streaming Service Upgrading In Milvus 2.5

Dependency Specification

Version 2.4 relies on the pub/sub capability of MQ for both reading and writing paths to support data persistence and querying of streaming data respectively.

Write path
image

Read path
image

TimeTick lifetime
image

In version 2.5, the pub/sub API is provided by StreamNode, and MQ reading and writing are encapsulated within StreamNode.
Write path
image

Read path
image

TimeTick lifetime
image
Significant changes in dependency order between versions 2.4 and 2.5 imply that the existing upgrade plan cannot meet the requirements.

Upgrade plan

. [Plan 1] Upgrade with downtime

  1. Stop writing on the client side.
  2. Execute flushAll to trigger flushing all data from MQ to disk
  3. Stop the 2.4 version cluster
  4. Start the 2.5 version cluster

[Plan 2] Upgrade with no downtime

  1. Upgrade MixCoord, including RootCoord, QueryCoord, DataCoord, StreamCoord, at this time:

    • In the 2.5 version, RootCoord still needs to execute the TimeTick logic
    • After the upgrade, there are no changes in the read and write paths compared to the 2.4 version
  2. Stop dataNode, the flush process will be terminated, Proxy can still accept all requests.

  3. Start StreamNode, each pchannel will be allocated to the stream node and is prepared for subscription by the stream node client at this point.

  4. Upgrade QueryNode, the new QueryNode will subscribe to vchannel with the stream node client, while the old QueryNode will continue to consume streaming from the MQ client.

  5. Upgrade Proxy, once all proxies are upgraded:

    • Stop sending TT logic on the RootCoord
    • Enable the insertion of data by the stream node client on the Proxy
  6. Stop IndexNode

Pros and cons:

  • We can remove all deprecated codes once taking plan 1, even using local WAL implementation instead of rocksmq directly.
  • Plan 2 will be smoother than Plan 1. However, if a query node rolling upgrade takes a while and a significant number of write requests are received during this period, the growing segments will consume more memory. Consequently, additional memory might be required for the QueryNode.

Version 2.5 servers as a transitional version to 3.0, now we can take plan2 to ensure a smooth upgrade from 2.4, making code cleanup after upgrading to 3.0.

@xiaofan-luan
Copy link
Collaborator

Make sure upgrade can be smoothly is very important.

To simply the work we need to do, maybe we can keep delegator at querynode and do not move it to stream service.

One problem is how many stream node need to upgrade and it's size.

To upgrade smoothly, streaming node need to assign timestamp and try to merge data from TTstream and Proxy insert.

in 2.5 we can keep delegator still at querynode. and move delegator to streaming node at 3.0

jaime0815 pushed a commit that referenced this issue Jul 2, 2024
issue: #33285

- optimize the message package
- add interceptor package to achieve append operation intercepting.
- add timetick interceptor to attach timetick properties for message.
- add timetick background task to send timetick message.

Signed-off-by: chyezh <[email protected]>
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
issue: milvus-io#33285

- use reader but not consumer for pulsar
- advanced test framework
- move some streaming related package into pkg

---------

Signed-off-by: chyezh <[email protected]>
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
issue: milvus-io#33285

- optimize the message package
- add interceptor package to achieve append operation intercepting.
- add timetick interceptor to attach timetick properties for message.
- add timetick background task to send timetick message.

Signed-off-by: chyezh <[email protected]>
jaime0815 pushed a commit that referenced this issue Jul 4, 2024
issue: #33285

- add adaptor to implement walimpls into wal interface.
- implement timetick sorted and filtering scanner.
- add test for wal.

---------

Signed-off-by: chyezh <[email protected]>
czs007 pushed a commit that referenced this issue Nov 21, 2024
issue: #33285
pr: #37722

- move most cgo opeartions related to search/query into segcore package
for reusing for streamingnode.
- add go unittest for segcore operations.

Signed-off-by: chyezh <[email protected]>
bigsheeper pushed a commit to bigsheeper/milvus that referenced this issue Nov 25, 2024
issue: milvus-io#33285
pr: milvus-io#37722

- move most cgo opeartions related to search/query into segcore package
for reusing for streamingnode.
- add go unittest for segcore operations.

Signed-off-by: chyezh <[email protected]>
sre-ci-robot pushed a commit that referenced this issue Nov 26, 2024
issue: #33285
pr: #37985

- Add switch for local rpc

---------

Signed-off-by: chyezh <[email protected]>
sre-ci-robot pushed a commit that referenced this issue Nov 26, 2024
issue: #33285

- Add switch for local rpc

---------

Signed-off-by: chyezh <[email protected]>
JsDove pushed a commit to JsDove/milvus that referenced this issue Nov 26, 2024
issue: milvus-io#33285

- Add switch for local rpc

---------

Signed-off-by: chyezh <[email protected]>
sre-ci-robot pushed a commit that referenced this issue Nov 29, 2024
issue: #33285

- move most cgo opeartions related to search/query into segcore package
for reusing for streamingnode.
- add go unittest for segcore operations.

Signed-off-by: chyezh <[email protected]>
czs007 pushed a commit that referenced this issue Dec 5, 2024
… or mixcoord (#38246)

issue: #33285
pr: #37815

- remove the rpc layer of coordinator when enabling standalone or
mixcoord
- move health check into init

---------

Signed-off-by: chyezh <[email protected]>
Copy link

stale bot commented Dec 20, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Dec 20, 2024
@yanliang567 yanliang567 modified the milestones: 2.5.0, 2.5.1 Dec 24, 2024
@stale stale bot removed stale indicates no udpates for 30 days labels Dec 24, 2024
@yanliang567 yanliang567 modified the milestones: 2.5.1, 2.5.2 Dec 30, 2024
@yanliang567 yanliang567 modified the milestones: 2.5.2, 2.5.3 Jan 6, 2025
@yanliang567 yanliang567 modified the milestones: 2.5.3, 2.5.4 Jan 16, 2025
@yanliang567 yanliang567 modified the milestones: 2.5.4, 2.5.5 Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/streaming node streaming node feature kind/enhancement Issues or changes related to enhancement
Projects
None yet
Development

No branches or pull requests

5 participants