fix(meta): avoid removing control stream node when failed to send request #18767
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Fix recovery test panic.
In `ControlStreamManager`, we always maintain the invariant that the workers in the field `nodes` are consistent with the workers in `response_streams`. However, in the method `remove_partial_graph`, we use `retain` when trying to send the request, and remove the node when sending fails, but we did not remove the corresponding stream in `response_streams`. This causes an inconsistency, which in turn causes subsequent panics.

In this PR, we fix the inconsistency by storing the `response_stream` together with the request sender in the `ControlStreamNode`. Previously we had to introduce a `FuturesUnordered` to concurrently poll the response streams. In this PR, we instead implement the concurrent poll logic ourselves with `poll_fn`, so that we no longer need to separate the request sender from the response stream.

Checklist
`./risedev check` (or alias, `./risedev c`)

Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.
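
For reviewers, the idea described under "What's changed" can be illustrated with a minimal, std-only sketch. This is not the code in this PR: `MockRequestSender`, `MockResponseStream`, `next_response`, `poll_once`, and all field names are hypothetical stand-ins, and the real response stream is assumed to be an async stream rather than a queue. The sketch only shows why keeping both halves in one `ControlStreamNode` makes `retain` safe, and how `poll_fn` can poll every remaining stream in one pass.

```rust
use std::collections::{HashMap, VecDeque};
use std::future::{poll_fn, Future};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type WorkerId = u32;

/// Hypothetical stand-in for a request sender; `connected` simulates send failure.
struct MockRequestSender {
    connected: bool,
}

impl MockRequestSender {
    fn send(&self, _request: &str) -> Result<(), ()> {
        if self.connected { Ok(()) } else { Err(()) }
    }
}

/// Hypothetical stand-in for a response stream: yields queued items, else Pending.
struct MockResponseStream {
    queued: VecDeque<String>,
}

impl MockResponseStream {
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<String>> {
        match self.queued.pop_front() {
            Some(resp) => Poll::Ready(Some(resp)),
            None => Poll::Pending,
        }
    }
}

/// Sender and response stream live in the same node, so they are dropped together.
struct ControlStreamNode {
    sender: MockRequestSender,
    response_stream: MockResponseStream,
}

struct ControlStreamManager {
    nodes: HashMap<WorkerId, ControlStreamNode>,
}

impl ControlStreamManager {
    /// Dropping a node on send failure also drops its stream: no second map to sync.
    fn remove_partial_graph(&mut self, request: &str) {
        self.nodes.retain(|_, node| node.sender.send(request).is_ok());
    }

    /// Hand-rolled concurrent poll: try each node's stream, return the first ready item.
    /// Streams that report end-of-stream (`Ready(None)`) are simply skipped in this sketch.
    async fn next_response(&mut self) -> (WorkerId, String) {
        poll_fn(|cx| {
            for (worker_id, node) in self.nodes.iter_mut() {
                if let Poll::Ready(Some(resp)) = node.response_stream.poll_next(cx) {
                    return Poll::Ready((*worker_id, resp));
                }
            }
            Poll::Pending
        })
        .await
    }
}

/// Waker that does nothing, only used to drive a future once in this sketch.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once, without a real async runtime.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx)
}
```

With a single map there is no state in which a worker has a response stream but no request sender, which is the inconsistency that led to the recovery test panic.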