Refactor `Connection` to a synchronous state machine #142
Conversation
I am sorry for the delay here. I won't get to reviewing this until after libp2p day.
In my eyes, that would be a great simplification.
No worries.
Cool! I am thinking of making a separate PR that adds more tests so I can refactor with a bit more confidence. Once that is merged, we could perhaps also remove the control API in this PR? It would make some of the internals vastly simpler. Plus, anyone can build a control style API on top of one with multiple poll functions at any point.
We can do this by having a centralised place to send messages and shoving them into this buffer in all other places.
To reduce the duplication we split out a test harness.
This allows us to have a proper place for our test-harness.
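A minimal sketch of what such a centralised buffer could look like; `Frame`, `Connection`, and `enqueue_frame` are illustrative stand-ins, not the actual yamux types:

```rust
use std::collections::VecDeque;

struct Frame; // stand-in for the real yamux frame type

struct Connection {
    /// The single, centralised buffer of outbound messages.
    pending_frames: VecDeque<Frame>,
}

impl Connection {
    /// Every code path that wants to send a frame pushes it here instead of
    /// writing to the socket directly; the poll loop drains this buffer.
    fn enqueue_frame(&mut self, frame: Frame) {
        self.pending_frames.push_back(frame);
    }
}
```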
I've rebased on top of master to allow for an easier patch-by-patch review!
Let me know if you disagree with the workspace structure.
Direction looks good to me. Couple of comments, nothing big.
In my eyes this pull request combines many unrelated changes. Thus ideally this would be split into many pull requests. That said, I think it is fine to proceed here, especially as this repository is not very active and thus conflicts are not probable.
```rust
match self.socket.poll_flush_unpin(cx)? {
    Poll::Ready(()) => {}
    Poll::Pending => {}
}
```
I am wondering whether we should flush so early in the `poll` method, or whether it shouldn't be one of the last actions. Rationale being that frequent flushing hurts performance, especially in case one can increase the batch instead.
Just a thought. Needs more thought and potentially data to back it up.
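For illustration, a minimal sketch of the "flush last" layout suggested above, assuming a simplified `Connection` that owns a `futures` `Sink`; the struct and signature are stand-ins, not the actual yamux code:

```rust
use futures::{Sink, SinkExt};
use std::task::{Context, Poll};

struct Connection<S> {
    socket: S,
}

impl<S> Connection<S>
where
    S: Sink<Vec<u8>, Error = std::io::Error> + Unpin,
{
    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        // ... read inbound frames and enqueue outbound frames first ...

        // Flush last: everything enqueued during this iteration is written
        // out in a single flush instead of several eager ones.
        match self.socket.poll_flush_unpin(cx)? {
            Poll::Ready(()) => {}
            Poll::Pending => {}
        }

        Poll::Pending
    }
}
```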
I couldn't find any consistent performance improvement in my benchmarks when moving this block of code up or down.
However, this got me thinking: Why do we communicate via channels between the connection and the stream for writing but use shared buffers when reading? We could just as easily have a buffer of frames in `Shared` and wake the `Connection` whenever we write any frames to that. This would allow us to drain the buffer of all streams in one go, without having to receive individual frames over a channel.
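A hedged sketch of that idea, assuming a `Shared` struct protected by a `Mutex` plus a stored `Waker`; all names here are illustrative, not the actual yamux internals:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::Waker;

struct Frame; // stand-in for the real frame type

#[derive(Default)]
struct Shared {
    buffer: VecDeque<Frame>,
    waker: Option<Waker>,
}

// Stream side: enqueue a frame and wake the connection task.
fn send_frame(shared: &Arc<Mutex<Shared>>, frame: Frame) {
    let mut guard = shared.lock().unwrap();
    guard.buffer.push_back(frame);
    if let Some(waker) = guard.waker.take() {
        waker.wake();
    }
}

// Connection side: drain everything that accumulated since the last poll,
// registering the waker for the next round.
fn drain_frames(shared: &Arc<Mutex<Shared>>, waker: &Waker) -> Vec<Frame> {
    let mut guard = shared.lock().unwrap();
    guard.waker = Some(waker.clone());
    guard.buffer.drain(..).collect()
}
```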
> I couldn't find any consistent performance improvement in my benchmarks when moving this block of code up or down.

Thanks for testing. Let's keep it as is.

> However, this got me thinking: Why do we communicate via channels between the connection and the stream for writing but use shared buffers when reading? We could just as easily have a buffer of frames in `Shared` and wake the `Connection` whenever we write any frames to that. This would allow us to drain the buffer of all streams in one go, without having to receive individual frames over a channel.

I am undecided whether the connection should communicate with the stream via a channel or via a plain `Mutex` and `Waker`. Whatever change we want to make, I think it should not happen within this pull request.
src/connection/closing.rs
```rust
/// A [`Future`] that gracefully closes the yamux connection.
#[must_use]
pub struct Closing<T> {
    state: State,
    control_receiver: mpsc::Receiver<ControlCommand>,
    stream_receiver: mpsc::Receiver<StreamCommand>,
    pending_frames: VecDeque<Frame<()>>,
    socket: Fuse<frame::Io<T>>,
}
```
Question, not suggestion: Why deliberately implement this as a state machine instead of procedural `async`/`await`?
So that it can be named without boxing it up.
What would be the drawback of boxing it?
> What would be the drawback of boxing it?

- Performance
- We have to decide whether we add the `Send` bound. In the current design, we get to delete the `YamuxLocal` stuff in `rust-libp2p` because the `Send` bound is inferred.

I don't feel strongly about either, but it felt like a nice improvement as I went along. Once we get "impl Trait in type alias" in Rust, at least the boxing would go away.
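A self-contained toy illustrating the trade-off; this `Closing` is a stand-in with made-up states, not the real yamux type:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

enum State {
    Flushing,
    Done,
}

/// Toy stand-in for the real `Closing` future: a concrete, nameable type.
struct Closing {
    state: State,
}

impl Future for Closing {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        loop {
            match self.state {
                State::Flushing => self.state = State::Done, // pretend we flushed
                State::Done => return Poll::Ready(()),
            }
        }
    }
}

// Because `Closing` is a concrete type, it can appear in a signature without
// boxing, and whether it is `Send` is inferred from its fields:
fn close() -> Closing {
    Closing { state: State::Flushing }
}

// The `async fn` alternative returns an anonymous type; naming it today means
// boxing, which forces an up-front decision about the `Send` bound:
fn close_boxed() -> Pin<Box<dyn Future<Output = ()> + Send>> {
    Box::pin(async { /* flush and close here */ })
}
```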
> What would be the drawback of boxing it?
>
> - Performance

Is there any benchmark proving this? Is performance relevant when closing a connection?

> - We have to decide whether we add the `Send` bound. In the current design, we get to delete the `YamuxLocal` stuff in `rust-libp2p` because the `Send` bound is inferred.

The infectious-send-when-boxing problem is reason enough to not box in my eyes 👍
I agree that the size is not ideal. I did however find it quite difficult to refactor this piecewise into something that can be merged independently without leaving master in a weird state from a design perspective. Best I could do was to make small commits, but I don't think a particular subset of those is worth merging independently :)
> It retains the Control API but layers it completely on top of the existing Connection. This allows us to do this refactoring without touching any of the tests. In a future step, we can port all the tests to the new poll-based API and potentially remove the Control API.

I suggest we deprecate or remove the `Control` API in a follow-up pull request. What do you think @thomaseizinger?
This is a large change that could result in subtle behavior changes breaking upper layers. What is the best strategy to test this patch in the wild? I suggest we ask community members to run this in their production environments. Should we cut an `alpha` release for it, or rather have them test based on a hash?
Yes, the tests need refactoring before we can remove it.
I'd suggest:
I've bumped the version and changelog. I am intending to merge this in the upcoming days.
Sounds good to me.
👍 I can cut a release right after.
This PR refactors `Connection` to a synchronous state machine.

It retains the `Control` API but layers it completely on top of the existing `Connection`. This allows us to do this refactoring without touching any of the tests. In a future step, we can port all the tests to the new `poll`-based API and potentially remove the `Control` API.

All commits:

It should be possible to review this PR patch-by-patch. You may find it easier though to first have a look at the end result by checking out the code and navigating around to see how things work now.
It would be great if we could merge #145 first. That would allow us to share a few bits between the two test suites.
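As a rough illustration of that layering, here is a minimal sketch of a control-style async API built purely on top of poll functions; `poll_new_outbound` and `open_stream` are hypothetical names, not the actual API:

```rust
use futures::future::poll_fn;
use std::task::{Context, Poll};

struct Stream;

struct Connection;

impl Connection {
    /// The synchronous state machine: all progress is made in poll functions.
    fn poll_new_outbound(&mut self, _cx: &mut Context<'_>) -> Poll<Stream> {
        Poll::Ready(Stream)
    }
}

/// A control-style, async API layered entirely on top of the poll functions,
/// without the `Connection` itself knowing anything about it.
async fn open_stream(conn: &mut Connection) -> Stream {
    poll_fn(|cx| conn.poll_new_outbound(cx)).await
}
```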