Refactor Connection to a synchronous state machine #142

Merged: 41 commits into libp2p:master on Nov 25, 2022

Conversation

@thomaseizinger (Contributor) commented Oct 8, 2022

This PR refactors Connection to a synchronous state machine.

It retains the Control API but layers it completely on top of the existing Connection. This allows us to do this refactoring without touching any of the tests. In a future step, we can port all the tests to the new poll-based API and potentially remove the Control API.
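To make "synchronous state machine" concrete, here is a minimal sketch of the pattern. All names are illustrative only, not the exact API introduced by this PR:

```rust
use std::task::{Context, Poll};

struct Stream; // stand-in for a yamux stream

/// Illustrative only: the caller drives the connection by polling it;
/// the connection does not spawn any background tasks of its own.
struct Connection<T> {
    socket: T,
    // ... open streams, pending frames, etc.
}

impl<T> Connection<T> {
    /// Make progress on the connection; yields the next inbound stream.
    fn poll_next_inbound(&mut self, _cx: &mut Context<'_>) -> Poll<Option<Stream>> {
        // 1. write/flush pending frames to `socket`
        // 2. read new frames from `socket` and dispatch them to their streams
        // 3. surface newly opened inbound streams to the caller
        Poll::Pending // actual logic elided
    }
}
```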

All commits:

  • Are properly formatted
  • Have no clippy warnings
  • Pass the test suite

It should be possible to review this PR patch-by-patch. You may, however, find it easier to first look at the end result by checking out the code and navigating around to see how things work now.

It would be great if we could merge #145 first. That would allow us to share a few bits between the two test suites.

@thomaseizinger changed the title from "Refactor towards poll-style code" to "Refactor Connection to a synchronous state machine" on Oct 12, 2022
@mxinden (Member) commented Oct 17, 2022

I am sorry for the delay here. I won't get to reviewing this until after libp2p day.

  • Remove Control API?

In my eyes, that would be a great simplification.

@thomaseizinger (Contributor, Author)

> I am sorry for the delay here. I won't get to reviewing this until after libp2p day.

No worries.

> • Remove Control API?
>
> In my eyes, that would be a great simplification.

Cool!

I am thinking of making a separate PR that adds more tests so I can refactor with a bit more confidence.

Once that is merged, we could perhaps also remove the Control API in this PR? It would make some of the internals vastly simpler. Plus, anyone can build a Control-style API on top of one with multiple poll functions at any point (see the sketch below).
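To illustrate that last point, a rough sketch of how a Control-style API can be layered on top of poll functions: one task owns and drives the connection, and a cloneable handle sends it commands over a channel. All names here are hypothetical, not this crate's API:

```rust
use futures::channel::{mpsc, oneshot};

struct Stream; // stand-in for a yamux stream

enum Command {
    OpenStream(oneshot::Sender<Stream>),
}

/// Cloneable async/await front-end. A separate task owns the connection,
/// drives its poll functions and answers these commands.
#[derive(Clone)]
struct Control {
    commands: mpsc::Sender<Command>,
}

impl Control {
    async fn open_stream(&mut self) -> Result<Stream, oneshot::Canceled> {
        let (reply, response) = oneshot::channel();
        let _ = self.commands.try_send(Command::OpenStream(reply));
        response.await
    }
}
```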

@thomaseizinger (Contributor, Author)

On top of being (hopefully) easier to understand, the new implementation proves to be a lot more performant too!

For slow networks there is no change, but as we get to higher-speed networks, we are seeing improvements of up to 38%!

[Benchmark chart: old vs. new implementation across network speeds]

@thomaseizinger (Contributor, Author)

I've rebased on top of master to allow for an easier patch-by-patch review!

@thomaseizinger (Contributor, Author)

Let me know if you disagree with the workspace structure.

@mxinden (Member) left a comment

Direction looks good to me. Couple of comments, nothing big.

In my eyes, this pull request combines many unrelated changes, so ideally it would be split into several pull requests. That said, I think it is fine to proceed here, especially as this repository is not very active and conflicts are therefore unlikely.

Comment on lines +423 to +426
```rust
// Flush the socket. Progress is not required here, so both
// outcomes are deliberately ignored.
match self.socket.poll_flush_unpin(cx)? {
    Poll::Ready(()) => {}
    Poll::Pending => {}
}
```
@mxinden (Member):

I am wondering whether we should flush this early in the poll method, or whether it shouldn't rather be one of the last actions. Rationale being that frequent flushing hurts performance, especially in cases where one could batch up more data instead.

Just a thought. Needs more thought and potentially data to back it up.
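To illustrate the concern, a hypothetical poll loop (not this crate's actual code) that queues as many frames as possible first and flushes only once at the end:

```rust
use std::collections::VecDeque;
use std::task::{Context, Poll};

use futures::{Sink, SinkExt};

struct Frame; // stand-in for a yamux frame

struct Io<S> {
    socket: S,
    pending_frames: VecDeque<Frame>,
}

impl<S: Sink<Frame, Error = std::io::Error> + Unpin> Io<S> {
    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), std::io::Error>> {
        // Queue as many pending frames as the socket will accept ...
        while let Some(frame) = self.pending_frames.pop_front() {
            if self.socket.poll_ready_unpin(cx)?.is_pending() {
                self.pending_frames.push_front(frame);
                break;
            }
            self.socket.start_send_unpin(frame)?;
        }
        // ... and flush once at the end, so they go out as a single batch.
        let _ = self.socket.poll_flush_unpin(cx)?;
        Poll::Pending // remainder of the poll loop elided
    }
}
```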

@thomaseizinger (Contributor, Author):

I couldn't find any consistent performance improvement in my benchmarks when moving this block of code up or down.

However, this got me thinking: Why do we communicate via channels between the connection and the stream for writing, but use shared buffers when reading? We could just as easily have a buffer of frames in Shared and wake the Connection whenever we write any frames to that. This would allow us to drain the buffers of all streams in one go, without having to receive individual frames over a channel.
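Roughly, with hypothetical types (not code from this PR), the idea would look like this:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::Waker;

struct Frame; // stand-in for an outbound yamux frame

#[derive(Default)]
struct Shared {
    pending_frames: VecDeque<Frame>,
    connection_waker: Option<Waker>,
}

/// Called by a stream: buffer the frame and wake the connection,
/// which can then drain the buffers of all streams in one go.
fn stream_write(shared: &Arc<Mutex<Shared>>, frame: Frame) {
    let mut shared = shared.lock().unwrap();
    shared.pending_frames.push_back(frame);
    if let Some(waker) = shared.connection_waker.take() {
        waker.wake();
    }
}
```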

@mxinden (Member):

> I couldn't find any consistent performance improvement in my benchmarks when moving this block of code up or down.

Thanks for testing. Let's keep it as is.

> However, this got me thinking: Why do we communicate via channels between the connection and the stream for writing, but use shared buffers when reading? We could just as easily have a buffer of frames in Shared and wake the Connection whenever we write any frames to that. This would allow us to drain the buffers of all streams in one go, without having to receive individual frames over a channel.

I am undecided whether the connection should communicate with the stream via a channel or via a plain Mutex and Waker. Whatever change we want to make, I think it should not happen within this pull request.

Comment on lines 13 to 21
```rust
/// A [`Future`] that gracefully closes the yamux connection.
#[must_use]
pub struct Closing<T> {
    state: State,                                     // current step of the close sequence
    control_receiver: mpsc::Receiver<ControlCommand>, // control requests still to be drained
    stream_receiver: mpsc::Receiver<StreamCommand>,   // stream commands still to be drained
    pending_frames: VecDeque<Frame<()>>,              // frames not yet written to the socket
    socket: Fuse<frame::Io<T>>,                       // the underlying connection I/O
}
```
@mxinden (Member):

Question, not a suggestion: why deliberately implement this as a state machine instead of with procedural async/await?

@thomaseizinger (Contributor, Author):

So that it can be named without boxing it up.

@mxinden (Member):

What would be the drawback of boxing it?

@thomaseizinger (Contributor, Author):

> What would be the drawback of boxing it?

  • Performance
  • We have to decide whether we add the Send bound. In the current design, we get to delete the YamuxLocal stuff in rust-libp2p because the Send bound is inferred.

I don't feel strongly about either, but it felt like a nice improvement as I went along. Once we get "impl Trait in type aliases" in Rust, at least the boxing would go away.
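For context, a sketch of the trade-off with illustrative types only: a hand-rolled state machine has a nameable type that callers can embed in their own types without allocation, and its `Send`-ness is inferred from `T`, whereas the anonymous future of an `async fn` must be boxed to be stored, forcing an up-front decision on the `Send` bound:

```rust
use std::future::Future;
use std::pin::Pin;

/// Hand-rolled future: nameable, embeddable, `Send` if and only if `T` is.
struct Closing<T> {
    socket: T,
    // ... close-sequence state
}

/// A caller can store it directly; no allocation, no `dyn`.
struct Shutdown<T> {
    closing: Closing<T>,
}

/// Boxed alternative: also storable, but must commit to `Send` (or not) here.
type BoxedClosing = Pin<Box<dyn Future<Output = ()> + Send>>;
```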

@mxinden (Member):

> What would be the drawback of boxing it?
>
> • Performance

Is there any benchmark proving this? Is performance relevant when closing a connection?

> We have to decide whether we add the Send bound. In the current design, we get to delete the YamuxLocal stuff in rust-libp2p because the Send bound is inferred.

The infectious-send-when-boxing problem is reason enough to not box in my eyes 👍

@thomaseizinger (Contributor, Author)

> In my eyes, this pull request combines many unrelated changes, so ideally it would be split into several pull requests. That said, I think it is fine to proceed here, especially as this repository is not very active and conflicts are therefore unlikely.

I agree that the size is not ideal. However, I found it quite difficult to split this refactoring into pieces that could be merged independently without leaving master in a weird state from a design perspective.

The best I could do was to make small commits, but I don't think any particular subset of those is worth merging independently :)

@mxinden (Member) left a comment

> It retains the Control API but layers it completely on top of the existing Connection. This allows us to do this refactoring without touching any of the tests. In a future step, we can port all the tests to the new poll-based API and potentially remove the Control API.

I suggest we deprecate or remove the Control API in a follow-up pull request. What do you think @thomaseizinger?


This is a large change that could result in subtle behavior changes breaking upper layers. What is the best strategy for testing this patch in the wild? I suggest we ask community members to run it in their production environments. Should we cut an alpha release for it, or rather have them test based on a commit hash?

@thomaseizinger (Contributor, Author)

> It retains the Control API but layers it completely on top of the existing Connection. This allows us to do this refactoring without touching any of the tests. In a future step, we can port all the tests to the new poll-based API and potentially remove the Control API.
>
> I suggest we deprecate or remove the Control API in a follow-up pull request. What do you think @thomaseizinger?

Yes, the tests need refactoring before we can remove it.

> This is a large change that could result in subtle behavior changes breaking upper layers. What is the best strategy for testing this patch in the wild? I suggest we ask community members to run it in their production environments. Should we cut an alpha release for it, or rather have them test based on a commit hash?

I'd suggest:

  1. Merge this
  2. Release yamux
  3. Update libp2p-yamux on a branch based on the 0.49 release
  4. Cut an alpha for libp2p-yamux
  5. Have people drop that alpha into their dependency tree. As long as they are on 0.49, that should make things compatible, and there are no behaviour changes other than this patch.

@thomaseizinger (Contributor, Author)

I've bumped the version and changelog. I intend to merge this in the coming days.

@mxinden (Member) commented Nov 24, 2022

> I'd suggest:
>
> 1. Merge this
> 2. Release `yamux`
> 3. Update `libp2p-yamux` on a branch based on the 0.49 release
> 4. Cut an alpha for `libp2p-yamux`
> 5. Have people drop that alpha into their dependency tree. As long as they are on 0.49, that should make things compatible, and there are no behaviour changes other than this patch.

Sounds good to me.

> I've bumped the version and changelog. I intend to merge this in the coming days.

👍 I can cut a release right after.

@thomaseizinger thomaseizinger merged commit e59d8a5 into libp2p:master Nov 25, 2022