Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authenticate Libp2p connections #3534

Merged
merged 15 commits into from
Aug 7, 2024
Merged

Authenticate Libp2p connections #3534

merged 15 commits into from
Aug 7, 2024

Conversation

rob-maron
Copy link
Collaborator

@rob-maron rob-maron commented Aug 2, 2024

Closes #3490

This PR:

Implements authentication against the stake table for Libp2p connections. It does so by introducing a mutual, bidirectional authentication handshake when a new connection is established.

Here's how it works:

  • On a new connection, we require the peer to send a signed message of both their stake table public key and their PeerId.
  • This message is first checked for a proper signature, and then the public key is checked against the stake table. To prevent replay attacks, the signed PeerId is then checked against the remote peer's PeerId, which Libp2p has already negotiated.
  • The handshake flow depends on whether or not you were the connection initiator. This allows both the connecting peer and the accepting peer to "agree" on the proper flow, but both will authenticate with each other.

The authentication message is pre-signed to minimize scope of the private key in networking segments of the code. The mitigation for replay attacks is the above PeerId comparison.

It also:

  • Has sane timeouts and a max message size defaults to prevent DoS attacks at this level
  • Breaks interoperability with older versions of HS. We may end up doing an upgrade path where it only begins checking after a certain view, but I haven't decided

This PR does not:

  • Implement DHT authentication

Key places to review:

stake_table_transport.rs - almost all of the logic is here

How to test this PR:

Since the examples use from_config, you can test in the examples. Try signing the wrong thing and see what happens.

Copy link
Contributor

@jparr721 jparr721 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great so far. One suggestion

crates/libp2p-networking/src/network/transport.rs Outdated Show resolved Hide resolved
@rob-maron rob-maron marked this pull request as ready for review August 5, 2024 18:58
@lukaszrzasik lukaszrzasik self-assigned this Aug 6, 2024
Copy link
Collaborator

@bfish713 bfish713 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good logically and really happy that it's something that works. I had a few questions/ideas how we could maybe pair down the Transport impl. I'm most curious about having to do a bunch of stuff in poll and if that can be removed

Comment on lines 508 to 512
let mut message = vec![0u8; len];
stream
.read_exact(&mut message)
.await
.with_context(|| "Failed to read message")?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we fail gracefully here (return Error) if the len encoded in the messages is incorrect. I.e. if messages says it's 8 bytes but is actually only 4 bytes will we just get `Error("failed to readd message")? same question if the len is 4 bytes but the message is 8, what happens?

Copy link
Collaborator Author

@rob-maron rob-maron Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the message is longer and we only ask for the first 4 bytes, deserialization of the actual underlying message (in this case the auth_message) will fail

If the message is shorter, we will time out after the 5 seconds and fail negotiation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great just wanted to be sure, that's what I expect to happen

) -> std::task::Poll<libp2p::core::transport::TransportEvent<Self::ListenerUpgrade, Self::Error>>
{
match self.as_mut().project().inner.poll(cx) {
Poll::Ready(event) => Poll::Ready(match event {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not sufficient here to to just wrap the inner poll (and maybe do the event translation) and handle all the auth stuff in the dial and dial_as_listener functions?

Copy link
Collaborator Author

@rob-maron rob-maron Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of the time poll is the one dealing with the actual listening logic and handling new incoming connections. dial_as_listener, while I never hit it in testing, is likely used for some sort of NATing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's really weird/unexpected

Comment on lines 358 to 394
let auth_upgrade = Box::pin(async move {
// Perform the inner upgrade
let mut stream = upgrade.await?;

// Time out the authentication block
async_timeout(AUTH_HANDSHAKE_TIMEOUT, async {
// Open a substream for the handshake
let mut substream =
poll_fn(|cx| stream.as_connection().poll_inbound_unpin(cx)).await?;

// (inbound) Verify the remote peer's authentication
verify_peer_authentication(
&mut substream,
stake_table,
stream.as_peer_id(),
)
.await
.map_err(|e| {
warn!("Failed to verify remote peer: {:?}", e);
std::io::Error::new(std::io::ErrorKind::Other, e)
})?;

// (outbound) Authenticate with the remote peer
authenticate_with_remote_peer(&mut substream, auth_message)
.await
.map_err(|e| {
warn!("Failed to authenticate with remote peer: {:?}", e);
std::io::Error::new(std::io::ErrorKind::Other, e)
})?;

Ok::<(), T::Error>(())
})
.await
.map_err(|e| {
warn!("Timed out performing authentication handshake: {:?}", e);
std::io::Error::new(std::io::ErrorKind::TimedOut, e)
})??;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as the code in dial_as_listener right? can we just make this a function or remove it here as I indicated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is similar, let me see if I can make it less duplicated

Comment on lines 251 to 264
authenticate_with_remote_peer(&mut substream, auth_message)
.await
.map_err(|e| {
warn!("Failed to authenticate with remote peer: {:?}", e);
std::io::Error::new(std::io::ErrorKind::Other, e)
})?;

// (inbound) Verify the remote peer's authentication
verify_peer_authentication(&mut substream, stake_table, stream.as_peer_id())
.await
.map_err(|e| {
warn!("Failed to verify remote peer: {:?}", e);
std::io::Error::new(std::io::ErrorKind::Other, e)
})?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how the substream here works but we effectively just always do things in this order (write our auth message, then wait for the peers and verify it) or does it some how mess up? Just thinking that this logic is the same everywhere and would be nice if it were just one function called like `auth_with_peer(stream

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is actually the only one in this order, the others perform the reverse. We want to synchronously wait and make the handshake order explicit because if the peer is not actively receiving our message it will fail (e.g. if they are both sending data)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that's what I was afraid of. I knew this one was the opposite order of the listening side so I wanted to be sure it had to be that order. My thought was that sending could always happen first since both nodes need to get the auth message. I was thinking they could both send at the same time, then both receive after sending but I guess one send might fail in that case

Comment on lines 45 to 49
pub stake_table: Arc<Option<HashSet<S>>>,

/// A pre-signed message that we send to the remote peer for authentication
pub auth_message: Arc<Option<Vec<u8>>>,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it Optional because of upgrading, or some other reason?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is for upgrading, but I am open to making it non-optional

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long run non optional what we want, but for now it makes sense

@bfish713
Copy link
Collaborator

bfish713 commented Aug 6, 2024

I can approve after you do the de-duplication

@rob-maron
Copy link
Collaborator Author

I can approve after you do the de-duplication

Done! Hopefully this made things look a little nicer

@rob-maron rob-maron requested a review from bfish713 August 7, 2024 13:16
Copy link
Collaborator

@bfish713 bfish713 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, it's a little confusing to me with all the trait functions (e.g. dial ) returning and passing in futures but we can't change that

@rob-maron rob-maron merged commit 8020807 into main Aug 7, 2024
35 checks passed
@rob-maron rob-maron deleted the rm/authenticate-libp2p branch August 7, 2024 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Libp2p] Authentication
4 participants