-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Authenticate Libp2p connections #3534
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great so far. One suggestion
Co-authored-by: Jarred Parr <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good logically and really happy that it's something that works. I had a few questions/ideas how we could maybe pair down the Transport impl. I'm most curious about having to do a bunch of stuff in poll
and if that can be removed
let mut message = vec![0u8; len]; | ||
stream | ||
.read_exact(&mut message) | ||
.await | ||
.with_context(|| "Failed to read message")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we fail gracefully here (return Error) if the len
encoded in the messages is incorrect. I.e. if messages says it's 8 bytes but is actually only 4 bytes will we just get `Error("failed to readd message")? same question if the len is 4 bytes but the message is 8, what happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the message is longer and we only ask for the first 4 bytes, deserialization of the actual underlying message (in this case the auth_message
) will fail
If the message is shorter, we will time out after the 5 seconds and fail negotiation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great just wanted to be sure, that's what I expect to happen
) -> std::task::Poll<libp2p::core::transport::TransportEvent<Self::ListenerUpgrade, Self::Error>> | ||
{ | ||
match self.as_mut().project().inner.poll(cx) { | ||
Poll::Ready(event) => Poll::Ready(match event { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it not sufficient here to to just wrap the inner poll (and maybe do the event translation) and handle all the auth stuff in the dial
and dial_as_listener
functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The majority of the time poll
is the one dealing with the actual listening logic and handling new incoming connections. dial_as_listener
, while I never hit it in testing, is likely used for some sort of NATing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's really weird/unexpected
let auth_upgrade = Box::pin(async move { | ||
// Perform the inner upgrade | ||
let mut stream = upgrade.await?; | ||
|
||
// Time out the authentication block | ||
async_timeout(AUTH_HANDSHAKE_TIMEOUT, async { | ||
// Open a substream for the handshake | ||
let mut substream = | ||
poll_fn(|cx| stream.as_connection().poll_inbound_unpin(cx)).await?; | ||
|
||
// (inbound) Verify the remote peer's authentication | ||
verify_peer_authentication( | ||
&mut substream, | ||
stake_table, | ||
stream.as_peer_id(), | ||
) | ||
.await | ||
.map_err(|e| { | ||
warn!("Failed to verify remote peer: {:?}", e); | ||
std::io::Error::new(std::io::ErrorKind::Other, e) | ||
})?; | ||
|
||
// (outbound) Authenticate with the remote peer | ||
authenticate_with_remote_peer(&mut substream, auth_message) | ||
.await | ||
.map_err(|e| { | ||
warn!("Failed to authenticate with remote peer: {:?}", e); | ||
std::io::Error::new(std::io::ErrorKind::Other, e) | ||
})?; | ||
|
||
Ok::<(), T::Error>(()) | ||
}) | ||
.await | ||
.map_err(|e| { | ||
warn!("Timed out performing authentication handshake: {:?}", e); | ||
std::io::Error::new(std::io::ErrorKind::TimedOut, e) | ||
})??; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as the code in dial_as_listener
right? can we just make this a function or remove it here as I indicated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is similar, let me see if I can make it less duplicated
authenticate_with_remote_peer(&mut substream, auth_message) | ||
.await | ||
.map_err(|e| { | ||
warn!("Failed to authenticate with remote peer: {:?}", e); | ||
std::io::Error::new(std::io::ErrorKind::Other, e) | ||
})?; | ||
|
||
// (inbound) Verify the remote peer's authentication | ||
verify_peer_authentication(&mut substream, stake_table, stream.as_peer_id()) | ||
.await | ||
.map_err(|e| { | ||
warn!("Failed to verify remote peer: {:?}", e); | ||
std::io::Error::new(std::io::ErrorKind::Other, e) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how the substream
here works but we effectively just always do things in this order (write our auth message, then wait for the peers and verify it) or does it some how mess up? Just thinking that this logic is the same everywhere and would be nice if it were just one function called like `auth_with_peer(stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is actually the only one in this order, the others perform the reverse. We want to synchronously wait and make the handshake order explicit because if the peer is not actively receiving our message it will fail (e.g. if they are both sending data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that's what I was afraid of. I knew this one was the opposite order of the listening side so I wanted to be sure it had to be that order. My thought was that sending could always happen first since both nodes need to get the auth message. I was thinking they could both send at the same time, then both receive after sending but I guess one send might fail in that case
pub stake_table: Arc<Option<HashSet<S>>>, | ||
|
||
/// A pre-signed message that we send to the remote peer for authentication | ||
pub auth_message: Arc<Option<Vec<u8>>>, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it Optional because of upgrading, or some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is for upgrading, but I am open to making it non-optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long run non optional what we want, but for now it makes sense
I can approve after you do the de-duplication |
Done! Hopefully this made things look a little nicer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, it's a little confusing to me with all the trait functions (e.g. dial
) returning and passing in futures but we can't change that
Closes #3490
This PR:
Implements authentication against the stake table for Libp2p connections. It does so by introducing a mutual, bidirectional authentication handshake when a new connection is established.
Here's how it works:
PeerId
.PeerId
is then checked against the remote peer'sPeerId
, which Libp2p has already negotiated.The authentication message is pre-signed to minimize scope of the private key in networking segments of the code. The mitigation for replay attacks is the above
PeerId
comparison.It also:
This PR does not:
Key places to review:
stake_table_transport.rs
- almost all of the logic is hereHow to test this PR:
Since the examples use
from_config
, you can test in the examples. Try signing the wrong thing and see what happens.