Add SDK Router message handling #316
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, gossipMsg))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, requestMessage))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, garbageResponse))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, emptyResponse))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, nilResponse))
This test was written to ensure that an invalid message NEVER triggers an unintentional fatal error, so it seems a bit weird to change it in this way.
Yeah, I was unsure if it made more sense to just remove these test cases or just test for the SDK error. I guess the property we now have is that invalid messages are always forwarded into the router, so we can either check for this error, get rid of these tests, or write a `Router` interface that `p2p.Router` implements, but maybe that's overkill.
peer/network.go
Outdated
@@ -365,15 +365,16 @@ func (n *network) AppRequest(ctx context.Context, nodeID ids.NodeID, requestID u
// If the response handler returns an error it is propagated as a fatal error.
func (n *network) AppResponse(ctx context.Context, nodeID ids.NodeID, requestID uint32, response []byte) error {
n.lock.Lock()
Since `closed` is atomic, could we move this lock to directly before `handler, exists := n.markRequestFulfilled(requestID)`?
Should we move it inside `markRequestFulfilled` instead?
Yeah that makes much more sense than what I was currently doing.
peer/network.go
Outdated
n.lock.Lock()
defer n.lock.Unlock()

func (n *network) AppResponse(ctx context.Context, nodeID ids.NodeID, requestID uint32, response []byte) error {
if n.closed.Get() {
This is a very subtle change, so we should be terrified to make it.
I believe we must hold the lock when checking `n.closed` on all of the inbound `response` + `requestFailed` flows to avoid a potential panic due to a write to a closed channel (or closing a channel twice).
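To make the concern above concrete, here is a minimal sketch of the invariant, not the code in this PR. The `responder` type, its `register`/`deliver`/`shutdown` methods, and the use of a plain `bool` guarded by the mutex (instead of the atomic flag in `peer/network.go`) are all assumptions made for illustration. The point is that the closed check and the channel write happen under one lock acquisition, so a concurrent shutdown cannot close the channel in between.

```go
package main

import (
	"fmt"
	"sync"
)

// responder is a hypothetical stand-in for the network type under review.
type responder struct {
	lock    sync.Mutex
	closed  bool // assumption: guarded by lock rather than stored in an atomic
	pending map[uint32]chan []byte
}

func newResponder() *responder {
	return &responder{pending: make(map[uint32]chan []byte)}
}

// register records an outstanding request and returns its response channel.
func (r *responder) register(id uint32) (<-chan []byte, bool) {
	r.lock.Lock()
	defer r.lock.Unlock()
	if r.closed {
		return nil, false
	}
	ch := make(chan []byte, 1) // buffered so deliver never blocks under the lock
	r.pending[id] = ch
	return ch, true
}

// deliver checks closed and writes to the channel under one lock acquisition,
// so shutdown cannot close the channel between the check and the send.
func (r *responder) deliver(id uint32, resp []byte) bool {
	r.lock.Lock()
	defer r.lock.Unlock()
	if r.closed {
		return false
	}
	ch, ok := r.pending[id]
	if !ok {
		return false
	}
	delete(r.pending, id)
	ch <- resp
	return true
}

// shutdown closes every pending channel exactly once.
func (r *responder) shutdown() {
	r.lock.Lock()
	defer r.lock.Unlock()
	if r.closed {
		return // second call is a no-op, so no channel is closed twice
	}
	r.closed = true
	for id, ch := range r.pending {
		close(ch)
		delete(r.pending, id)
	}
}

func main() {
	r := newResponder()
	ch, _ := r.register(1)
	fmt.Println(r.deliver(1, []byte("ok")), string(<-ch))
	r.shutdown()
	fmt.Println(r.deliver(2, nil))
}
```

If the flag were read before taking the lock, `shutdown` could run between the read and the send, which is the send-on-closed-channel panic the comment warns about.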
As discussed offline I'll see if I can make a separate regression test for this invariant as a follow-up to this PR.
I documented the invariant on `n.closed` as well 👍
peer/network.go
Outdated
if n.closed.Get() {
n.lock.Unlock()
return nil
Are we okay with dropping responses (and timeouts) that should have been sent to the SDK router after we close the network?
If we are not okay with this - then I think we'll need to be fairly careful around what gets passed through to the router.
We chatted offline and agreed it was cleanest to leave the `Router` code as-is and just drop the `closed` check on the server end: the set of outstanding requests is guaranteed to be empty after shutdown, because we empty it in `Shutdown` and stop sending requests once the flag is set.
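The argument in this comment can be sketched as follows. This is an illustration only: `tracker`, `callback`, and the method names are hypothetical and do not mirror the actual `network` type. It assumes shutdown fails every outstanding request and clears the map while holding the lock, and that sends are refused once the flag is set, so the response path needs no closed check of its own.

```go
package main

import (
	"fmt"
	"sync"
)

// callback is a hypothetical handler; failed is true when the request is abandoned.
type callback func(resp []byte, failed bool)

// tracker is a hypothetical stand-in for the outstanding-request bookkeeping.
type tracker struct {
	lock        sync.Mutex
	closed      bool
	outstanding map[uint32]callback
}

func newTracker() *tracker {
	return &tracker{outstanding: make(map[uint32]callback)}
}

// send refuses new requests once the closed flag is set.
func (t *tracker) send(id uint32, cb callback) bool {
	t.lock.Lock()
	defer t.lock.Unlock()
	if t.closed {
		return false
	}
	t.outstanding[id] = cb
	return true
}

// shutdown sets the flag, fails every outstanding request, and empties the map.
// Callbacks run under the lock here, so they must not call back into tracker.
func (t *tracker) shutdown() {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.closed = true
	for id, cb := range t.outstanding {
		cb(nil, true)
		delete(t.outstanding, id)
	}
}

// response has no closed check: after shutdown the map is empty, so a late
// response simply finds no handler.
func (t *tracker) response(id uint32, resp []byte) bool {
	t.lock.Lock()
	defer t.lock.Unlock()
	cb, ok := t.outstanding[id]
	if !ok {
		return false
	}
	delete(t.outstanding, id)
	cb(resp, false)
	return true
}

func main() {
	t := newTracker()
	t.send(1, func(_ []byte, failed bool) { fmt.Println("request 1 failed:", failed) })
	t.shutdown()
	fmt.Println("late response handled:", t.response(1, []byte("late")))
	fmt.Println("send after shutdown:", t.send(2, nil))
}
```

In the PR itself, a response with no matching handler is forwarded to the SDK router rather than reported as a boolean. The sketch only shows why the handler lookup is guaranteed to miss after shutdown.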
* Add SDK Router message handling (#316)
Co-authored-by: Stephen Buttolph <[email protected]>
* revert avago version bump
* Fix hanging requests after Shutdown (#326) (#859)
* Fix hanging requests after Shutdown (#326)
* fix requests hanging after shutdown
* fix build
Signed-off-by: Stephen Buttolph <[email protected]>
Co-authored-by: Stephen Buttolph <[email protected]>
* Bump avago rc (#860)
* Update to 1.10.10-rc.2 (#328)
* update to avalanchego 1.10.10-rc.2
* nits
* nit
* add batchsize
* increase timeout dynamically
Co-authored-by: Joshua Kim <[email protected]>

* Add SDK Router message handling (#316)
Co-authored-by: Stephen Buttolph <[email protected]>
* Fix hanging requests after Shutdown (#326)
* fix requests hanging after shutdown
* fix build
Signed-off-by: Stephen Buttolph <[email protected]>
Co-authored-by: Stephen Buttolph <[email protected]>
* Update to 1.10.10-rc.2 (#328)
* update to avalanchego 1.10.10-rc.2
* nits
* nit
* Add P2P SDK Pull Gossip (#318)
* add batchsize
* sync changes
* Drop outbound gossip for non vdrs (#862)
* Drop outbound gossip requests for non-validators (#334)
* drop outbound gossip requests for non validators
* nit
* nit
* sync changes
Co-authored-by: Joshua Kim <[email protected]>
Co-authored-by: Joshua Kim <[email protected]>
Why this should be merged
Adds the P2P SDK's router to handle incoming SDK messages.
How this works
Forwards messages that fail unmarshaling against the codec into the SDK router
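The forwarding rule described here can be sketched as below. This is a hedged illustration, not the PR's implementation: `encoding/json` stands in for the real codec, and `legacyRequest`, `fallbackRouter`, `recordingRouter`, and `handle` are hypothetical names. The idea is to try the existing codec first and pass the raw bytes to the router only when decoding fails.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacyRequest is a hypothetical message type understood by the existing codec.
type legacyRequest struct {
	Kind string `json:"kind"`
}

// fallbackRouter is a hypothetical stand-in for the SDK router.
type fallbackRouter interface {
	AppRequest(msg []byte) error
}

// recordingRouter remembers every message forwarded to it.
type recordingRouter struct {
	seen [][]byte
}

func (r *recordingRouter) AppRequest(msg []byte) error {
	r.seen = append(r.seen, msg)
	return nil
}

// handle tries the legacy codec first (JSON here, as an assumption) and
// forwards the raw bytes to the router when the message does not decode.
func handle(msg []byte, router fallbackRouter) (string, error) {
	var req legacyRequest
	if err := json.Unmarshal(msg, &req); err != nil || req.Kind == "" {
		return "router", router.AppRequest(msg)
	}
	return "legacy:" + req.Kind, nil
}

func main() {
	router := &recordingRouter{}
	dest, _ := handle([]byte(`{"kind":"block"}`), router)
	fmt.Println(dest)
	dest, _ = handle([]byte{0x01, 0x02}, router)
	fmt.Println(dest, len(router.seen))
}
```

One consequence, raised in the review above, is that garbage input is no longer dropped at the codec boundary: it reaches the router, which then decides how to report it.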
How this was tested
Added a unit test