Resolve open Peer Starving TODO #6344

Closed · 1 task · Tracked by #6277 · Fixed by #6488
mpguerra opened this issue Mar 16, 2023 · 5 comments

Labels: A-network (Area: Network protocol updates or fixes), C-audit (Category: Issues arising from audit findings)
Assignees: teor2345

Comments

mpguerra commented Mar 16, 2023

Details

The following open TODO item from src/peer/connection.rs is noted:

// CORRECTNESS
//
// Currently, select prefers the first future if multiple
// futures are ready.
//
// The peer can starve client requests if it sends an
// uninterrupted series of messages. But this is unlikely in
// practice, due to network delays.
//
// If both futures are ready, there's no particular reason
// to prefer one over the other.
//
// TODO: use `futures::select!`, which chooses a ready future
// at random, avoiding starvation
// (To use `select!`, we'll need to map the different
// results to a new enum types.)

This would seem to be an important item to prioritize for resolution. While NCC Group concurs that starvation is unlikely in practice, it could nevertheless compound with other factors to strengthen network-level attacks. It is further noted that, for attackers with extremely influential positions on the network, mitigating factors such as network latency may have less of an effect than originally expected.

It is noted that if the great majority of outstanding messages originate from attackers, then randomly chosen messages will likely be attacker-sent, and will still be prioritized over honest ones; it may be preferable to respond to requests in the order they are received, in order to ensure all requests are (eventually) answered. This may result in worse behavior in
the case where the incoming message rate exceeds the message processing rate and a backlog develops (since in this case response times would be monotonically increasing); however, the solution to this is likely not random choice but rather rate-limiting.
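For reference, a minimal sketch of the kind of per-peer rate-limiting alluded to above (purely illustrative; it assumes a tokio runtime and hypothetical message plumbing, and is not something this issue proposes to implement):

```rust
use std::time::Duration;

use futures::{channel::mpsc, stream::StreamExt};
use tokio::time::interval;

// Illustrative limit only: handle at most one message from this peer every 10 ms.
const PER_PEER_INTERVAL: Duration = Duration::from_millis(10);

async fn rate_limited_peer_loop(mut rx: mpsc::Receiver<String>) {
    let mut tick = interval(PER_PEER_INTERVAL);

    while let Some(msg) = rx.next().await {
        // Wait for the next tick before handling the message, bounding how
        // quickly a flood from a single peer can turn into processing work.
        tick.tick().await;
        println!("handling {msg}");
    }
}
```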

Resolution

  • Update the existing comment to specify that the current behaviour is what is desired
@mpguerra added this to the Zebra project on Mar 16, 2023
@mpguerra added the S-needs-triage, P-Optional ✨, A-network and C-audit labels on Mar 16, 2023
teor2345 commented Mar 21, 2023

This TODO is about the prioritisation of a single peer's messages:

  • outbound requests from Zebra services to each individual peer
  • inbound requests from the peer to Zebra

> It is noted that if the great majority of outstanding messages originate from attackers, then randomly chosen messages will likely be attacker-sent, and will still be prioritized over honest ones; it may be preferable to respond to requests in the order they are received, in order to ensure all requests are (eventually) answered.

This is what we do in the existing implementation: requests from different peers are processed concurrently, multiple requests from the same peer are processed in order.
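As a minimal sketch of that model (hypothetical channel plumbing and a tokio runtime assumed, not Zebra's actual peer set code): each peer connection runs as its own task, so peers are processed concurrently while each peer's own messages are handled strictly in arrival order.

```rust
use futures::{channel::mpsc, stream::StreamExt};

// One task per peer: this peer's messages are processed in order, because the
// next message is only taken after the previous one has been handled.
async fn run_peer(peer_id: usize, mut rx: mpsc::Receiver<String>) {
    while let Some(msg) = rx.next().await {
        println!("peer {peer_id}: handling {msg}");
    }
}

// Concurrency across peers: each connection runs as an independent task.
fn spawn_peer_tasks(peers: Vec<mpsc::Receiver<String>>) {
    for (peer_id, rx) in peers.into_iter().enumerate() {
        tokio::spawn(run_peer(peer_id, rx));
    }
}
```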

> This may result in worse behavior in the case where the incoming message rate exceeds the message processing rate and a backlog develops (since in this case response times would be monotonically increasing); however, the solution to this is likely not random choice but rather rate-limiting.

In the existing implementation, if a peer does this, it will fill its individual message pipeline, and its keepalive or other
messages will time out, so Zebra will disconnect that peer.
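A minimal sketch of that timeout behaviour (hypothetical pong channel and timeout value, not Zebra's actual heartbeat code), assuming tokio's `timeout`:

```rust
use std::time::Duration;

use futures::{channel::mpsc, stream::StreamExt};
use tokio::time::timeout;

// Illustrative deadline only; Zebra's real value may differ.
const HEARTBEAT_TIMEOUT: Duration = Duration::from_secs(60);

async fn heartbeat_loop(mut pong_rx: mpsc::Receiver<()>) {
    loop {
        // (A real implementation would send the Ping here first.)
        match timeout(HEARTBEAT_TIMEOUT, pong_rx.next()).await {
            // The Pong arrived in time: the peer's pipeline is keeping up.
            Ok(Some(())) => continue,
            // Channel closed, or no Pong before the deadline: give up, so the
            // caller drops the connection and the peer is disconnected.
            Ok(None) | Err(_) => break,
        }
    }
}
```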

Messages to other peers will continue to be processed concurrently, but an individual service might be delayed slightly while the peer times out, if that service has sent a request to the peer and blocks until the request completes (or times out).

teor2345 commented Mar 29, 2023

I think the current behaviour is what we want, and we should update the TODO comment to say that.

The reverse behaviour would be worse, because a peer sending an endless stream of messages would never time out its keepalive task.

teor2345 commented Apr 3, 2023

There's also another reason to prefer the current implementation: if an inbound peer message arrives at a ready peer that is just being sent a request from Zebra, we want to process the peer's message first (sketched below).

If we process the Zebra request first:

  • we could misinterpret the inbound peer message as a response when we process it
  • if we put the peer in the "awaiting response" state if the peer message is a request to Zebra, then we'll correctly ignore the simultaneous Zebra request (Zebra services make multiple requests or do retries, so this is ok)
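A minimal sketch of that first failure mode (hypothetical state and message types, not Zebra's actual connection state machine): once the connection is awaiting a response, the next inbound peer message is treated as that response, so handling an already-ready peer message before sending our own request avoids mislabelling it.

```rust
// Hypothetical connection state and wire message types, for illustration only.
#[derive(Clone, Copy)]
enum State {
    Ready,
    AwaitingResponse,
}

#[derive(Debug)]
enum PeerMessage {
    Request(String),
    Response(String),
}

fn handle_inbound(state: &mut State, msg: PeerMessage) {
    match *state {
        // Peer message handled while the connection is still Ready: a peer
        // request is recognised for what it is.
        State::Ready => match msg {
            PeerMessage::Request(req) => println!("peer request handled: {req}"),
            PeerMessage::Response(_) => println!("unsolicited response, ignored"),
        },
        // If we had sent the Zebra request first, the connection would already
        // be AwaitingResponse here, and this same inbound message would be
        // matched against our request even if it is actually a peer request.
        State::AwaitingResponse => {
            println!("interpreted as the response to our request: {msg:?}");
            *state = State::Ready;
        }
    }
}
```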

@teor2345 self-assigned this on Apr 10, 2023
mpguerra (author) commented:
Sounds like the fix for this is to just update the comment to remove the TODO and justify the current behaviour?

mpguerra (author) commented:
@teor2345 can you please add a size estimate for this issue?

@mpguerra removed the S-needs-triage label on Apr 19, 2023