Security: Limit reconnection rate to each individual peer address #1848

teor2345 · 2021-03-05T02:04:27Z

Is your feature request related to a problem? Please describe.

Zebra doesn't limit its outbound connection rate to individual Failed peers. This is a distributed denial of service risk.

Describe the solution you'd like

This fix depends on #1849.

filter the connection peers by recent attempt, success or failed times
- if all those times are None, the peer is NeverAttempted..., and it's safe to connect to it
rename fields and functions, and update documentation to say that we're filtering Responded and Failed peers with a cutoff time

Describe alternatives you've considered

This is a critical security issue, so we must do something.

We could attempt to limit the connection rate in other parts of the network stack, but that risks future refactors or bugs causing a DDoS.

Additional context

zcashd does not have this issue.

The text was updated successfully, but these errors were encountered:

Track multiple last used times for each peer: - Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848) - Add the new fields to the peer states, so they only appear in states where they are valid - Insert initial DNS seeder peers in the AddressBook in the correct states Create a new MetaAddrChange type for AddressBook changes: - Ignore invalid state changes - Ignore updates to the untrusted last seen time (but update the services field) - If we get a gossiped or alternate change for a seed peer, use the last seen and services info - Once a peer has responded, don't go back to the NeverResponded... states - Update the address book metrics - Optimise getting the next connection address from the address book

Track multiple last used times for each peer: - Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848) - Add the new fields to the peer states, so they only appear in states where they are valid - Insert initial seed peers in the AddressBook in the correct states Create a new MetaAddrChange type for AddressBook changes: - Ignore invalid state changes - Ignore updates to the untrusted last seen time (but update the services field) - If we get a gossiped or alternate change for a seed peer, use the last seen and services info - Once a peer has responded, don't go back to the NeverResponded... states - Update the address book metrics - Optimise getting the next connection address from the address book

This fix prevents hangs and deadlocks during initialization, particularly when there are a small number of valid peers in the initial peer config (or from the DNS seeders). Security: Correctly handle the minimum peer connection interval Previously, if we hadn't had a connection for a while, we'd allow a lot of connections all at once, until we'd caught up. Security: sleep MIN_PEER_CONNECTION_INTERVAL between initial handshakes This prevents denial of service if the local network is constrained, and the seeders return a large number of peers. Only wait for ready handshakes Drain all waiting handshakes when enough have succeeded Refactor MetaAddr to enable security fixes Track multiple last used times for each peer: - Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848) - Add the new fields to the peer states, so they only appear in states where they are valid - Insert initial seed peers in the AddressBook in the correct states Create a new MetaAddrChange type for AddressBook changes: - Ignore invalid state changes - Ignore updates to the untrusted last seen time (but update the services field) - If we get a gossiped or alternate change for a seed peer, use the last seen and services info - Once a peer has responded, don't go back to the NeverResponded... states - Update the address book metrics - Optimise getting the next connection address from the address book Do an extra crawl for each handshake on startup And whenever there aren't many recently live peers. Remove duplicate initial crawl code This change uses the candidate set for initial seed peers, gossiped peers, and alternate peers. It significantly reduces the complexity of the initialization code. (By about 200 lines.) Apply readiness timeout to each fanout Also get the fanout limit from the number of recently live peers. Launch each CandidateSet fanout in its own task Spawn each `CandidateSet::update` in its own task Move `CandidateSet::next` into the handshake task Move all crawler awaits and threaded locks into spawned tasks In this commit: - Move sending PeerSet changes into a spawned task - Move the locking in `CandidateSet::report_failed` into a spawned task Increase the peer set buffer size for concurrent fanouts Launch sync fanouts concurrently, with peer set readiness timeouts Wait for seed peers before the first crawl WIP: Add a timeout to crawl addr requests This is a workaround for a zcashd response rate-limit. Move AddressBook::lock() onto a blocking thread Process all ready timestamp changes each time the task runs Wait for the initial crawl before launching the syncer Security: Limit unverified blocks to avoid memory DoS Also document the security implications of changing these limits. Drop early inbound requests to avoid load shedding during network setup Stop closing connections when the inbound service is overloaded SECURITY: Make buffer sizes dynamically depend on the config This change significantly increases the inbound buffer size, increasing memory denial of service risks. However, users can reduce the buffer size using existing related config options. These risks are documented under the relevant configs. Treat `TryRecvError::Closed` in `Inbound::poll_ready` as a fatal error Also: - handle errors in service readiness the same as errors in requests Closes #1655

Security: Spawn a separate task for each initial handshake This fix prevents hangs and deadlocks during initialization, particularly when there are a small number of valid peers in the initial peer config (or from the DNS seeders). Security: Correctly handle the minimum peer connection interval Previously, if we hadn't had a connection for a while, we'd allow a lot of connections all at once, until we'd caught up. Security: sleep MIN_PEER_CONNECTION_INTERVAL between initial handshakes This prevents denial of service if the local network is constrained, and the seeders return a large number of peers. Only wait for ready handshakes Drain all waiting handshakes when enough have succeeded Refactor MetaAddr to enable security fixes Track multiple last used times for each peer: - Add separate untrusted_last_seen, attempt, success, and failed time fields (#1868, #1876, #1848) - Add the new fields to the peer states, so they only appear in states where they are valid - Insert initial seed peers in the AddressBook in the correct states Create a new MetaAddrChange type for AddressBook changes: - Ignore invalid state changes - Ignore updates to the untrusted last seen time (but update the services field) - If we get a gossiped or alternate change for a seed peer, use the last seen and services info - Once a peer has responded, don't go back to the NeverResponded... states - Update the address book metrics - Optimise getting the next connection address from the address book Do an extra crawl for each handshake on startup And whenever there aren't many recently live peers. Remove duplicate initial crawl code This change uses the candidate set for initial seed peers, gossiped peers, and alternate peers. It significantly reduces the complexity of the initialization code. (By about 200 lines.) Apply readiness timeout to each fanout Also get the fanout limit from the number of recently live peers. Launch each CandidateSet fanout in its own task Spawn each `CandidateSet::update` in its own task Move `CandidateSet::next` into the handshake task Move all crawler awaits and threaded locks into spawned tasks In this commit: - Move sending PeerSet changes into a spawned task - Move the locking in `CandidateSet::report_failed` into a spawned task Increase the peer set buffer size for concurrent fanouts Launch sync fanouts concurrently, with peer set readiness timeouts Wait for seed peers before the first crawl WIP: Add a timeout to crawl addr requests This is a workaround for a zcashd response rate-limit. Move AddressBook::lock() onto a blocking thread Process all ready timestamp changes each time the task runs Wait for the initial crawl before launching the syncer Security: Limit unverified blocks to avoid memory DoS Also document the security implications of changing these limits. Drop early inbound requests to avoid load shedding during network setup Stop closing connections when the inbound service is overloaded SECURITY: Make buffer sizes dynamically depend on the config This change significantly increases the inbound buffer size, increasing memory denial of service risks. However, users can reduce the buffer size using existing related config options. These risks are documented under the relevant configs. Treat `TryRecvError::Closed` in `Inbound::poll_ready` as a fatal error Also: - handle errors in service readiness the same as errors in requests Closes #1655

teor2345 added this to the 2021 Sprint 4 milestone Mar 5, 2021

teor2345 mentioned this issue Mar 5, 2021

CandidateSet: Document Listener Port Connections #1852

Merged

mpguerra modified the milestones: 2021 Sprint 4, 2021 Sprint 5 Mar 8, 2021

mpguerra removed the S-needs-triage Status: A bug report needs triage label Mar 8, 2021

mpguerra modified the milestones: 2021 Sprint 5, 2021 Sprint 6 Mar 22, 2021

teor2345 modified the milestones: 2021 Sprint 6, 2021 Sprint 7 Mar 31, 2021

teor2345 modified the milestones: 2021 Sprint 7, 2021 Sprint 8 Apr 15, 2021

teor2345 modified the milestones: 2021 Sprint 8 , 2021 Sprint 9, 2021 Sprint 10 Apr 26, 2021

mpguerra mentioned this issue May 7, 2021

Security: Zebra should stop gossiping unreachable addresses to other nodes, Action: re-deploy all nodes #1867

Closed

11 tasks

mpguerra modified the milestones: 2021 Sprint 10, 2021 Sprint 11 - Zcon2 May 7, 2021

teor2345 modified the milestones: 2021 Sprint 11 - Zcon2, 2021 Sprint 9 May 17, 2021

teor2345 changed the title ~~Limit reconnection rate to each individual peer address~~ Security: Limit reconnection rate to each individual peer address May 17, 2021

teor2345 mentioned this issue May 17, 2021

Security: MetaAddr refactor, gossip time, connection order, reconnection rate fixes (see tickets) #2160

Closed

2 tasks

mpguerra linked a pull request May 17, 2021 that will close this issue

Security: MetaAddr refactor, gossip time, connection order, reconnection rate fixes (see tickets) #2160

Closed

2 tasks

teor2345 modified the milestones: 2021 Sprint 9, 2021 Sprint 12 May 27, 2021

mpguerra modified the milestones: 2021 Sprint 12, 2021 Sprint 14 May 27, 2021

teor2345 mentioned this issue Jun 10, 2021

Security: Limit reconnection rate to individual peers #2275

Merged

3 tasks

mpguerra modified the milestones: 2021 Sprint 14, 2021 Sprint 11 - Zcon2 Jun 14, 2021

mpguerra removed a link to a pull request Jun 14, 2021

Security: MetaAddr refactor, gossip time, connection order, reconnection rate fixes (see tickets) #2160

Closed

2 tasks

mpguerra linked a pull request Jun 14, 2021 that will close this issue

Security: Limit reconnection rate to individual peers #2275

Merged

3 tasks

mpguerra assigned teor2345 Jun 14, 2021

teor2345 mentioned this issue Jun 15, 2021

Security: Stop disconnecting from nodes that send unexpected messages, to prevent disconnection attacks, Credit: Equilibrium #2107

Closed

3 tasks

jvff closed this as completed in #2275 Jun 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security: Limit reconnection rate to each individual peer address #1848

Security: Limit reconnection rate to each individual peer address #1848

teor2345 commented Mar 5, 2021 •

edited

Loading

Security: Limit reconnection rate to each individual peer address #1848

Security: Limit reconnection rate to each individual peer address #1848

Comments

teor2345 commented Mar 5, 2021 • edited Loading

teor2345 commented Mar 5, 2021 •

edited

Loading