Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Halborn 2023 02 20 #6297

Merged
merged 8 commits into from
Mar 13, 2023
Merged

Halborn 2023 02 20 #6297

merged 8 commits into from
Mar 13, 2023

Conversation

dconnolly
Copy link
Contributor

@dconnolly dconnolly commented Mar 13, 2023

Motivation

This PR fixes some issues we found when investigating the Halborn disclosures from February 2023, and other related network protocol memory usage issues.

The security researchers looked at the zebrad codebase and said their exploit will not work against it as-is.

Specifications

Most of these network protocol limits are undocumented, see the code comments for specific links to zcashd source code.

Complex Code or Requirements

These are network protocol changes, they are compatible with other implementations under normal usage.

There are no consensus-critical or concurrent code changes.

Solution

This PR contains the simplest possible fixes to these issues.

Halborn:

  1. limit inventory registry size to ~16 MB, currently unlimited (but dropped after 2 minutes)
    1. limit registry hashes to 1000, much larger than the typical number of pending gossiped blocks and mempool transactions
      • also limit inventory hashes in each pending inventory update to 1000
    2. limit peers per hash to 70, slightly lower than our default peer set size
      • this avoids large behavior differences around the peer limit
      • for example, if an inventory is globally missing, we could always query 1 peer if we have 74 peers, but we can't query any if we have 75
    3. remove the oldest hash or peer when the limit is reached, for simplicity
      • we might want to change to weighted random later, to avoid inventory flooding/removal attacks
    4. don't delay inventory rotation under load
      • instead, do the current rotation now, and the next rotation earlier to catch up

Testing:

  • registry going over the limit
    • inv limit
    • addr per inv limit

Related - similar attack:

  1. Limit version user agents to 256 bytes, currently the 2 MB message limit
    • since we store one version message per connected peer, an attack can use up to 400 MB of memory

Testing:

  • version user agent over the limit

Less urgent

These fixes are not urgent, but we'll do them at the same time anyway:

  1. Limit reject messages to 128 bytes, currently the default 2 MB message limit
    • we already drop these messages soon after receiving them, but they could be stored in logs and fill up the disk
  2. Don't print every gossiped transaction ID to the logs by default
    • an attacker sending a lot of tiny transactions to the mempool could fill up the disk with logs

Testing:

  • reject command and reject reason
    • over the limit
    • at or under the limit (rejects are rare on the network, so our integration tests don't cover them)

Doesn't need to be in the initial fix or release:

  1. Limit inv messages to 50,000 hashes, currently ~58,000 (the default 2 MB message limit)
    • minimal impact, 1.8 MB vs 2 MB, we already drop these messages soon after receiving them

Review

Reviewed by several people privately already.

Reviewer Checklist

  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
    • Are there any security risks remaining, or any new security risks introduced?
  • Do the security docs explain and justify the code changes?
  • How do you know it works? Does it have tests?
    • Existing tests pass (except for CI setup issues)
    • New tests pass
    • Manual sync tests pass

Follow Up Work

@dconnolly dconnolly requested review from a team as code owners March 13, 2023 14:19
@dconnolly dconnolly added P-Critical 🚑 C-security Category: Security issues A-rust Area: Updates to Rust code A-network Area: Network protocol updates or fixes labels Mar 13, 2023
@codecov
Copy link

codecov bot commented Mar 13, 2023

Codecov Report

Merging #6297 (6321302) into main (a76597e) will decrease coverage by 0.04%.
The diff coverage is 89.31%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6297      +/-   ##
==========================================
- Coverage   77.84%   77.80%   -0.04%     
==========================================
  Files         304      304              
  Lines       39366    39481     +115     
==========================================
+ Hits        30643    30719      +76     
- Misses       8723     8762      +39     

mergify bot added a commit that referenced this pull request Mar 13, 2023
@mergify mergify bot merged commit 47cf0f4 into main Mar 13, 2023
@mergify mergify bot deleted the halborn-2023-02-20 branch March 13, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-network Area: Network protocol updates or fixes A-rust Area: Updates to Rust code C-security Category: Security issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants