Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(network): Ignore out of order Address Book changes, unless they are concurrent #6717

Merged
merged 13 commits into from
May 24, 2023

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented May 18, 2023

Motivation

We want to ignore earlier address book changes which are delayed, so we don't accidentally reset the peer state to an earlier value. But some changes can be concurrent, so the change application order doesn't always match the network order.

Close #6672

Specifications

There is no formal specification for peer network behaviour, but in general, the peer's internal address book state should reflect the actual behaviour of its peers.

Complex Code or Requirements

This code runs concurrently, it is accessed from both the address book updater task and the candidate set peer dialler task.

Changes can arrive earlier or later than their network order because of:

  • waiting on the address book mutex
  • async runtime or OS thread scheduling
  • monotonic clock resolution, or OS or hardware time bugs
  • wall clock time updates

The application can also independently initiate a change to the connection state, which can be simultaneous (or before) a change from the peer. In this case, the application-provided change needs to take priority. (Because it is typically failing the connection.)

Solution

Fix the code that applies changes to the address book, so that:

  • Concurrent changes are applied if they progress the connection state, but ignored if they revert it
  • Much earlier changes are ignored, even if they progress the connection state

Refactor the code that applies changes so it is easier to read.

Add TODOs for further changes that are not required to fix this security issue.

Testing

  • Add tests for the failure and success cases for the new checks, the extra success cases are:
    • Much later changes are applied, even if they revert the connection state
  • Allow tests to provide synthetic times for address book updates
    • This resolves some technical debt caused by a partial refactor
    • Calling time APIs less will also improve performance on some systems

There are existing tests for the other kinds of changes.

Review

This is one of the more complex audit changes that we have left to do. I'm happy to do a video review if that helps!

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

Follow Up Work

Make it much harder for remote peers to delay changes by flooding the change channel:

@teor2345 teor2345 added P-Medium ⚡ C-security Category: Security issues I-invalid-data Zebra relies on invalid or untrusted data, or sends invalid data A-network Area: Network protocol updates or fixes A-concurrency Area: Async code, needs extra work to make it work properly. C-audit Category: Issues arising from audit findings C-tech-debt Category: Code maintainability issues I-remote-trigger Remote nodes can make Zebra do something bad labels May 18, 2023
@teor2345 teor2345 self-assigned this May 18, 2023
@teor2345 teor2345 requested review from a team as code owners May 18, 2023 07:00
@teor2345 teor2345 requested review from arya2 and removed request for a team May 18, 2023 07:00
@github-actions github-actions bot added C-bug Category: This is a bug C-enhancement Category: This is an improvement C-feature Category: New features labels May 18, 2023
@codecov
Copy link

codecov bot commented May 18, 2023

Codecov Report

Merging #6717 (826395e) into main (fadddae) will decrease coverage by 0.08%.
The diff coverage is 89.15%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6717      +/-   ##
==========================================
- Coverage   78.14%   78.07%   -0.08%     
==========================================
  Files         310      308       -2     
  Lines       40874    40958      +84     
==========================================
+ Hits        31942    31978      +36     
- Misses       8932     8980      +48     

Copy link
Contributor

@arya2 arya2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, thank you for fixing this.

I like the workaround for the imprecise timestamps (though I'd like to review it more thoroughly).

zebra-network/src/meta_addr.rs Outdated Show resolved Hide resolved
@teor2345 teor2345 force-pushed the order-address-book-updates branch from 77e88e7 to cf7f618 Compare May 18, 2023 23:41
@teor2345 teor2345 added C-bug Category: This is a bug and removed C-bug Category: This is a bug C-enhancement Category: This is an improvement C-feature Category: New features labels May 19, 2023
@mpguerra mpguerra requested a review from oxarbitrage May 22, 2023 20:36
arya2
arya2 previously approved these changes May 23, 2023
Copy link
Contributor

@arya2 arya2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

zebra-network/src/meta_addr.rs Outdated Show resolved Hide resolved
zebra-network/src/meta_addr.rs Show resolved Hide resolved
zebra-network/src/meta_addr/tests/vectors.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot added C-enhancement Category: This is an improvement C-feature Category: New features labels May 23, 2023
@teor2345 teor2345 removed C-enhancement Category: This is an improvement C-feature Category: New features labels May 24, 2023
@teor2345 teor2345 requested a review from arya2 May 24, 2023 20:26
@teor2345
Copy link
Contributor Author

@oxarbitrage I'm going to go ahead and merge this PR now to avoid merge conflicts.

Feel free to review it or skip the review, whatever you work out with Pili.

mergify bot added a commit that referenced this pull request May 24, 2023
@mergify mergify bot merged commit 8af4e57 into main May 24, 2023
@mergify mergify bot deleted the order-address-book-updates branch May 24, 2023 23:53
@mpguerra mpguerra mentioned this pull request May 26, 2023
36 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: Async code, needs extra work to make it work properly. A-network Area: Network protocol updates or fixes C-audit Category: Issues arising from audit findings C-bug Category: This is a bug C-security Category: Security issues C-tech-debt Category: Code maintainability issues I-invalid-data Zebra relies on invalid or untrusted data, or sends invalid data I-remote-trigger Remote nodes can make Zebra do something bad
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NCC-E005955-7DU] zebra-network: Fragile State Transition During Address Book Update
2 participants