Update documentation to clarify terminology around block verification #6681

Closed
Tracked by #6277
mpguerra opened this issue May 15, 2023 · 8 comments · Fixed by #7061
Labels
A-docs Area: Documentation C-audit Category: Issues arising from audit findings S-blocked Status: Blocked on other tasks

Comments

@mpguerra
Contributor

mpguerra commented May 15, 2023

Motivation

The verification parts of the request, state service, and state consensus rule checks use incorrect terms that are very confusing for developers and auditors.

The storage parts use those terms correctly and should be left alone.

See the comments of this ticket for potential changes.

Background Notes

Here is some of the confusion in this auditor note that we should resolve in our documentation:

Checkpoint vs Finalized

Note that a checkpointed/finalized block can be a settled network upgrade or a block beyond the rollback/reorg limit. The state service processes the finalized block and queues it to be committed to state

This is the core of the confusion:

  • a settled network upgrade is specified as a network upgrade that has "social consensus": it has enough confirmations on enough nodes that it can't be rolled back
  • checkpointed blocks are a Zebra implementation detail: every release, we generate checkpoints for all finalized blocks in the chain, because those blocks can't be rolled back by Zebra, and are very unlikely to be rolled back by zcashd. This includes settled network upgrades, but it also goes beyond them to include almost all settled blocks.
  • finalized blocks can come from checkpoints, or from a contextually verified block which is beyond the rollback limit
  • the state service processes:
    • checkpoint verified blocks direct to the finalized state
    • contextually verified blocks into the non-finalized state, and then into the finalized state once they reach the rollback limit (or if they are on a side chain that will never be committed to the finalized state, they get dropped)
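
The two paths above can be sketched as a toy model. This is only an illustration, not Zebra's state service: `ToyState`, `Verified`, and the tiny `ROLLBACK_LIMIT` are hypothetical names and values made up for the example, blocks are reduced to heights, and only the best chain is modelled.

```rust
use std::collections::VecDeque;

/// Hypothetical rollback limit for this sketch, kept small so the
/// behaviour is easy to see. It is not taken from the Zebra source.
const ROLLBACK_LIMIT: usize = 3;

/// How a block was verified before it reached the (toy) state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verified {
    Checkpoint,
    Contextual,
}

/// Toy state that tracks block heights on a single best chain.
#[derive(Debug, Default)]
struct ToyState {
    finalized: Vec<u32>,
    non_finalized: VecDeque<u32>,
}

impl ToyState {
    /// Route a block according to how it was verified.
    fn commit(&mut self, height: u32, how: Verified) {
        match how {
            // Checkpoint verified blocks go direct to the finalized state.
            Verified::Checkpoint => self.finalized.push(height),
            // Contextually verified blocks enter the non-finalized state,
            // and the oldest ones are finalized once they pass the limit.
            Verified::Contextual => {
                self.non_finalized.push_back(height);
                while self.non_finalized.len() > ROLLBACK_LIMIT {
                    if let Some(oldest) = self.non_finalized.pop_front() {
                        self.finalized.push(oldest);
                    }
                }
            }
        }
    }
}
```

Committing heights 0 and 1 as `Verified::Checkpoint` and heights 2 to 5 as `Verified::Contextual` leaves heights 0 to 2 finalized and heights 3 to 5 non-finalized in this model.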

Verified Block Cache Behaviour

(finalized blocks can be optionally saved to disk.)

All valid finalized blocks are saved to disk: that's what finalization means in Zebra. Saving to disk is not under user control, and it's not a choice the application makes dynamically. The only thing that can stop a finalized block being saved to disk is invalid data, which causes a verification failure in the block commitment, authorization, or chain history roots.

Non-finalized blocks are saved to disk if they are part of the best chain, and beyond the rollback limit. If they are not part of the best chain and beyond the rollback limit, they are dropped.
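
As a decision table, the rule for non-finalized blocks looks roughly like this. The `Outcome` enum and the function name are hypothetical, chosen only for this sketch; the two boolean inputs stand in for whatever Zebra actually tracks.

```rust
/// What happens to a block that is currently in the non-finalized state.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// Still within the rollback limit: stays in memory.
    StayNonFinalized,
    /// Best chain and beyond the rollback limit: saved to disk.
    SaveToDisk,
    /// Side chain and beyond the rollback limit: dropped.
    Dropped,
}

/// Decide the outcome from the two conditions described above.
fn non_finalized_outcome(on_best_chain: bool, beyond_rollback_limit: bool) -> Outcome {
    match (beyond_rollback_limit, on_best_chain) {
        (false, _) => Outcome::StayNonFinalized,
        (true, true) => Outcome::SaveToDisk,
        (true, false) => Outcome::Dropped,
    }
}
```

Note that there is no input for user configuration: in this sketch, as in the description above, saving is decided only by chain position.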

User Confusion about Caching

There's also some potential user confusion here between:

  • thin clients: programs that don't download or store the whole chain, and skip some validation
  • full nodes: programs that download the whole chain, fully validate it, and store it

We've had multiple requests for Zebra to become a thin client, so it's important for us to avoid confusion in our user documentation.

The ephemeral config only controls state cleanup behaviour; it doesn't impact block validation or storage. Regardless of its setting, Zebra still functions correctly as a fully validating node. (It just doesn't have a cache when it next starts up, which is an entirely valid state.)

Semantically Verified vs Non-Finalized

the state service will validate and commit the non-finalized queued blocks whose parents have recently arrived

To do contextual validation, the state service will:

  • queue blocks whose ancestors have not arrived in the state yet
  • send queued blocks to the block write task after all their ancestors have been sent
  • validate blocks as they arrive in the block write task, as long as their ancestors were valid
  • commit valid blocks to the non-finalized state

Breadth-first order is not guaranteed, because a lower height block from another fork can arrive at any time. Forks are allowed from the finalized tip, and any non-finalized block.
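
A minimal sketch of the queueing steps above, assuming blocks are identified by small integer hashes. `ParentQueue`, `Block`, and `receive` are hypothetical names for this example and do not match the types in queued_blocks.rs; validation itself is left out.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a block hash in this sketch.
type Hash = u32;

#[derive(Debug, Clone, PartialEq, Eq)]
struct Block {
    hash: Hash,
    parent: Hash,
}

/// Holds blocks until all their ancestors have been sent onwards.
struct ParentQueue {
    /// Hashes already sent to the (hypothetical) block write task.
    sent: HashSet<Hash>,
    /// Blocks waiting for a parent, keyed by the parent's hash.
    waiting: HashMap<Hash, Vec<Block>>,
}

impl ParentQueue {
    /// Start from the finalized tip, which counts as already sent.
    fn new(finalized_tip: Hash) -> Self {
        let mut sent = HashSet::new();
        sent.insert(finalized_tip);
        Self { sent, waiting: HashMap::new() }
    }

    /// Accept a block and return every block that can now be sent,
    /// with each parent ordered before its children.
    fn receive(&mut self, block: Block) -> Vec<Block> {
        if !self.sent.contains(&block.parent) {
            // The parent has not arrived yet, so queue the block.
            self.waiting.entry(block.parent).or_default().push(block);
            return Vec::new();
        }
        let mut ready = Vec::new();
        let mut pending = vec![block];
        while let Some(next) = pending.pop() {
            self.sent.insert(next.hash);
            // Release any queued children of the block just sent.
            if let Some(children) = self.waiting.remove(&next.hash) {
                pending.extend(children);
            }
            ready.push(next);
        }
        ready
    }
}
```

If blocks 2 and 3 arrive before block 1, they wait in the queue; when block 1 arrives with the finalized tip as its parent, all three are released in order. A later block that forks from block 1 is released immediately, which is why breadth-first order can't be assumed.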

For the first non-finalized block to be validated there must be at least 28 blocks in the finalized state.

The 28 most recent ancestor blocks must have been validated and committed by the state, regardless of whether they are finalized or non-finalized.

If an honest node fails to obtain enough finalized blocks from its peers during syncing, an attacker could send it a non-finalized block to verify, which will fail this assertion and crash the Zebra node.

This is incorrect, due to the confusion above.

Verification Order

The Zebra implementation chooses to verify blocks using checkpoints or semantic/contextual verification, based on their heights and its checkpoints. Checkpoint verification happens in strict height order, in both zebra-consensus and zebra-state. There is only one chain fork when checkpointing.

Semantic verification happens in parallel in the zebra-consensus verifiers, but contextual verification happens in chain order in zebra-state. Each chain fork is verified in height order, and different forks can have different tip heights. (Different chain forks could also be contextually verified in parallel, but Zebra doesn't implement that.)
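
The choice of verifier can be pictured as a simple height comparison. This sketch assumes the rule is "at or below the highest checkpoint height uses checkpoints, anything above uses semantic then contextual verification"; the enum, the function, and the `max_checkpoint_height` parameter are hypothetical names for illustration.

```rust
/// Which verification path a block takes in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum VerifierChoice {
    /// Verified against the checkpoint list, in strict height order.
    Checkpoint,
    /// Semantic checks (possibly in parallel), then contextual checks
    /// in chain order.
    SemanticThenContextual,
}

/// Pick a verifier from the block height and the highest checkpoint
/// height. Assumed rule for illustration, not copied from Zebra.
fn choose_verifier(height: u32, max_checkpoint_height: u32) -> VerifierChoice {
    if height <= max_checkpoint_height {
        VerifierChoice::Checkpoint
    } else {
        VerifierChoice::SemanticThenContextual
    }
}
```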

Guaranteed Chain Context

So blocks are not validated by the state until all their ancestors have been validated. If a contextually verified block's ancestors aren't validated, it will wait in the queue and time out, or be rejected with an error when it is received by the block write task. So validate_and_commit_non_finalized() won't be called unless a block has at least 28 ancestors, because it is only called during contextual validation. And before contextual validation starts, Zebra commits at least 1 million blocks using checkpoints.
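
To make the "28 most recent ancestors" requirement concrete, here is a sketch that collects them from both parts of the state. It assumes blocks are represented by their heights and that the two slices together form one chain; `REQUIRED_ANCESTORS` and `recent_ancestors` are hypothetical names, and only the number 28 comes from the discussion above.

```rust
/// Number of recent ancestors needed for contextual validation,
/// as stated in the discussion above.
const REQUIRED_ANCESTORS: usize = 28;

/// Return the most recent ancestors, oldest first, or `None` if the
/// chain is too short. It does not matter whether an ancestor is in
/// the finalized or the non-finalized part of the state.
fn recent_ancestors(finalized: &[u32], non_finalized: &[u32]) -> Option<Vec<u32>> {
    let chain: Vec<u32> = finalized
        .iter()
        .chain(non_finalized.iter())
        .copied()
        .collect();
    if chain.len() < REQUIRED_ANCESTORS {
        return None;
    }
    Some(chain[chain.len() - REQUIRED_ANCESTORS..].to_vec())
}
```

With 20 finalized and 10 non-finalized ancestors, the result spans both parts of the state. Returning `None` for a short chain models a rejected block rather than a panic.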

No Cross-Module Dependencies

In most cases, the syncer or gossip downloaders will reject blocks that are too far in front of the tip. This prevents CPU and memory denial of service. But peers can't make Zebra panic by submitting blocks out of order, even if they convince these services to submit those blocks to the state. If that happens, the state will time out or reject the block.

Originally posted by @teor2345 in #6620 (comment)

@mpguerra mpguerra added this to Zebra May 15, 2023
@github-project-automation github-project-automation bot moved this to 🆕 New in Zebra May 15, 2023
@mpguerra mpguerra added A-docs Area: Documentation P-Medium ⚡ C-audit Category: Issues arising from audit findings labels May 15, 2023

@teor2345
Contributor

@mpguerra I think these changes will be easier after the renames in #6680. They are also likely to cause merge conflicts if we do them at the same time.

@mpguerra mpguerra added the S-blocked Status: Blocked on other tasks label May 16, 2023
@mpguerra
Contributor Author

mpguerra commented Jun 7, 2023

What do we actually need to do for this issue?

@teor2345
Contributor

teor2345 commented Jun 7, 2023

What do we actually need to do for this issue?

I think we should finish the missed renames in #6793 first, then add the information in this ticket to the state requests/service/queues/non-finalized state/finalized state. (Wherever each paragraph is most relevant.)

@teor2345
Contributor

teor2345 commented Jun 11, 2023

  • This documentation on CheckpointVerifiedBlock is incorrect. The note below it lists the checks that are actually required:

    /// A block ready to be committed directly to the finalized state with
    /// no checks.

  • Similarly, the zebra_consensus::checkpoint module docs need to be updated: most checks are skipped, but some are still needed:

    //! The checkpoint verifier queues pending blocks. Once there is a
    //! chain from the previous checkpoint to a target checkpoint, it
    //! verifies all the blocks in that chain, and sends accepted blocks to
    //! the state service as finalized chain state, skipping contextual
    //! verification checks.

@teor2345
Contributor

teor2345 commented Jun 12, 2023

The documentation and variable names in these types and their methods:

  • Request
  • StateService (and service.rs)
  • queued_blocks.rs
  • check.rs and check/*.rs

use:

  • "non-finalized block verifier/verification" where it should use "semantic block verifier/verification"
  • "non-finalized block[s]" where it should use "semantically-verified block[s]"
  • "finalized block" where it should use "semantically-verified block"
  • "finalized block[s]" where it should use "checkpoint-verified block[s]"
  • finalized or finalized_block (by itself as a variable name) where it should use checkpoint_verified
  • prepared or prepared_block (by itself as a variable name) where it should use semantically_verified

Some uses are correct, if they are referring to a block that has actually been committed to the state, a queue, a chain, or a state. But most of them need to be manually updated.

@teor2345
Contributor

What do we actually need to do for this issue?

The verification parts of the request, state service, and consensus rule checks use incorrect terms that are very confusing for developers and auditors. The storage parts use those terms correctly and should be left alone.

@teor2345
Contributor

The documentation and variable names in these types and their methods:

@arya2 can you do a quick check of the list in this comment:
#6681 (comment)

Feel free to change what we should rename, or what the new names should be.

If you think any of these renames can be done automatically, feel free to add them to #6793. But the old names have to be unique enough that there are no mistakes, and the replacement must be universal.
