Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sync, inbound, and block verifier check if a block hash is in any chain or any queue #862

Closed
14 tasks
Tracked by #3322 ...
teor2345 opened this issue Aug 10, 2020 · 14 comments · Fixed by #6335
Closed
14 tasks
Tracked by #3322 ...
Assignees
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug I-slow Problems with performance or responsiveness I-usability Zebra is hard to understand or use

Comments

@teor2345
Copy link
Contributor

teor2345 commented Aug 10, 2020

Motivation

In the sync service (#853), inbound downloader, and block verifier, we are using the GetDepth request to check if a block is present in the state:
https://github.com/ZcashFoundation/zebra/pull/853/files#r467684617

But the sync, inbound, and block verifier actually need to know if a block hash is present in any chain. (This is a bug.)

They also don't need a depth, or any other extra data.

Scheduling

This risk is acceptable for the stable release, but we need to fix it before we support lightwalletd.

We should also fix this bug if Zebra continues to hang, after we fix known hang bugs.

Solution

Each section should be implemented in a separate PR.

A. Add a new "contains" block hash request:

  • add an additional state service request, which checks if a block is in:
    • any chain
    • the pending non-finalized state queue
    • the pending finalized state queue
  • use it in the sync service to ignore block downloads (or check for genesis)
  • use it in the inbound downloader to ignore block downloads
  • make it a HashSet of block hashes, to reduce the number of state requests

B. We want to do these checks:

For example, see this code:

if self.mem.any_chain_contains(&prepared.hash) || self.disk.hash(prepared.height).is_some()
{
let (rsp_tx, rsp_rx) = oneshot::channel();
let _ = rsp_tx.send(Err("block is already committed to the state".into()));
return rsp_rx;
}

D. Check if the blocks are already waiting in a checkpoint verifier queue:

  • change the request type for the checkpoint verifier into an enum
  • check if blocks are already waiting in the checkpoint verifier queue

E. De-duplicate the sync and inbound block downloaders:

  • Use an enum to distinguish sync and inbound requests
  • Don't apply the inbound limits to sync blocks
  • Use a buffer or channels to get responses back to the right service?

Alternatives

We could implement the new request as a wrapper function around a more specific query, which discards any extra data. But there aren't any requests that find blocks in any chain.

We might also want to return success on duplicate blocks, rather than an error. But sync restarts should be even rarer once we fix this bug.

Related

The state request for this could support more detailed errors for the submitblock method (#5487) if it specifies where the block is in the state (i.e. best chain, side chain, or queued).

@teor2345 teor2345 added Poll::Ready A-rust Area: Updates to Rust code C-cleanup Category: This is a cleanup labels Aug 10, 2020
@teor2345
Copy link
Contributor Author

See fd61aa9 and PR #853 for an example wrapper function.

We could move this wrapper function to zebra-state to resolve this issue.

@teor2345 teor2345 changed the title Add a state service request that checks if a block hash is in any chain Add a state service request to check if a block hash is in the state Aug 10, 2020
@teor2345 teor2345 changed the title Add a state service request to check if a block hash is in the state Add a request that checks if a block hash is in the state Aug 10, 2020
@hdevalence
Copy link
Contributor

What's the advantage of having a separate request type for this, compared to checking whether the returned depth is Some or None?

@teor2345
Copy link
Contributor Author

Depth accesses the height to hash map.

I don't know if it's cheaper to check if a hash is present in the hash to block map, but I can't imagine it matters that much, as long as we don't actually retrieve the block itself.

So I think moving this wrapper function to the state service would be the best choice here:
https://github.com/ZcashFoundation/zebra/pull/853/files#r467684617

(See also #865, where we remove as many of these state checks as we can.)

@teor2345 teor2345 changed the title Add a request that checks if a block hash is in the state Add a function that checks if a block hash is in the state Aug 12, 2020
@teor2345
Copy link
Contributor Author

teor2345 commented Aug 12, 2020

@yaahc asked in #853 (comment)

Then we don't have to worry about the exact depth and race conditions.

That's the thing I'm not seeing though, what race conditions are there and in what situations does the exact depth matter?

The exact depth does not matter, and the interfaces we use should reflect that fact.

These race conditions do not matter for this particular caller, but they do exist:

  • depth and adding additional blocks on the best tip, or
  • depth and changing the best tip.

It's also unclear what depth means for a block that's on a side-chain.

In this ticket, we resolve these issues by:

  • providing an interface to the state service which explicitly checks for the presence of a block in any chain, and
  • making any remaining issues an implementation detail inside the state.

And it's clear that modules which use this new interface don't rely on anything else about the block or state.

@hdevalence
Copy link
Contributor

To be more precise, there's not exactly a race condition, just the potential for TOCTOU issues where another part of the software requests information about what's in the state, acts on it, and the state is updated in the meantime. But this is the case for any state query, and I think that the solution is the direction we're already going (where all state updates are permissioned by the state itself, which can do synchronous checks). So I don't see the difference between returning the depth or not, in either case the information returned by the state can become stale.

It seems like the only problem is that you want to check whether a block hash is in any chain, while the current API checks whether a block hash is in the best chain.

@teor2345
Copy link
Contributor Author

Functionally, that's the only change - but I think a specific function for checking a hash is also useful for readability and modularity. (And potentially future optimisations.)

It's also worth noting the two different kinds of TOCTOU issues:

  • a block is not in the state, but is later downloaded or verified
  • a block is in the state, but is later pruned from the state as a side-chain (if we implement pruning)

@yaahc
Copy link
Contributor

yaahc commented Aug 13, 2020

It seems like the only problem is that you want to check whether a block hash is in any chain

This seems like a reasonable default for block lookups.

while the current API checks whether a block hash is in the best chain.

😅 s/the best chain/the only chain/

@mpguerra mpguerra mentioned this issue Jan 5, 2021
44 tasks
@mpguerra mpguerra removed this from the Sync and Network milestone Jan 5, 2021
@teor2345 teor2345 added C-bug Category: This is a bug P-Medium and removed C-cleanup Category: This is a cleanup E-easy labels Feb 9, 2021
@teor2345 teor2345 changed the title Add a function that checks if a block hash is in the state Make sync and inbound check if a block hash is in any chain Feb 9, 2021
@teor2345 teor2345 added the I-slow Problems with performance or responsiveness label Mar 4, 2021
@teor2345 teor2345 changed the title Make sync and inbound check if a block hash is in any chain Make sync and inbound check if a block hash is in any chain or any queue Mar 4, 2021
@mpguerra
Copy link
Contributor

mpguerra commented Jul 7, 2021

Is this something that needs to be done as part of #2224 ?

@teor2345
Copy link
Contributor Author

This might be needed to fix syncing bugs.

@teor2345
Copy link
Contributor Author

teor2345 commented Mar 1, 2022

This is a real bug, but it doesn't seem to cause that many problems in practice.

@teor2345 teor2345 closed this as completed Mar 1, 2022
@teor2345 teor2345 closed this as not planned Won't fix, can't repro, duplicate, stale Sep 6, 2022
@dconnolly dconnolly reopened this Mar 8, 2023
@dconnolly
Copy link
Contributor

dconnolly commented Mar 8, 2023

@arya2 I think you mentioned seeing this issue in the block verifier as well? Can you populate some of that context here? Gracias

@mpguerra mpguerra added the C-audit Category: Issues arising from audit findings label Mar 9, 2023
@mpguerra
Copy link
Contributor

mpguerra commented Mar 9, 2023

As part of resolving this we should remember to update the related TODO in

/// TODO BUG: check if the hash is in any chain (#862)

@arya2 arya2 changed the title Make sync and inbound check if a block hash is in any chain or any queue Make sync, inbound, and block verifier check if a block hash is in any chain or any queue Mar 11, 2023
@mpguerra
Copy link
Contributor

mpguerra commented Mar 14, 2023

The issue highlighted by the audit was that this issue was closed while there was still a TODO comment referring to it.

So, by re-opening this issue, we have already "fixed" the issue highlighted by the audit (#6281).

The next question is: how important is the actual issue to fix right now?

@teor2345
Copy link
Contributor Author

teor2345 commented Mar 24, 2023

We handled this issue by checking for blocks that are already in side-chains in PR #6335.

We closed PR #6397 because it didn't work, and we'd run out of time to fix it. The fix is optional because duplicate queued blocks are already handled in the syncer and inbound downloader, we just don't handle duplicates across both of them. Which is ok for now.

@mergify mergify bot closed this as completed in #6335 Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug I-slow Problems with performance or responsiveness I-usability Zebra is hard to understand or use
Projects
None yet
6 participants