Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: addresses mmr find_peaks bug #5182

Merged
merged 10 commits into from
Feb 21, 2023

Conversation

jorgeantonio21
Copy link
Contributor

Description

Currently, the Tari MMR repo contains a bug on the find_peaks function, in base_layer/mmr/src/common.rs. The source of the problem arises from the fact that the current implementation does not consider possible Merkle Mountain Ranges, in which all nodes are filled to the possible top (that is, every possible spawned peak exists in the MMR). Moreover, the current function implementation handles this issue by returning an empty vec![]. This works as a 'silent error' being passed upstream. In order to handle this problem, we impose that the height to check that a new peak exists only decreases if we are guaranteed that every peak at the same height does in fact not exists. In this way, proofs can be created for every MMR size.

PS: Want to add further unit tests.

Motivation and Context

How Has This Been Tested?

More extensive unit testing.

Copy link
Member

@sdbondi sdbondi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem was missing checks in MerkleProof::generate_proof and MerkleProof::verify, suggest we return an option from find_peaks instead of an empty vec! so that any caller of find_peaks is forced to do the check.

This is important because we will be verifying untrusted data, so a malicious entity can set mmr_size in the merkle proof to whatever they want.

Comment on lines 80 to 83
// 2
// / \
// 0 1 3 4
// where we have two peaks at height 0.
Copy link
Member

@sdbondi sdbondi Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So the reason for the vec![] return is to say that the mmr size cannot exist.

Say we have an MMR of size 5 (nodes not leaf nodes), which would look like this

   2
 /   \
0    1  3    4

However that isnt a valid MMR size because it should look like this (7 nodes)

        //        6
        //      /    \
        //    2      5
        //   / \    /  \
        //  0   1   3   4

As expected, find_peaks(7) == [6]

So perhaps we should return an Option here instead of an empty vec to force the caller to handle the case of an invalid size. And add a comment explaining what None means.

Then in MerkleProof::generate_proof and MerkleProof::verify, there was a missing check, you would do something like

let peaks = find_peaks(self.mmr_size).ok_or(MerkleProofError::InvalidMmrSize)?;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more intuitive (and useful) to always return the peak that would be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first example could make sense and might well represent a Merkle Mountain Range of size 5, and for that reason, we should be able to generate proofs for it. The implementations I have seen so far deal with any size MMR. But happy to make the find_peaks return an option in case, the size doesn't correspond to a fully generated MMR.

Copy link
Member

@sdbondi sdbondi Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A merkle mountain range cannot be size 5 because then it is malformed. In the above example, nodes 3 and 4 must be hashed to 5 and 2 and 5 must be hashed to 6 or the MMR is not a valid MMR.

Copy link
Collaborator

@SWvheerden SWvheerden Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that it might break on find_peaks on invalid numbers, might point out that we might have an issue with the mmr,
I agree here with @sdbondi an MMR cannot be for example size 5,
And MMR containing 5 elements would be of size 8.

         6
      /     \
    2        5
   / \      /  \
  0   1    3   4   7

It might be that our MMR implementation does not cap the peaks correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the above comments. Let me refactor the code to output a None in case the MMR is not fully generated/complete/valid and check its usage later on. @SWvheerden in the example above, the size of the MMR is 8, spanning a 5 element set, I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So the reason for the vec![] return is to say that the mmr size cannot exist.

However that isnt a valid MMR size because it should look like this (7 nodes)

        //        6
        //      /    \
        //    2      5
        //   / \    /  \
        //  0   1   3   4

As expected, find_peaks(7) == [6]

So perhaps we should return an Option here instead of an empty vec to force the caller to handle the case of an invalid size. And add a comment explaining what None means.

Yes, this. find_peaks should really be private and documented such that it should not be called before "bagging" the MMR. Returning an Option is ok too. I'll bet this wasn't done on day 1 on performance grounds and the assumption that it gets called after bagging (but should have been made private to make that assumption a contract). But @stringhandler and I discussed this and found that Option overheads aren't that high.

@@ -137,7 +137,7 @@ impl MerkleProof {

// Get the peaks of the merkle trees, which are bagged together to form the root
// For the proof, we must leave out the local root for the candidate node
let peaks = find_peaks(mmr_size);
let peaks = find_peaks(mmr_size).ok_or(MerkleMountainRangeError::InvalidMmrSize)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI If the MMR has been bagged, find_peaks will not error

@CjS77
Copy link
Collaborator

CjS77 commented Feb 15, 2023

ACK

@jorgeantonio21 jorgeantonio21 marked this pull request as ready for review February 16, 2023 12:26
Copy link
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

Comment on lines 80 to 83
// 2
// / \
// 0 1 3 4
// where we have two peaks at height 0.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more intuitive (and useful) to always return the peak that would be created.

@stringhandler stringhandler merged commit ee55e84 into tari-project:development Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants