Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: addresses mmr find_peaks bug #5182

Merged
merged 10 commits into from
Feb 21, 2023
38 changes: 35 additions & 3 deletions base_layer/mmr/src/common.rs
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,15 @@ pub fn find_peaks(size: usize) -> Vec<usize> {
sum_prev_peaks += peak_size;
num_left -= peak_size;
}
peak_size >>= 1;
// need to verify if other peaks exist on the same height, in which case
// the size doesn't generate a complete mmr, e.g.
// 2
// / \
// 0 1 3 4
// where we have two peaks at height 0.
Copy link
Member

@sdbondi sdbondi Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So the reason for the vec![] return is to say that the mmr size cannot exist.

Say we have an MMR of size 5 (nodes not leaf nodes), which would look like this

   2
 /   \
0    1  3    4

However that isnt a valid MMR size because it should look like this (7 nodes)

        //        6
        //      /    \
        //    2      5
        //   / \    /  \
        //  0   1   3   4

As expected, find_peaks(7) == [6]

So perhaps we should return an Option here instead of an empty vec to force the caller to handle the case of an invalid size. And add a comment explaining what None means.

Then in MerkleProof::generate_proof and MerkleProof::verify, there was a missing check, you would do something like

let peaks = find_peaks(self.mmr_size).ok_or(MerkleProofError::InvalidMmrSize)?;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more intuitive (and useful) to always return the peak that would be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first example could make sense and might well represent a Merkle Mountain Range of size 5, and for that reason, we should be able to generate proofs for it. The implementations I have seen so far deal with any size MMR. But happy to make the find_peaks return an option in case, the size doesn't correspond to a fully generated MMR.

Copy link
Member

@sdbondi sdbondi Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A merkle mountain range cannot be size 5 because then it is malformed. In the above example, nodes 3 and 4 must be hashed to 5 and 2 and 5 must be hashed to 6 or the MMR is not a valid MMR.

Copy link
Collaborator

@SWvheerden SWvheerden Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that it might break on find_peaks on invalid numbers, might point out that we might have an issue with the mmr,
I agree here with @sdbondi an MMR cannot be for example size 5,
And MMR containing 5 elements would be of size 8.

         6
      /     \
    2        5
   / \      /  \
  0   1    3   4   7

It might be that our MMR implementation does not cap the peaks correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the above comments. Let me refactor the code to output a None in case the MMR is not fully generated/complete/valid and check its usage later on. @SWvheerden in the example above, the size of the MMR is 8, spanning a 5 element set, I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So the reason for the vec![] return is to say that the mmr size cannot exist.

However that isnt a valid MMR size because it should look like this (7 nodes)

        //        6
        //      /    \
        //    2      5
        //   / \    /  \
        //  0   1   3   4

As expected, find_peaks(7) == [6]

So perhaps we should return an Option here instead of an empty vec to force the caller to handle the case of an invalid size. And add a comment explaining what None means.

Yes, this. find_peaks should really be private and documented such that it should not be called before "bagging" the MMR. Returning an Option is ok too. I'll bet this wasn't done on day 1 on performance grounds and the assumption that it gets called after bagging (but should have been made private to make that assumption a contract). But @stringhandler and I discussed this and found that Option overheads aren't that high.

if num_left < peak_size {
peak_size >>= 1;
}
}
if num_left > 0 {
return vec![];
Expand Down Expand Up @@ -247,10 +255,34 @@ mod test {
fn peak_vectors() {
assert_eq!(find_peaks(0), Vec::<usize>::new());
assert_eq!(find_peaks(1), vec![0]);
assert_eq!(find_peaks(2), vec![0, 1]);
assert_eq!(find_peaks(3), vec![2]);
assert_eq!(find_peaks(4), vec![2, 3]);
assert_eq!(find_peaks(5), vec![2, 3, 4]);
assert_eq!(find_peaks(6), vec![2, 5]);
assert_eq!(find_peaks(7), vec![6]);
assert_eq!(find_peaks(8), vec![6, 7]);
assert_eq!(find_peaks(9), vec![6, 7, 8]);
assert_eq!(find_peaks(10), vec![6, 9]);
assert_eq!(find_peaks(11), vec![6, 9, 10]);
assert_eq!(find_peaks(12), vec![6, 9, 10, 11]);
assert_eq!(find_peaks(13), vec![6, 9, 12]);
assert_eq!(find_peaks(14), vec![6, 13]);
assert_eq!(find_peaks(15), vec![14]);
assert_eq!(find_peaks(16), vec![14, 15]);
assert_eq!(find_peaks(17), vec![14, 15, 16]);
assert_eq!(find_peaks(18), vec![14, 17]);
assert_eq!(find_peaks(19), vec![14, 17, 18]);
assert_eq!(find_peaks(20), vec![14, 17, 18, 19]);
assert_eq!(find_peaks(21), vec![14, 17, 20]);
assert_eq!(find_peaks(22), vec![14, 21]);
assert_eq!(find_peaks(23), vec![14, 21, 22]);
assert_eq!(find_peaks(24), vec![14, 21, 22, 23]);
assert_eq!(find_peaks(28), vec![14, 21, 24, 27]);
assert_eq!(find_peaks(56), vec![30, 45, 52, 55]);
assert_eq!(find_peaks(60), vec![30, 45, 52, 59]);
assert_eq!(find_peaks(123), vec![62, 93, 108, 115, 122]);
assert_eq!(find_peaks(130), vec![126, 129]);
}

#[test]
Expand Down Expand Up @@ -352,9 +384,9 @@ mod test {
fn find_peaks_when_num_left_gt_zero() {
assert!(find_peaks(0).is_empty());
assert_eq!(find_peaks(1), vec![0]);
assert!(find_peaks(2).is_empty());
assert_eq!(find_peaks(2), vec![0, 1]);
assert_eq!(find_peaks(3), vec![2]);
assert_eq!(find_peaks(usize::MAX), [18446744073709551614].to_vec());
assert_eq!(find_peaks(usize::MAX - 1).len(), 0);
assert_eq!(find_peaks(usize::MAX - 1).len(), 2);
}
}