Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where SSZ and merkle partials meet. #1128

Closed
protolambda opened this issue May 27, 2019 · 1 comment
Closed

Where SSZ and merkle partials meet. #1128

protolambda opened this issue May 27, 2019 · 1 comment
Labels
scope:SSZ Simple Serialize

Comments

@protolambda
Copy link
Collaborator

So, I have been thinking about implementing SSZ in Go. And given that we already have a bunch (old ZRNT, Firefly, Prysm) with old features, I started thinking of new designs.

Opportunities to combine

Currently, generalized indices and (multi-)proofs are only thought of as future light-client facilitating tech. But what if a SSZ implementation actually worked well with it?

There are a few opportunities in each of the areas, outlined below.

Hash-tree-root caching

It's essentially the same as proofing. When you (multi-)proof, you hit roots of the branch, and stop hashing actual data underneath. The only difference between a (normal) proof is that a proof is unstructured (a multiproof is not). And you check the root, whereas with a cache you are getting a trusted root. It would be nice if you can detach the cache completely, and hit/update it by generalized index. And when you want to verify a part of your state, you run hash-tree-root with the proof data as cache, then verify the root.

Serialization

When you serialize data, you may only be interested in a specific part to serve to a light-client, based on general indices. So you may as well skip data that's not requested. This touches on the Null/Option discussion in SSZ. Alternative to extending SSZ with options/nulls, we could define a way to read SSZ like normal, but filtered. I.e. one or more generalized indices describe the part of the SSZ value actually encoded. The filtering idea is described in more detail below. (there's multiple variants too)

Deserialization

Writing a decoder for ZRNT is another goal of mine. However, if you're a light-client, and just need to know about a change in a subsection of a much bigger object, without knowing the location, it would be nice to update it easily.
The idea is that one can do the reverse of filtered serialization: only decode into the generalized indices approved by a filter. This symmetric (with encoder) ignorance of arbitrary ssz data effectively is a more powerful form of Null/Options. (too much?)

Filters

Another interesting thing with generalized indices + partials that I didn't find anywhere yet: we can sort them, and binary-search through them to efficiently hit a yes/no for location based queries. This would be the basis to compile a list of generalized indices (merkle partial) into generalized-filter.

Now, this filter can be applied to implement the different opportunities above:

def hash_tree_root(value, typ, filter, index):
    h, ok = filter(index)
    if ok:
      if typ == ...:
        return hash(value)
      if issubclass(typ, Container):
        # merkleize also hits the filter. Values are supplied in lambdas to lazily evaluate/skip.
        # when recursively calling into deeper merkle levels, pass the (adjusted) index we're at. And the count (depth can be inferred from it) when type information doesn't suffice (e.g. list) / is not provided.
        return merkleize((lambda: hash_tree_root(field, field_typ, filter, index.deeper(field_index, typ.field_count)) for (field, field_typ, field_index) in typ.get_fields(), filter, index, typ.field_count)
      elif ...
      ...
    else:
      return h

def serialize(value, typ, filter, index):
    if filter(index):
      if typ == ...:
        return value.to_bytes()
      # Same idea with recursive propagation of filter + adjusted index here.
      elif ...
      ...
    else:
      # two options:
      # 1. encode as if a zero value (decoding just works)
      # 2. don't encode (decoder needs to have same filter applied)

def deserialize(data, typ, filter, index):
    if filter(index):
      if typ == ...:
        return bytes_to_foobar(data)
      # Same idea with recursive propagation of filter + adjusted index here.
      elif ...
      ...
    else:
      # two options:
      # 1. decode as if any value, just skip over it (standard encoding just works)
      # 2. don't skip over byte data, expect next data to be next filter hit (encoder needs to have same filter applied)

Note: this is just pseudocode, I would advise real implementations to read from a stream-like input, and to decode into values, instead of returning them (facilitating updates of previous values).

Although the zeroed-out data variant would be compatible over the current approach, it is a lot of redundant information, and we may not want to leak information of the size of the object in some ignored struct field.

Another open design issue is how to introduce and remove cache entries. When in the scope of a small object, you may not know the generalized index. (there are workarounds however)

@protolambda protolambda added the scope:SSZ Simple Serialize label May 27, 2019
@protolambda
Copy link
Collaborator Author

I think with #1180 and #1184 we are close enough with a full partials implementation to close this. Re-open if anyone wants to share new ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope:SSZ Simple Serialize
Projects
None yet
Development

No branches or pull requests

1 participant