Where SSZ and merkle partials meet. #1128

protolambda · 2019-05-27T12:41:55Z

So, I have been thinking about implementing SSZ in Go. And given that we already have a bunch (old ZRNT, Firefly, Prysm) with old features, I started thinking of new designs.

Opportunities to combine

Currently, generalized indices and (multi-)proofs are only thought of as future light-client facilitating tech. But what if a SSZ implementation actually worked well with it?

There are a few opportunities in each of the areas, outlined below.

Hash-tree-root caching

It's essentially the same as proofing. When you (multi-)proof, you hit roots of the branch, and stop hashing actual data underneath. The only difference between a (normal) proof is that a proof is unstructured (a multiproof is not). And you check the root, whereas with a cache you are getting a trusted root. It would be nice if you can detach the cache completely, and hit/update it by generalized index. And when you want to verify a part of your state, you run hash-tree-root with the proof data as cache, then verify the root.

Serialization

When you serialize data, you may only be interested in a specific part to serve to a light-client, based on general indices. So you may as well skip data that's not requested. This touches on the Null/Option discussion in SSZ. Alternative to extending SSZ with options/nulls, we could define a way to read SSZ like normal, but filtered. I.e. one or more generalized indices describe the part of the SSZ value actually encoded. The filtering idea is described in more detail below. (there's multiple variants too)

Deserialization

Writing a decoder for ZRNT is another goal of mine. However, if you're a light-client, and just need to know about a change in a subsection of a much bigger object, without knowing the location, it would be nice to update it easily.
The idea is that one can do the reverse of filtered serialization: only decode into the generalized indices approved by a filter. This symmetric (with encoder) ignorance of arbitrary ssz data effectively is a more powerful form of Null/Options. (too much?)

Filters

Another interesting thing with generalized indices + partials that I didn't find anywhere yet: we can sort them, and binary-search through them to efficiently hit a yes/no for location based queries. This would be the basis to compile a list of generalized indices (merkle partial) into generalized-filter.

Now, this filter can be applied to implement the different opportunities above:

def hash_tree_root(value, typ, filter, index):
    h, ok = filter(index)
    if ok:
      if typ == ...:
        return hash(value)
      if issubclass(typ, Container):
        # merkleize also hits the filter. Values are supplied in lambdas to lazily evaluate/skip.
        # when recursively calling into deeper merkle levels, pass the (adjusted) index we're at. And the count (depth can be inferred from it) when type information doesn't suffice (e.g. list) / is not provided.
        return merkleize((lambda: hash_tree_root(field, field_typ, filter, index.deeper(field_index, typ.field_count)) for (field, field_typ, field_index) in typ.get_fields(), filter, index, typ.field_count)
      elif ...
      ...
    else:
      return h

def serialize(value, typ, filter, index):
    if filter(index):
      if typ == ...:
        return value.to_bytes()
      # Same idea with recursive propagation of filter + adjusted index here.
      elif ...
      ...
    else:
      # two options:
      # 1. encode as if a zero value (decoding just works)
      # 2. don't encode (decoder needs to have same filter applied)

def deserialize(data, typ, filter, index):
    if filter(index):
      if typ == ...:
        return bytes_to_foobar(data)
      # Same idea with recursive propagation of filter + adjusted index here.
      elif ...
      ...
    else:
      # two options:
      # 1. decode as if any value, just skip over it (standard encoding just works)
      # 2. don't skip over byte data, expect next data to be next filter hit (encoder needs to have same filter applied)

Note: this is just pseudocode, I would advise real implementations to read from a stream-like input, and to decode into values, instead of returning them (facilitating updates of previous values).

Although the zeroed-out data variant would be compatible over the current approach, it is a lot of redundant information, and we may not want to leak information of the size of the object in some ignored struct field.

Another open design issue is how to introduce and remove cache entries. When in the scope of a small object, you may not know the generalized index. (there are workarounds however)

protolambda · 2019-06-21T00:17:44Z

I think with #1180 and #1184 we are close enough with a full partials implementation to close this. Re-open if anyone wants to share new ideas.

protolambda added the scope:SSZ Simple Serialize label May 27, 2019

protolambda closed this as completed Jun 21, 2019

cdetrio mentioned this issue Jul 20, 2019

Proposed structure for SSZ partials #1303

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Where SSZ and merkle partials meet. #1128

Where SSZ and merkle partials meet. #1128

protolambda commented May 27, 2019

protolambda commented Jun 21, 2019

Where SSZ and merkle partials meet. #1128

Where SSZ and merkle partials meet. #1128

Comments

protolambda commented May 27, 2019

Opportunities to combine

Hash-tree-root caching

Serialization

Deserialization

Filters

protolambda commented Jun 21, 2019