Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle or migrate away from non-utf8 DealProposal Labels #69

Closed
Stebalien opened this issue Mar 9, 2022 · 9 comments
Closed

Handle or migrate away from non-utf8 DealProposal Labels #69

Stebalien opened this issue Mar 9, 2022 · 9 comments
Labels
Milestone

Comments

@Stebalien
Copy link
Member

See filecoin-project/FIPs#187 and https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0027.md.

  • According to the FIP, these should be migrated to byte strings.
  • For now, we should accept non-utf8 strings. But that's... nasty.
  • An alternative is to just migrate labels to utf8.

At the moment, we reject non-utf8 labels. But we need to make a decision before we ship.

@jennijuju
Copy link
Member

(whether if this is a P0 for ntwk v16 is pending analysis.

@jennijuju
Copy link
Member

source

  • in actor, we can have a label struct that supports both string and bytes. Then we shall expand the cbor support so it understands both string and bytes during signature validation and such, then store everything on chain in bytes (convert string to label)
  • there will be a state migration over all clientdealproposal (PSD)
  • As of EOD, we figured out a workable solution that won't require a pause in Filecoin storage market (no need to ban PSD or worry about invalid in fly PSD around the upgrade), however, we also haven't quantified the work detailed enough to decide if this should be within the ntwk v16 scope.

To be continued tmr.

@raulk
Copy link
Member

raulk commented Mar 10, 2022

An alternative is to just migrate labels to utf8.

I would entirely discard this option. First, from a product perspective it makes adding binary metadata tricky. Given that Go accepts non-UTF-8 strings, and it's possible that those exist in state, we don't have an easy non-UTF-8 => UTF-8 migration path. We'd have to apply a transform which would manipulate the user-provided value, and that is definitely a no-go.


I think I've read all the past discussion across all links. I have one basic question. A byte array is a looser type than a string. Any string should be representable as a byte array. I'm assuming the bulk of the migration is to re-encode this type from a CBOR string (tags 2 and 3) to a byte slice (different tag). What exactly makes this migration difficult, and what's the reason to consider stopping the market temporarily?

@jennijuju
Copy link
Member

@raulk
exactly like you said! and that way we will not need to stop the market temporarily
image

@raulk
Copy link
Member

raulk commented Mar 10, 2022

@jennijuju Can someone please describe "the workable solution"?

@arajasek
Copy link
Contributor

arajasek commented Mar 10, 2022

@raulk

What exactly makes this migration difficult, and what's the reason to consider stopping the market temporarily?

The migration is trivial. The issue is that naively changing the Label field in the DealProposal from a string to bytes changes the serialization of the DealProposal. The current flow is as follows (some aspects of this flow aren't protocol-specific, but might as well be because they reflect the reality of dealmaking on Filecoin today):

  1. Client creates a DealProposal, signs it, wraps the proposal and signature into a ClientDealProposal, and sends that to the SP
  2. SP deserializes the ClientDealProposal, validates the signature, adds it to a batch, and eventually PublishStorageDeals (sending the serialized ClientDealProposal to the chain)
  3. Validators of the Filecoin blockchain:
  • deserialize the ClientDealProposal into a DealProposal and a signature
  • serialize the DealProposal
  • validates that the signature is valid for the serialized DealProposal

The issue is that all of the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. When should clients start using bytes instead of strings? When should SPs start expecting clients to be using bytes instead of strings? What happens to all messages in the mempool that signed strings instead of bytes after the network upgrades?

Can someone please describe "the workable solution"?

The solution proposed by @Stebalien is to use a Union type, something like

struct Label {
  label_string string
  label_bytes []byte
}

We'd assert that exactly one of label_string and label_bytes is set, and de/serialize this object as though it were just one field -- "If label_string is set, serialize it as a string, if not, serialize it as bytes" (and the reverse for decoding). This is of course not the default CBOR encoding of such a struct, but that's fine.

Additionaly, we can start asserting that if label_string is set, that it be valid UTF-8. Other users can simply set label_bytes instead. The migration can be to copy all existing Labels into label_string iff they are valid UTF-8, and into label_bytes otherwise.

This solves most of the problems described above. A large refactor may still be needed for client implementations if they don't abstract over possible DealProposal, but at least there's no messy period over the upgrade where large swathes of PublishStorageDeals messages are failing on-chain and large numbers of clients are seeing SPs reject their deal proposals as invalid.

@raulk
Copy link
Member

raulk commented Mar 10, 2022

Thanks for the detailed proposal. This would work if the union type is entirely transparent, that is, if it's not tagged in itself, and instead we use the CBOR tag in the encoded tuple position to discriminate which type is present. Is that what's intended? Does our DAG-CBOR library support such a serde pattern? Does Rust's serde (which we need to cooperate with)?

@arajasek
Copy link
Contributor

@Stebalien would be the one to confirm, but I'm guessing it's supported in both places, since we have such types already in the HAMT used by Filecoin (see, for example, here).

@ZenGround0
Copy link
Contributor

Closed by FIP 0027

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

No branches or pull requests

5 participants