Handle or migrate away from non-utf8 DealProposal Labels #69

Stebalien · 2022-03-09T15:29:33Z

See filecoin-project/FIPs#187 and https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0027.md.

According to the FIP, these should be migrated to byte strings.
For now, we should accept non-utf8 strings. But that's... nasty.
An alternative is to just migrate labels to utf8.

At the moment, we reject non-utf8 labels. But we need to make a decision before we ship.

jennijuju · 2022-03-09T17:51:47Z

(whether if this is a P0 for ntwk v16 is pending analysis.

jennijuju · 2022-03-10T03:15:18Z

source

in actor, we can have a label struct that supports both string and bytes. Then we shall expand the cbor support so it understands both string and bytes during signature validation and such, then store everything on chain in bytes (convert string to label)
there will be a state migration over all clientdealproposal (PSD)
As of EOD, we figured out a workable solution that won't require a pause in Filecoin storage market (no need to ban PSD or worry about invalid in fly PSD around the upgrade), however, we also haven't quantified the work detailed enough to decide if this should be within the ntwk v16 scope.

To be continued tmr.

raulk · 2022-03-10T14:38:54Z

An alternative is to just migrate labels to utf8.

I would entirely discard this option. First, from a product perspective it makes adding binary metadata tricky. Given that Go accepts non-UTF-8 strings, and it's possible that those exist in state, we don't have an easy non-UTF-8 => UTF-8 migration path. We'd have to apply a transform which would manipulate the user-provided value, and that is definitely a no-go.

I think I've read all the past discussion across all links. I have one basic question. A byte array is a looser type than a string. Any string should be representable as a byte array. I'm assuming the bulk of the migration is to re-encode this type from a CBOR string (tags 2 and 3) to a byte slice (different tag). What exactly makes this migration difficult, and what's the reason to consider stopping the market temporarily?

jennijuju · 2022-03-10T14:41:02Z

@raulk
exactly like you said! and that way we will not need to stop the market temporarily

raulk · 2022-03-10T14:42:41Z

@jennijuju Can someone please describe "the workable solution"?

arajasek · 2022-03-10T15:18:50Z

@raulk

What exactly makes this migration difficult, and what's the reason to consider stopping the market temporarily?

The migration is trivial. The issue is that naively changing the Label field in the DealProposal from a string to bytes changes the serialization of the DealProposal. The current flow is as follows (some aspects of this flow aren't protocol-specific, but might as well be because they reflect the reality of dealmaking on Filecoin today):

Client creates a DealProposal, signs it, wraps the proposal and signature into a ClientDealProposal, and sends that to the SP
SP deserializes the ClientDealProposal, validates the signature, adds it to a batch, and eventually PublishStorageDeals (sending the serialized ClientDealProposal to the chain)
Validators of the Filecoin blockchain:

deserialize the ClientDealProposal into a DealProposal and a signature
serialize the DealProposal
validates that the signature is valid for the serialized DealProposal

The issue is that all of the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. When should clients start using bytes instead of strings? When should SPs start expecting clients to be using bytes instead of strings? What happens to all messages in the mempool that signed strings instead of bytes after the network upgrades?

Can someone please describe "the workable solution"?

The solution proposed by @Stebalien is to use a Union type, something like

struct Label {
  label_string string
  label_bytes []byte
}

We'd assert that exactly one of label_string and label_bytes is set, and de/serialize this object as though it were just one field -- "If label_string is set, serialize it as a string, if not, serialize it as bytes" (and the reverse for decoding). This is of course not the default CBOR encoding of such a struct, but that's fine.

Additionaly, we can start asserting that if label_string is set, that it be valid UTF-8. Other users can simply set label_bytes instead. The migration can be to copy all existing Labels into label_string iff they are valid UTF-8, and into label_bytes otherwise.

This solves most of the problems described above. A large refactor may still be needed for client implementations if they don't abstract over possible DealProposal, but at least there's no messy period over the upgrade where large swathes of PublishStorageDeals messages are failing on-chain and large numbers of clients are seeing SPs reject their deal proposals as invalid.

raulk · 2022-03-10T15:25:39Z

Thanks for the detailed proposal. This would work if the union type is entirely transparent, that is, if it's not tagged in itself, and instead we use the CBOR tag in the encoded tuple position to discriminate which type is present. Is that what's intended? Does our DAG-CBOR library support such a serde pattern? Does Rust's serde (which we need to cooperate with)?

arajasek · 2022-03-10T15:29:28Z

@Stebalien would be the one to confirm, but I'm guessing it's supported in both places, since we have such types already in the HAMT used by Filecoin (see, for example, here).

ZenGround0 · 2022-08-11T13:22:05Z

Closed by FIP 0027

Stebalien added area/actors P0 labels Mar 9, 2022

Stebalien added this to the Network v16 milestone Mar 9, 2022

jennijuju mentioned this issue Mar 9, 2022

Analyze migrate away from non-utf8 DealProposal Labels filecoin-project/lotus#8271

Closed

ZenGround0 closed this as completed Aug 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle or migrate away from non-utf8 DealProposal Labels #69

Handle or migrate away from non-utf8 DealProposal Labels #69

Stebalien commented Mar 9, 2022

jennijuju commented Mar 9, 2022

jennijuju commented Mar 10, 2022

raulk commented Mar 10, 2022

jennijuju commented Mar 10, 2022

raulk commented Mar 10, 2022

arajasek commented Mar 10, 2022 •

edited

Loading

raulk commented Mar 10, 2022

arajasek commented Mar 10, 2022

ZenGround0 commented Aug 11, 2022

Handle or migrate away from non-utf8 DealProposal Labels #69

Handle or migrate away from non-utf8 DealProposal Labels #69

Comments

Stebalien commented Mar 9, 2022

jennijuju commented Mar 9, 2022

jennijuju commented Mar 10, 2022

raulk commented Mar 10, 2022

jennijuju commented Mar 10, 2022

raulk commented Mar 10, 2022

arajasek commented Mar 10, 2022 • edited Loading

raulk commented Mar 10, 2022

arajasek commented Mar 10, 2022

ZenGround0 commented Aug 11, 2022

arajasek commented Mar 10, 2022 •

edited

Loading