fix: make sure that structs are serialized correctly #34

Merged
merged 3 commits into from
Apr 24, 2024

Conversation

@vmx vmx commented Apr 22, 2024

Rust structs have their fields ordered the same way as they are defined. When serialized to CBOR, they become maps by default. In DAG-CBOR, map keys must appear in a specific order, so the keys might need to be re-ordered to make the output independent of the order in which the fields were defined. This commit makes sure that this is actually the case.

Fixes #31.
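
As an illustration of the ordering described above, here is a minimal sketch, not the code from this PR. The names `dag_cbor_key_cmp` and `canonical_field_order` are hypothetical; the sketch only assumes the DAG-CBOR rule that shorter keys sort first and keys of equal length sort bytewise.

```rust
use std::cmp::Ordering;

/// DAG-CBOR map key order: shorter keys first, ties broken bytewise.
fn dag_cbor_key_cmp(a: &str, b: &str) -> Ordering {
    a.len()
        .cmp(&b.len())
        .then_with(|| a.as_bytes().cmp(b.as_bytes()))
}

/// Hypothetical helper: takes struct field names in definition order and
/// returns them in the order they should appear in the encoded map.
fn canonical_field_order<'a>(fields: &[&'a str]) -> Vec<&'a str> {
    let mut sorted = fields.to_vec();
    sorted.sort_by(|a, b| dag_cbor_key_cmp(a, b));
    sorted
}
```

For a struct whose fields are defined as `name`, `id`, `b`, `a1`, this returns `b`, `a1`, `id`, `name`, regardless of the definition order.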

@vmx vmx requested review from Stebalien and rvagg April 22, 2024 23:28
// comparison gives us the right order as keys in DAG-CBOR are always (text) strings, hence
// have the same CBOR major type 3. The length of the string is encoded in the following
// bits. This means that a shorter string sorts before a longer string.
self.entries.sort_unstable();
Member

I don't think this is enough. We're stuck with the RFC 7049 rules of length-first sorting. https://ipld.io/specs/codecs/dag-cbor/spec/#strictness

A good extension to your tests would be to add a new field "a1" to the struct, which should come after "b".

Member Author

Thanks for pointing this out and proposing better tests. I think it's correct, though. In a previous PR Steb asked me to clarify this a bit in a comment, but it still doesn't seem to be clear. If you have an idea for wording it better, please let me know.

As CBOR prefixes the strings with the length, shorter strings are sorted first. I even think this is the reason they came up with this (counter-intuitive) ordering: you can take the full encoded value (not just the string itself) and memcmp it for sorting.
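
A minimal sketch of that argument, assuming keys shorter than 24 bytes so the length fits into the header byte. `encode_short_key` and `sort_encoded` are hypothetical helpers written for this illustration, not functions from the crate.

```rust
/// Encodes a short text string (fewer than 24 bytes) as a CBOR text string:
/// one header byte (major type 3 in the top 3 bits, length in the low 5 bits)
/// followed by the UTF-8 bytes.
fn encode_short_key(key: &str) -> Vec<u8> {
    assert!(key.len() < 24, "sketch only covers inline lengths");
    let mut out = Vec::with_capacity(1 + key.len());
    out.push(0x60 | key.len() as u8);
    out.extend_from_slice(key.as_bytes());
    out
}

/// Encodes every key and sorts the encoded bytes, memcmp-style.
fn sort_encoded(keys: &[&str]) -> Vec<Vec<u8>> {
    let mut encoded: Vec<Vec<u8>> = keys.iter().map(|k| encode_short_key(k)).collect();
    encoded.sort_unstable();
    encoded
}
```

With the keys `"a1"`, `"b"`, `"a"`, the encoded forms are `62 61 31`, `61 62`, and `61 61`. A plain byte comparison puts `"a"` and `"b"` before `"a1"` because their header byte is smaller, which is exactly the length-first order.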

Contributor

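Importantly, even though the length is encoded with a variable width, the 5-bit additional information field of the header byte is always one of:

  1. 0-23: the length itself, inlined into the 5-bit additional information field.
  2. 24-27: a marker that a 1-, 2-, 4-, or 8-byte length follows.

Longer lengths therefore always get a larger header byte, which means that lexicographical sorting actually works.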
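
The two cases above can be sketched as follows. `text_header` is a hypothetical helper that builds only the header of a CBOR text string (major type 3), always choosing the shortest form for a given length, which is an assumption of this sketch.

```rust
/// Encodes the header of a CBOR text string (major type 3) of `len` bytes,
/// using the shortest possible length encoding.
fn text_header(len: u64) -> Vec<u8> {
    const MAJOR_TEXT: u8 = 3 << 5;
    match len {
        // Length is inlined into the 5-bit additional information field.
        0..=23 => vec![MAJOR_TEXT | len as u8],
        // 24: a 1-byte length follows.
        24..=0xff => vec![MAJOR_TEXT | 24, len as u8],
        // 25: a 2-byte big-endian length follows.
        0x100..=0xffff => {
            let mut out = vec![MAJOR_TEXT | 25];
            out.extend_from_slice(&(len as u16).to_be_bytes());
            out
        }
        // 26: a 4-byte big-endian length follows.
        0x1_0000..=0xffff_ffff => {
            let mut out = vec![MAJOR_TEXT | 26];
            out.extend_from_slice(&(len as u32).to_be_bytes());
            out
        }
        // 27: an 8-byte big-endian length follows.
        _ => {
            let mut out = vec![MAJOR_TEXT | 27];
            out.extend_from_slice(&len.to_be_bytes());
            out
        }
    }
}
```

Because the marker values 24-27 grow with the width of the length, and the length bytes are big-endian, comparing headers bytewise gives the same result as comparing the lengths numerically.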

Member

Oh, right, I missed that this was sorting over encoded values; and yeah, neat that it actually works given the flexible prefix!

@@ -65,8 +65,8 @@ impl<'a, W: enc::Write> serde::Serializer for &'a mut Serializer<W> {
 type SerializeTupleStruct = BoundedCollect<'a, W>;
 type SerializeTupleVariant = BoundedCollect<'a, W>;
 type SerializeMap = CollectMap<'a, W>;
-type SerializeStruct = BoundedCollect<'a, W>;
-type SerializeStructVariant = BoundedCollect<'a, W>;
+type SerializeStruct = CollectMap<'a, W>;
Member

Can you educate me on BoundedCollect vs CollectMap?

Member Author

The names might not be great, but the idea now is that BoundedCollect takes the input in the order it comes in without further checks, while CollectMap does the whole sorting thing.

Those structs then implement specific traits so that they can be used with those type definitions.
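
To make the distinction concrete, here is a heavily simplified sketch. `InOrderCollector` and `SortingCollector` are hypothetical stand-ins for the two roles described above, not the crate's actual `BoundedCollect` and `CollectMap`; they skip the serde traits and the map header and only handle already-encoded keys and values.

```rust
/// Hypothetical stand-in: writes entries in the order they arrive.
#[derive(Default)]
struct InOrderCollector {
    out: Vec<u8>,
}

impl InOrderCollector {
    fn push(&mut self, encoded_key: &[u8], encoded_value: &[u8]) {
        self.out.extend_from_slice(encoded_key);
        self.out.extend_from_slice(encoded_value);
    }

    fn finish(self) -> Vec<u8> {
        self.out
    }
}

/// Hypothetical stand-in: buffers entries and sorts them by encoded key
/// before writing anything.
#[derive(Default)]
struct SortingCollector {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
}

impl SortingCollector {
    fn push(&mut self, encoded_key: &[u8], encoded_value: &[u8]) {
        self.entries.push((encoded_key.to_vec(), encoded_value.to_vec()));
    }

    fn finish(mut self) -> Vec<u8> {
        // Bytewise comparison of the encoded keys yields the canonical order.
        self.entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
        let mut out = Vec::new();
        for (key, value) in self.entries {
            out.extend_from_slice(&key);
            out.extend_from_slice(&value);
        }
        out
    }
}
```

Feeding both collectors the entries `"a1" => 1` and then `"b" => 2` produces different output: the first keeps `"a1"` in front, the second moves `"b"` to the front. That is why struct fields have to go through the sorting path.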

@vmx vmx force-pushed the fix-struct-serialize branch from d685c6c to a2959cb Compare April 23, 2024 13:50
Co-authored-by: Rod Vagg <[email protected]>
@vmx vmx merged commit 3464937 into master Apr 24, 2024
3 checks passed
@vmx vmx deleted the fix-struct-serialize branch April 24, 2024 10:56
Development

Successfully merging this pull request may close these issues.

non-canonical encoding