Move away from serde for Canonical JSON #464

Open
kim opened this issue Dec 2, 2020 · 2 comments

Comments

@kim
Contributor

kim commented Dec 2, 2020

serde requires canonicalisation to happen after domain types have already been
erased (i.e. on the serde data model rather than on the original Rust types).
This creates several footguns:

  • String types may not compare equal after a roundtrip
  • Set types have no counterpart in the serde data model: they are serialized
    the same way as arrays (i.e. as lists in whatever order the elements are
    emitted). What we want is to always have ordered-set semantics.
  • Floating point numbers (which are illegal in Canonical JSON) are only
    rejected at runtime, with an error, instead of being ruled out by the types
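To make the set footgun concrete, here is a small std-only sketch. The helpers `encode_list` and `encode_set` are hypothetical and skip string escaping; they only show that a list keeps whatever order it was built in, while an ordered set produces the same output for the same elements:

```rust
use std::collections::BTreeSet;

// Hypothetical helper: encode a list as a JSON array of strings, in the
// order given (no escaping, illustration only).
fn encode_list(items: &[&str]) -> String {
    let quoted: Vec<String> = items.iter().map(|s| format!("\"{}\"", s)).collect();
    format!("[{}]", quoted.join(","))
}

// Hypothetical helper: encode an ordered set; BTreeSet iterates in sorted order.
fn encode_set(items: &BTreeSet<&str>) -> String {
    let quoted: Vec<String> = items.iter().map(|s| format!("\"{}\"", s)).collect();
    format!("[{}]", quoted.join(","))
}

fn main() {
    let a = ["b", "a", "c"];
    let b = ["c", "b", "a"];
    // Same elements, different order: the list encodings differ.
    assert_ne!(encode_list(&a), encode_list(&b));

    // With ordered-set semantics the encodings are identical.
    let sa: BTreeSet<&str> = a.iter().copied().collect();
    let sb: BTreeSet<&str> = b.iter().copied().collect();
    assert_eq!(encode_set(&sa), encode_set(&sb));
    println!("{}", encode_set(&sa)); // prints ["a","b","c"]
}
```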

To remedy this, we should precisely constrain what types can be used to compose
a structure subject to canonicalisation.

The proposal is to always go through an intermediate representation (akin to
enum Value), which is already guaranteed to hold the canonical form by
construction. Type-directed conversion for std types may be provided. Literals
may be supported via proc-macros.

Reasoning:

  • It is no less memory-efficient than what we have now, as we already need to
    buffer map-shaped objects before canonicalisation (to sort their keys)
  • It is less surprising (e.g. plain String or &str would just not be
    representable)
  • The actual encoding / decoding could be made much more efficient in terms of
    code size, compilation cost, dependencies, and possibly even runtime
    performance.
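To illustrate the efficiency argument, here is a sketch of an encoder over such an intermediate representation. Because a `BTreeMap` already iterates its keys in ascending order, encoding is a single recursive pass with no separate sorting or buffering step. The `Value` type and function names are hypothetical, and the string escaping rule (escape only `"` and `\`) is an assumption modelled on OLPC-style Canonical JSON, not something this issue specifies:

```rust
use std::collections::BTreeMap;

/// Hypothetical intermediate representation (no float variant).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

// Assumption: OLPC-style escaping, where only '"' and '\' are escaped.
fn encode_str(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
}

// Single pass, no whitespace. Objects need no sorting step here because
// BTreeMap already yields its keys in ascending order.
fn encode(v: &Value, out: &mut String) {
    match v {
        Value::Null => out.push_str("null"),
        Value::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Value::Int(n) => out.push_str(&n.to_string()),
        Value::String(s) => encode_str(s, out),
        Value::Array(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                encode(item, out);
            }
            out.push(']');
        }
        Value::Object(map) => {
            out.push('{');
            for (i, (k, val)) in map.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                encode_str(k, out);
                out.push(':');
                encode(val, out);
            }
            out.push('}');
        }
    }
}

fn to_canonical(v: &Value) -> String {
    let mut out = String::new();
    encode(v, &mut out);
    out
}

fn main() {
    let mut obj = BTreeMap::new();
    obj.insert("b".to_string(), Value::Int(-1));
    obj.insert("a".to_string(), Value::Array(vec![Value::Bool(true), Value::Null]));
    // prints {"a":[true,null],"b":-1}
    println!("{}", to_canonical(&Value::Object(obj)));
}
```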
@FintanH FintanH mentioned this issue Aug 30, 2021
@FintanH
Contributor

FintanH commented Sep 23, 2021

Would this essentially look like:

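use crate::Cstring;
use std::collections::{BTreeMap, BTreeSet};

// Ordering is needed so that a `Value` can itself be stored in a `BTreeSet`
// (this relies on `Cstring: Ord`, which it needs anyway as a map key).
#[derive(PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Object(BTreeMap<Cstring, Value>),
    Array(BTreeSet<Value>),
    String(Cstring),
    Number(Number),
    Bool(bool),
    Null,
}

// No floating point variant: floats are illegal in Canonical JSON.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
pub enum Number {
    U64(u64),
    I64(i64),
}

impl Canonical for Value {
    /* left as an exercise to the implementor ^_^ */
}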
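One property of modelling `Array` as `BTreeSet<Value>` worth keeping in mind: every JSON array then gets set semantics, not only arrays that come from set types. The std-only sketch below uses a hypothetical `through_set` helper standing in for the round trip through such a `Value`; order and duplicates are not preserved. Whether that matters depends on whether list-shaped payloads need to be supported. One option, under that assumption, would be to keep an ordered `Array(Vec<Value>)` variant and apply set semantics only when converting from set types.

```rust
use std::collections::BTreeSet;

// Hypothetical helper: what happens to a JSON array that is stored as a
// BTreeSet and then read back out.
fn through_set(list: &[u64]) -> Vec<u64> {
    let set: BTreeSet<u64> = list.iter().copied().collect();
    set.into_iter().collect()
}

fn main() {
    // An array where order and repetition might carry meaning.
    let list = [3u64, 1, 3];
    // Elements come back sorted and de-duplicated.
    assert_eq!(through_set(&list), vec![1, 3]);
}
```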

Or is it more nuanced than that?

@kim
Contributor Author

kim commented Sep 23, 2021

We’d probably want to provide derive macros for user-defined extension payloads.
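Since a derive macro needs its own proc-macro crate, the sketch below instead shows, by hand, roughly what such a derive could expand to. The `ToValue` trait, the `Payload` struct, and the `Value` variants are all hypothetical. Each field becomes one object entry keyed by its name, and a field whose type has no `ToValue` impl (e.g. `f64`) would simply fail to compile:

```rust
use std::collections::BTreeMap;

/// Hypothetical intermediate representation (trimmed to what is used here).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Bool(bool),
    Int(i64),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical trait a derive macro would implement for payload types.
pub trait ToValue {
    fn to_value(&self) -> Value;
}

impl ToValue for bool {
    fn to_value(&self) -> Value {
        Value::Bool(*self)
    }
}

impl ToValue for i64 {
    fn to_value(&self) -> Value {
        Value::Int(*self)
    }
}

/// Hypothetical user-defined extension payload.
pub struct Payload {
    pub version: i64,
    pub public: bool,
}

// Roughly what a `#[derive(ToValue)]` could generate: one entry per field,
// keyed by the field name, converting each field through `ToValue`.
impl ToValue for Payload {
    fn to_value(&self) -> Value {
        let mut fields = BTreeMap::new();
        fields.insert("version".to_owned(), self.version.to_value());
        fields.insert("public".to_owned(), self.public.to_value());
        Value::Object(fields)
    }
}

fn main() {
    let v = Payload { version: 2, public: true }.to_value();
    println!("{:?}", v);
}
```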
