Is it possible to encode recursive types? #35

artegoser · 2024-08-31T17:22:37Z

With the serde version I get the error: type changed

finnbear · 2024-08-31T17:38:42Z

Recursive types are not implemented (edit: for Encode) and would be tricky to implement based on how the current bitcode version encodes nested values. You can use an earlier bitcode version instead, which works very differently and has a #[bitcode(recursive)] attribute for Encode.

Edit: for serde, see #35 (comment)

artegoser · 2024-08-31T18:25:36Z

Thx. So, is this feature not planned? Or will it be possible again someday?

finnbear · 2024-08-31T18:50:32Z

Not currently planned, but the documentation could be more clear about this so I'll leave the issue open.

caibear · 2024-09-01T04:09:03Z

You can use 0.6 with serde if you remove #[serde(untagged)] from your enums. #[serde(untagged)] breaks most binary formats, even if some don't complain right away.

artegoser · 2024-09-01T06:56:38Z

Yes, it works, but it defeats the purpose of reducing the amount of data. In this case the enum identifier is written, which significantly increases the file size. I compared with dlhn, but it cannot decode this. The size becomes larger than my daletpack format, which does not compress well. Is it possible to implement serde(untagged) support in your case?

finnbear · 2024-09-01T07:16:19Z

> Is it possible to implement serde(untagged) support in your case?

No, because #[serde(untagged)] works by trying all possible variants until one deserializes. No version of bitcode is self-describing, so it's possible that the wrong variant would deserialize without an error but the data would be corrupt.

Consider the case of #[serde(untagged)] enum { Big(u16), Small(u8) }. If you serialized Small(5) with serde_json, it would deserialize as Big(5). In bitcode, #[serde(untagged)] enum { Int(usize), Str(String) } could have the worse problem that bytes from a string encoding could be interpreted as an integer. Data corruption is not something we tolerate in bitcode, although most likely the entire message would fail to deserialize.
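
The failure mode described above can be sketched with a toy format. This is only an illustration under stated assumptions: the layout (8-byte little-endian integers, strings as an 8-byte length plus UTF-8 bytes) and the helper names `encode_untagged` and `decode_untagged` are hypothetical, and this is neither bitcode's wire format nor serde's actual implementation.

```rust
// Toy, non-self-describing format (an assumption for illustration only):
//   Int(u64)    -> 8 little-endian bytes
//   Str(String) -> 8-byte little-endian length, then the UTF-8 bytes

#[derive(Debug, PartialEq)]
enum Value {
    Int(u64),
    Str(String),
}

/// Encode without writing a variant tag, as an untagged enum would.
fn encode_untagged(value: &Value) -> Vec<u8> {
    match value {
        Value::Int(n) => n.to_le_bytes().to_vec(),
        Value::Str(s) => {
            let mut out = (s.len() as u64).to_le_bytes().to_vec();
            out.extend_from_slice(s.as_bytes());
            out
        }
    }
}

/// Read a u64 from a slice that is exactly 8 bytes long.
fn read_u64(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut array = [0u8; 8];
    array.copy_from_slice(bytes);
    Some(u64::from_le_bytes(array))
}

/// Decode a length-prefixed string that must fill the whole input.
fn decode_str(bytes: &[u8]) -> Option<String> {
    if bytes.len() < 8 {
        return None;
    }
    let (len_bytes, rest) = bytes.split_at(8);
    let len = read_u64(len_bytes)?;
    if rest.len() as u64 != len {
        return None;
    }
    String::from_utf8(rest.to_vec()).ok()
}

/// Try each variant in declaration order and keep the first that decodes.
fn decode_untagged(bytes: &[u8]) -> Option<Value> {
    read_u64(bytes)
        .map(Value::Int)
        .or_else(|| decode_str(bytes).map(Value::Str))
}
```

In this toy format an empty string encodes as eight zero bytes, so `decode_untagged(&encode_untagged(&Value::Str(String::new())))` yields `Value::Int(0)` without any error, while non-empty strings and integers happen to round-trip.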

Making bitcode self-describing (per-instance, to support #[serde(untagged)]) would probably cost much more data than the enum variants that #[serde(untagged)] tries to omit.

The #[serde(untagged)] feature is better for textual formats like JSON, where different types have disjoint prefixes like [ or {.
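
As a rough sketch of why disjoint prefixes help, a textual format lets a decoder pick the kind of value from the first significant byte before reading anything else. The `JsonKind` enum and `json_kind` function below are hypothetical names for illustration; the function only inspects the prefix and does not validate the document. Note that two numeric variants such as u16 and u8 still share a prefix, which matches the Big/Small ambiguity described earlier.

```rust
/// Kinds of JSON value that can be told apart from the first significant byte.
#[derive(Debug, PartialEq)]
enum JsonKind {
    Array,
    Object,
    String,
    Number,
    Bool,
    Null,
}

/// Classify a JSON text by its first non-whitespace byte (prefix check only).
fn json_kind(text: &str) -> Option<JsonKind> {
    let first = text.bytes().find(|b| !b.is_ascii_whitespace())?;
    match first {
        b'[' => Some(JsonKind::Array),
        b'{' => Some(JsonKind::Object),
        b'"' => Some(JsonKind::String),
        b'-' | b'0'..=b'9' => Some(JsonKind::Number),
        b't' | b'f' => Some(JsonKind::Bool),
        b'n' => Some(JsonKind::Null),
        _ => None,
    }
}
```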

> The size becomes larger than my daletpack format

If you can provide specific details (schema + encoding) of how you can outperform bitcode on size, that would be interesting 👀

artegoser · 2024-09-01T07:35:39Z

> If you can provide specific details (schema + encoding) of how you can outperform bitcode on size, that would be interesting 👀

This format is specifically designed for my schema, but it doesn't compress well (I'm not very knowledgeable about how to optimize this), so I'm looking for solutions that are already out there. It's not for all data types, so it won't help you.

For my format, I had the idea of expanding each nested type (if it's an enum) and writing a single identifier for each combination of variants.

For example

enum Enum {
  First(Other),
  Second(Other)
}

enum Other {
  String(String),
  Num(u8)
}

In this case Enum has 4 binary identifiers, one for each combination of outer and inner variant:

Enum::FirstString
Enum::FirstNum

Enum::SecondString
Enum::SecondNum

I don't know how procedurally realizable this is, but it should work. I don't know if this structure compresses well either.

I partially implemented this in the daletpack scheme (but still eventually would like to do it for all types and procedurally)
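
A minimal hand-written sketch of the flattened-identifier idea for the two enums above might look like the following. The numbering (outer index * 2 + inner index), the payload layout, and the `encode`/`decode` helpers are assumptions made for illustration, not the actual daletpack scheme; the sketch also assumes one value per buffer, so a string payload is simply the rest of the bytes.

```rust
#[derive(Debug, PartialEq)]
enum Other {
    String(String),
    Num(u8),
}

#[derive(Debug, PartialEq)]
enum Enum {
    First(Other),
    Second(Other),
}

/// Encode with one flattened identifier byte:
/// 0 = FirstString, 1 = FirstNum, 2 = SecondString, 3 = SecondNum.
fn encode(value: &Enum) -> Vec<u8> {
    let (outer, inner) = match value {
        Enum::First(inner) => (0u8, inner),
        Enum::Second(inner) => (1u8, inner),
    };
    match inner {
        Other::String(s) => {
            let mut out = vec![outer * 2];
            out.extend_from_slice(s.as_bytes());
            out
        }
        Other::Num(n) => vec![outer * 2 + 1, *n],
    }
}

/// Decode by splitting the identifier back into outer and inner indices.
fn decode(bytes: &[u8]) -> Option<Enum> {
    let (&id, payload) = bytes.split_first()?;
    let inner = match id % 2 {
        0 => Other::String(String::from_utf8(payload.to_vec()).ok()?),
        _ => match payload {
            [n] => Other::Num(*n),
            _ => return None,
        },
    };
    match id / 2 {
        0 => Some(Enum::First(inner)),
        1 => Some(Enum::Second(inner)),
        _ => None,
    }
}
```

With this numbering, `Enum::Second(Other::Num(9))` encodes to the two bytes `[3, 9]`: one identifier byte instead of two separate tags. Whether the result compresses better is not something this sketch shows.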

caibear · 2024-09-01T08:13:22Z

After running your benchmark it looks like various formats take ~300 bytes. I would recommend testing on a larger dataset if you want to accurately benchmark compression. Compression algorithms are generally inefficient in terms of size and speed on input smaller than tens of kilobytes.

Also zlib is just a wrapper of deflate.

DontBreakAlex · 2024-09-13T16:14:36Z

Hi! I have a few types that contain serde_json::Values, and I also get a panic with type changed. I checked and it seems that serde_json is not using #[serde(untagged)]. What's going on?

finnbear · 2024-09-13T16:18:35Z

I recently added a warning to the README about this; serde_json::Value has a custom Serialize implementation that is very similar to an untagged enum.

finnbear added the documentation label Aug 31, 2024

caibear mentioned this issue Sep 24, 2024

Reimplement #[bitcode(with_serde)] #47

Open
