This document specifies the initial on-wire format for Smoosh™.
Unit is represented with zero bits.
Encoded as a single bit.
Bytes are encoded as-is.
Signed integers are stored using Zig-Zag encoding.
Conceptually, a non-negative integer n is encoded as n*2, and a negative integer is encoded as abs(n*2) - 1, so 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ...
This is equivalent to (n << 1) ^ (n >> (numBits(n) - 1)), where >> is an arithmetic shift and numBits(n) is the bit width of n's type (for example, 32 for int32). This branch-free form is significantly faster.
Decoding is then implemented as (n >> 1) ^ -(n & 1), where n is the encoded unsigned value and >> is a logical shift.
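A minimal sketch of the two Zig-Zag formulas in Python, assuming 64-bit integers (numBits(n) = 64) by default; the function names are hypothetical. Python's >> on a negative int is an arithmetic shift, so the encoder follows the branch-free form directly, and masking to num_bits reproduces the wrap-around of a fixed-width unsigned type.

```python
def zigzag_encode(n: int, num_bits: int = 64) -> int:
    """(n << 1) ^ (n >> (numBits(n) - 1)), truncated to num_bits bits."""
    mask = (1 << num_bits) - 1
    # For in-range n, n >> (num_bits - 1) is -1 (all ones) when n < 0, else 0.
    return ((n << 1) ^ (n >> (num_bits - 1))) & mask


def zigzag_decode(z: int) -> int:
    """(z >> 1) ^ -(z & 1); z is non-negative, so >> acts as a logical shift."""
    return (z >> 1) ^ -(z & 1)


# Small values of either sign stay small: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
print([zigzag_encode(n) for n in (0, -1, 1, -2, 2)])  # [0, 1, 2, 3, 4]
```

Keeping small magnitudes small is what makes the mapping useful before a variable-length integer encoding.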
Values of type uint16, uint32 and uint64 without the [<Smoosh>] attribute are encoded as-is.
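With the [<Smoosh>] attribute, these values are Smooshed into a variable number of payload bytes: each payload byte is prefixed with a 1 bit, denoting that a byte follows, and the Smooshed integer always ends with a single 0 bit marking its end.
For example, the layouts of a Smooshed uint16 with one and with two payload bytes are: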
Legend: H = header bit (1, a payload byte follows), P = payload bit (b), T = terminator bit (0).
One payload byte:
++---++-------------------------------++---++
|| H || P | P | P | P | P | P | P | P || T ||
++---++-------------------------------++---++
|| 1 || b | b | b | b | b | b | b | b || 0 ||
++---++-------------------------------++---++
Two payload bytes:
++---++-------------------------------++---++-------------------------------++---++
|| H || P | P | P | P | P | P | P | P || H || P | P | P | P | P | P | P | P || T ||
++---++-------------------------------++---++-------------------------------++---++
|| 1 || b | b | b | b | b | b | b | b || 1 || b | b | b | b | b | b | b | b || 0 ||
++---++-------------------------------++---++-------------------------------++---++
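The Smooshed-integer layout can be sketched in Python as follows. The spec does not state the byte order of the payload or the bit order within a byte, so this sketch assumes least-significant byte first, most-significant bit first within each byte, and at least one payload byte even for zero. smoosh_encode and smoosh_decode are hypothetical names, and bits are modelled as a list of 0/1 ints for readability.

```python
from typing import List, Tuple


def smoosh_encode(value: int) -> List[int]:
    """Encode an unsigned integer as (H, 8 x P) groups followed by a terminator T."""
    if value < 0:
        raise ValueError("Smooshed integers are unsigned")
    bits: List[int] = []
    while True:
        bits.append(1)  # H: a payload byte follows
        byte = value & 0xFF  # assumption: least-significant byte first
        bits.extend((byte >> i) & 1 for i in range(7, -1, -1))  # P: MSB first (assumed)
        value >>= 8
        if value == 0:
            break
    bits.append(0)  # T: end of the Smooshed integer
    return bits


def smoosh_decode(bits: List[int], pos: int = 0) -> Tuple[int, int]:
    """Decode a Smooshed integer at bits[pos]; return (value, position after T)."""
    value, shift = 0, 0
    while bits[pos] == 1:  # H bit set: read one more payload byte
        byte = 0
        for b in bits[pos + 1 : pos + 9]:
            byte = (byte << 1) | b
        value |= byte << shift
        shift += 8
        pos += 9
    return value, pos + 1  # skip the 0 terminator


print(len(smoosh_encode(0x7F)), len(smoosh_encode(0x1234)))  # 10 19
```

Each payload byte costs 9 bits and the terminator 1 bit, so a value needing k bytes occupies 9k + 1 bits.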
Singles and doubles are encoded as-is.
Decimals are converted to their 128-bit representation and encoded as-is.
char values in .NET are UTF-16 code units and are encoded as-is.
Support for UTF-8 is to be determined.
Without the [<Utf8>] attribute, a string's length is encoded as a Smooshed uint32, followed by its UTF-16 characters as-is.
With the [<Utf8>] attribute, the string is first converted to its UTF-8 representation; the length of that representation is encoded as a Smooshed uint32, followed by the UTF-8 bytes as-is.
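A sketch of both string encodings, producing a bit list. It assumes the default length counts UTF-16 code units (as .NET's String.Length does) and the UTF-8 length counts bytes, and it reuses the same assumed byte and bit order as for Smooshed integers (least-significant byte first, most-significant bit first). encode_string and its helpers are hypothetical names.

```python
from typing import List


def push_bits(bits: List[int], value: int, width: int) -> None:
    """Append value as width bits, most significant bit first (assumed)."""
    bits.extend((value >> i) & 1 for i in range(width - 1, -1, -1))


def push_smooshed(bits: List[int], value: int) -> None:
    """Smooshed unsigned integer: (1 + 8 payload bits) repeated, then a 0 terminator."""
    while True:
        bits.append(1)
        push_bits(bits, value & 0xFF, 8)
        value >>= 8
        if value == 0:
            break
    bits.append(0)


def encode_string(s: str, utf8: bool = False) -> List[int]:
    bits: List[int] = []
    if utf8:
        # [<Utf8>]: convert first, then length-prefix the UTF-8 bytes.
        data = s.encode("utf-8")
        push_smooshed(bits, len(data))
        for byte in data:
            push_bits(bits, byte, 8)
    else:
        # Default: length in UTF-16 code units, then each 16-bit unit as-is.
        data = s.encode("utf-16-le")
        units = [int.from_bytes(data[i : i + 2], "little") for i in range(0, len(data), 2)]
        push_smooshed(bits, len(units))
        for unit in units:
            push_bits(bits, unit, 16)
    return bits


print(len(encode_string("hi")), len(encode_string("hi", utf8=True)))  # 42 26
```

For ASCII-heavy text the UTF-8 form halves the payload, while characters outside the Basic Multilingual Plane cost 32 bits in either form.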
Given the limitations of Ticks in .NET (one tick is 100 ns), Smoosh makes no attempt to improve the situation.
If your application requires time resolution finer than 100 ns, consider not using .NET.
With that noted, DateTime and TimeSpan have their Ticks Zig-Zag encoded.
DateTimeOffset has its Ticks Zig-Zag encoded, immediately followed by its Offset.Ticks, also Zig-Zag encoded.
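A sketch of how the tick values could be derived from Python's datetime types before Zig-Zag encoding. It returns the Zig-Zag encoded integers in wire order rather than bits, because this section does not say whether they are then written as fixed 64-bit values or Smooshed. It assumes Python's microsecond resolution (so ticks are multiples of 10) and that DateTimeOffset.Ticks is the clock time, as in .NET; all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

TICKS_PER_MICROSECOND = 10  # one .NET tick is 100 ns
DOTNET_EPOCH = datetime(1, 1, 1)  # .NET ticks count from 0001-01-01 00:00:00


def zigzag64(n: int) -> int:
    return ((n << 1) ^ (n >> 63)) & ((1 << 64) - 1)


def timespan_ticks(td: timedelta) -> int:
    return (td // timedelta(microseconds=1)) * TICKS_PER_MICROSECOND


def datetime_ticks(dt: datetime) -> int:
    # Clock time only: any tzinfo is dropped, matching DateTimeOffset.Ticks.
    return timespan_ticks(dt.replace(tzinfo=None) - DOTNET_EPOCH)


def encode_datetimeoffset(dt: datetime) -> Tuple[int, int]:
    """Zig-Zag(Ticks), then Zig-Zag(Offset.Ticks), for a timezone-aware datetime."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    return zigzag64(datetime_ticks(dt)), zigzag64(timespan_ticks(offset))


print(datetime_ticks(datetime(2000, 1, 1)))  # 630822816000000000
```

A DateTime or TimeSpan would use only the first step, zigzag64(datetime_ticks(...)) or zigzag64(timespan_ticks(...)).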
Encoded as-is.
Length is encoded with a Smooshed uint32, then directly encoded as-is.
An array has a Smooshed uint32 header giving the number of elements it contains, and its elements are encoded immediately after it.
A list in F# is defined as a discriminated union, but has an encoding identical to that of an array.
A map has a Smooshed uint32 header giving the number of entries it contains, and its entries are encoded immediately after it.
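A sketch of the count-prefixed layout shared by arrays and maps, parameterised by per-element encoders. The Smooshed-integer helpers repeat the assumed byte and bit order (least-significant byte first, most-significant bit first). Writing each map entry as key then value, in ascending key order (the order in which an F# Map enumerates), is an assumption, as are all function names.

```python
from typing import Any, Callable, List, Mapping, Sequence

Bits = List[int]
Encoder = Callable[[Bits, Any], None]


def push_bits(bits: Bits, value: int, width: int) -> None:
    bits.extend((value >> i) & 1 for i in range(width - 1, -1, -1))


def push_smooshed(bits: Bits, value: int) -> None:
    while True:
        bits.append(1)  # a payload byte follows
        push_bits(bits, value & 0xFF, 8)
        value >>= 8
        if value == 0:
            break
    bits.append(0)  # terminator


def encode_array(bits: Bits, items: Sequence[Any], encode_item: Encoder) -> None:
    push_smooshed(bits, len(items))  # element count header
    for item in items:
        encode_item(bits, item)  # elements follow immediately, in order


def encode_map(bits: Bits, entries: Mapping[Any, Any], encode_key: Encoder, encode_value: Encoder) -> None:
    push_smooshed(bits, len(entries))  # entry count header
    for key in sorted(entries):  # assumption: ascending key order
        encode_key(bits, key)
        encode_value(bits, entries[key])


def encode_byte(bits: Bits, b: int) -> None:
    push_bits(bits, b, 8)  # bytes are encoded as-is


out: Bits = []
encode_array(out, [1, 2, 3], encode_byte)
print(len(out))  # 34: a 10-bit count header plus 3 * 8 payload bits
```

Because no separators are written, a decoder needs the element type to know where each element ends; only the count is carried on the wire.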
Records have their members immediately encoded in the order of their definition.
Discriminated unions are encoded with a variable-length header.
The header length is the number of bits required to represent the number of cases n of the union type, given as: floor(log2 n) + 1
Each value's header is the index of its case within the union, expressed as a bit sequence truncated to this length.
If the case carries additional values, they are encoded immediately after the header.
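A sketch of the header computation. For n >= 1, floor(log2 n) + 1 is exactly Python's int.bit_length(), which avoids floating-point log2. Numbering cases from 0 (as F#'s union Tag does) and writing the index most-significant bit first are assumptions, and the function names are hypothetical.

```python
from typing import List


def header_length(case_count: int) -> int:
    """floor(log2 n) + 1 bits for a union with n cases."""
    if case_count < 1:
        raise ValueError("a union has at least one case")
    return case_count.bit_length()


def encode_case_header(case_index: int, case_count: int) -> List[int]:
    """The case index as a bit sequence of header_length(case_count) bits."""
    width = header_length(case_count)
    # case_index < case_count, so it always fits in width bits.
    return [(case_index >> i) & 1 for i in range(width - 1, -1, -1)]


# A hypothetical union with three cases, e.g. Circle | Square | Point:
print([encode_case_header(i, 3) for i in range(3)])  # [[0, 0], [0, 1], [1, 0]]
```

Because the formula sizes the header for the value n itself, a union with exactly 4 cases gets a 3-bit header even though its indices 0 to 3 would fit in 2 bits.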
After encoding, the bit-level result is padded at the end with 0 bits up to the next multiple of 8, for byte alignment.
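A sketch of the final alignment step, packing a bit list into bytes. Placing the first bit in the most significant position of each byte is an assumption, and pad_and_pack is a hypothetical name.

```python
from typing import List


def pad_and_pack(bits: List[int]) -> bytes:
    """Pad with 0 bits to the next multiple of 8, then pack 8 bits per byte."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i : i + 8]:
            byte = (byte << 1) | b  # first bit lands in the most significant position
        out.append(byte)
    return bytes(out)


print(pad_and_pack([1, 0, 1]).hex())  # a0
```

At most 7 padding bits are added, and none when the bit count is already a multiple of 8.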