Little-endian vs big-endian (take 2) #1046
> big-endian
> This is crypto black-box that shouldn't affect us.
I understand this is your hunch, but is there any actual evidence of this? source or something?
> crypto black-box. Not sure it's relevant
In general? Across languages? What big int implementations are you talking about?
Consistency with VM serialization (single-endianness from top to bottom) trumps backwards compatibility imo
Serialisation of crypto black boxes may matter when those black boxes are implemented as MPC/SNARK/STARK circuits. One generally wants to reduce the depth and complexity of these circuits so avoiding unnecessary "flip" circuitry may help. Imagine an MPC circuit that does bigint arithmetic (naturally done using big-endian) and then does some hashing with the same bigint interpreted as data and SSZ little-endianness forces a bunch of unnecessary flipping. Imperceptible performance degradations in silicon circuits can turn out to be hugely significant in MPC or SNARK circuits.
Yes, just a hunch. The thought is literally from a few hours ago—arguably FUD at this point 😂 To be confirmed/denied by an MPC expert. (I will be meeting Muthu from Ligero at MIT mid-May and will discuss with him.)
My source is @mkalinin (see here). He writes: "all big number implementations in Java that I've seen uses big-endian to encode/decode numbers to/from byte arrays. So does Milagro, even in C implementation". I'm not qualified to comment beyond that.
Agreed. The good news is that we may not have to enshrine VM serialisation inside consensus.
Contrary to other debates like cough signed vs unsigned, I don't have any preference for big vs little endian, but it is not true that big number implementations are big-endian:
The serialization of their internal representation is big-endian however (but even if it was little-endian you would need a specific serialization pass for most bigint/crypto libraries)
From a performance point of view, most architectures support some instruction which efficiently reverses the byte order of a register value (see BSWAP on x86 or REV on ARM). This is specifically to support fast decoding of network packets which are received in big-endian, but could be applicable to passing and retrieving data from our WASM VM.
Big-endian architectures are very rare; in particular, x86 and wasm are not on the list. x86 has instructions to help with big-endian (see BSWAP above), but wasm does not have a native byte-swapping instruction, meaning that whenever you have to "read" big-endian data and convert it to an integer, you have to do so byte-by-byte, which takes (a lot of) space. Here's a trivial example of implementing a simple add function operating on data serialized as BE and LE: https://gcc.godbolt.org/z/Dlcd46 - compare how, if the data arrives in big-endian format, we have to use a lot of (contract) space to convert to little-endian and back to perform the operation.
Not sure what this means. Big number implementations will typically use native endian internally for operations on the limbs (machine-word-size chunks) and compose these operations however is convenient, so endianness doesn't come into play. For interoperability, big-endian is sometimes/often used during serialization, but that also depends. Here's GMP, a popular bigint library supporting both endians in serialization: https://gmplib.org/manual/Integer-Import-and-Export.html - it supports both, plus "native" for maximum efficiency when on a single platform. Most importantly, the serialization is completely arbitrary / disconnected from what goes on inside the library; it's up to the protocol to choose a serialization and be consistent. If/when we implement a bigint library for wasm, it will first convert everything to little-endian words (like the add example above), perform the operation and convert back to store it. Less conversion = smaller code size.
Would this conversion not happen on the native machine before handing off the data to the WASM runtime via the linear memory buffer?
whoops, accidentally closed. Reopened!
I guess that depends on where the data comes from and who does the interpreting of that data - a conversion de facto has to happen somewhere unless there's mechanical sympathy between the parts that make up the whole, something to consider when choosing the parts. I'd think we want as much as possible to be possible at the execution layer, where "possible" includes the notion of efficient - for example so we don't have to add too many black boxes and precompiles.
Since the data will flow into and out of the WASM runtime via the EEI, I figured a reverse operation could occur at that boundary in the native client.
Speaking of big numbers, endianness may be used to specify the order of limbs in their representation. That's what I meant in the post about the Java big number impl; it uses an order with the most significant byte (limb = byte) first, i.e. big-endian, with no other options. With regard to crypto black boxes: the best way for SSZ would be to serialise them as a pure byte string that represents the serialisation format of the particular crypto black box, just the same way we do it for BLS points.
The only way for WASM to benefit from keeping SSZ little-endian that I can see is avoiding flipping in an SSZ implementation written in WASM.

big-endian pros:

little-endian pros:
Comparing both sets of pros makes me think keeping little-endian for SSZ is a rational decision.

P.S. A discussion on endianness in different systems that I found interesting: https://news.ycombinator.com/item?id=9451284
A couple more pros for big endian:
It's worth noting that most of the serialization is going to happen on layers higher than the base protocol; e.g. in the phase 2 proposal the "transaction" that gets passed into the top-level execution is just a bytearray. So if WASM is little endian and SSZ is big endian there is going to be a lot of byte flipping code at many levels of the stack. And this is honestly a big pro for little endian, especially given our goals with abstraction. So it does seem like WASM is forcing our hand... Agree that BLS12-381 and SHA256 are black boxes.
Closed as phase 0 spec is frozen |
In the last endianness debate we decided to favour little-endianness. Since then there have been various developments (see points 1-4 in favour of big-endianness) that warrant revisiting that decision.
Pros of little-endian
- `parity-codec`
Pros of big-endian
- `i`th bit corresponding to `2**i`

All things considered, big-endianness may now be the favoured option. Brief counter-arguments to the little-endianness arguments:
- "`parity-codec`" is an annoyance for the Parity team and can be worked around relatively easily.