Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul traits #16

Open
6 of 7 tasks
str4d opened this issue Dec 19, 2019 · 14 comments
Open
6 of 7 tasks

Overhaul traits #16

str4d opened this issue Dec 19, 2019 · 14 comments

Comments

@str4d
Copy link
Member

str4d commented Dec 19, 2019

This is a sub-task of zcash/librustzcash#41. We can discuss the trait refactor specifically here.

The current plan (as of 2020-09-04):

@ebfull
Copy link
Collaborator

ebfull commented Dec 20, 2019

Let's use this github issue to argue over how the trait should look. I'll start.

bls12_381 doesn't expose extension fields in its API anymore. Thus, the following are useless APIs for us

  1. SqrtField
  2. frobenius_map

@ebfull
Copy link
Collaborator

ebfull commented Dec 20, 2019

Field should probably provide associated constants useful for e.g. FFTs.

@ebfull
Copy link
Collaborator

ebfull commented Dec 20, 2019

Should there be a PrimeField and BinaryField?

@ebfull
Copy link
Collaborator

ebfull commented Dec 20, 2019

Field could have associated constants like DEGREE and CHARACTERISTIC.

@str4d
Copy link
Member Author

str4d commented Apr 24, 2020

Field::random is only useful for fields that are used as scalars; this is true of bls12_381::Scalar and jubjub::Fr (jubjub::Scalar after zkcrypto/jubjub#33), but not true for bls12_381::Fp. Perhaps we should add a ScalarField trait that encompasses this? It could also contain bounds like BitAnd<u64, Output = u64> and Shr<u32, Output = Self> (which I've added to PrimeField in zcash/librustzcash#223 because they are necessary in the windowed Pedersen hash implementation).

@str4d
Copy link
Member Author

str4d commented Sep 4, 2020

Updates:

  • We deployed most of the outlined refactoring in ff 0.7.0.
  • We don't expose bls12_381::Fp publicly, and Field is now effectively a scalar field trait.
  • BitAnd and Shr were removed as bounds from PrimeField prior to releasing ff 0.7.0.

@rvolgers
Copy link

rvolgers commented Feb 24, 2021

I have some questions wrt ff::PrimeField::ReprBits...

Currently a lot of curve implementations use 64 bit for the actual internal representation even on 32 bit platforms, but specify a ReprBits with 32 bit limbs. Which means they need to do conversion in to_le_bits on 32 bit platforms.

What's the idea behind this? Is this forward compatibility for a future where the internal logic is actually 32 bit on 32 bit platforms?

(Edit: this is possibly the wrong place to ask, perhaps I should ask in https://github.com/RustCrypto/elliptic-curves instead?)

@tarcieri
Copy link
Contributor

tarcieri commented Feb 24, 2021

I believe that's a restriction of bitvec. The RustCrypto/elliptic-curves code in question is here:

https://github.com/RustCrypto/elliptic-curves/blob/ebb7fe9/p256/src/arithmetic/scalar.rs#L828-L840

The p256 crate provides only a 64-bit field representation. FWIW the k256 crate provides a 32-bit backend so really this only impacts the current implementation of the p256 scalar field.

However, nothing in either of those crates or the downstream consumers thereof is leveraging the bits representation anyway, so I think it'd be helpful to at least make it optional as proposed in #42. It would allow us to get rid of 5 otherwise superfluous dependencies (which were implicated in some recent breakages which were tough for downstream users to sort out.

@rvolgers
Copy link

rvolgers commented Feb 24, 2021

Ah, thanks for the pointer, I would not have guessed that. For future reference, this is the limitation:

https://github.com/bitvecto-rs/bitvec/blob/028c33aadc4dbf3c0e7d043b75d7219000f3ef2f/src/mem.rs#L87-L91

(Edit: removed questioning of this limitation in bitvec. Apparently it's because it needs the components to be naturally aligned because of pointer encoding tricks it uses to permit bit references, and on 32 bit systems 64 bit values do not have 64 bit alignment.)

Agree on dropping bitvec as a required dependency, though it leaves me back at square one wrt getting anything resembling a defined endianness out of scalars.

Can we please just have from_repr_be and from_repr_le in addition to current from_repr, as well as associated to_ functions? The implementation overhead is minimal, as from_repr can just call from_repr_le or from_repr_be depending on which is considered the favored one, and the non-favored one can be implemented as a byte slice reversal combined with a call to the other one, as it's probably not performance critical (and maybe the compiler will figure it out anyway).

@rvolgers
Copy link

rvolgers commented Feb 24, 2021

I've spent some more time looking at the bitvec crate, and the more I look at it the less I think it's appropriate for PrimeField's public API... it uses dark unsafe magic so you can have normal-looking slices and references to bits in the bitvec. Some opinions about this:

  • I don't think that's something people will actually use with PrimeField elements. At least, I can't think of a use case.
  • The black magic means it cannot use u64 elements as storage on 32 bit systems, which means any PrimeField implementations which use u64 internally need to do annoying conversions on 32 bit systems.
  • In the current elliptic curve implementations this is implemented using #[cfg(target_pointer_width = "32")], which raises concerns about testing coverage for 32 bit systems which you would not otherwise have in a pure-rust implementation. I don't see any way to entirely avoid this either.
  • The bitvec is not aimed at bigint use cases, which makes endian-ness handling supremely awkward. It has a O generic parameter which can be Lsb0 or Msb0. However, this "order" only refers to mapping of bits numbers to bits in the logical (integer) limbs. Each limb is always stored as a native integer (meaning, in platform byte order) and the limbs themselves are always ordered in little endian order. All of which is to say, there is no good API for getting a big-endian representation, and the features which are present actively work against you if you try. And of course, reversing the bit vector will also do the wrong thing; it's not the same as reversing the byte order.
  • Worse, on little endian systems the layout of the bitvec returned by to_le_bits will be little-endian in every way, including the raw data whether interpreted as integers or bytes. But on (relatively rare) big endian systems, it will order limbs in little endian order but store each limb in native order. This seems ripe for confusion which only appears on big-endian systems, again raising completely unnecessary questions about test coverage.
  • In my opinion, the le in to_le_bits is a complete red herring. I guess it means bit 0 in the bitvec is the least significant? But to me, "endianness" refers to byte order, and bitvec has no such concept. In fact I think that's why bitvec uses only "ordering"?

It's possible I've misunderstood something by the way, in which case I apologize. I'm putting this up here for discussion (because it's very relevant to the API), as well as to check whether my understanding is accurate.

All in all the natural solution seems the APIs I describe in my previous comment, which all work on bytes directly. Fixed length bytes arrays with known endianness can easily be ingested by any other library, whether bigint or bitvec and are the basic unit in all cryptographic protocols which don't already work in the field's customary internal representation (for those you don't need any conversion anyway). Again, I may be missing something, please correct me if I'm wrong.

@rvolgers
Copy link

I feel like the above comment may benefit from a TLDR, so:

  • BitVec is inconvenient to convert to from curve internal representations, because it doesn't support 64 bit words on 32 bit systems.
  • BitVec is also inconvenient to convert to a byte representation, whether big or little endian.
  • What BitVec gives you in exchange for its limitations is being able to take native Rust references and slices into part of a BitVec. I don't think that will be used too often for field elements.
  • The naming of to_le_bits seems misleading, it has nothing to do with endianness as far as I can tell.

All of these concerns are in addition to it being a troublesome dependency like @tarcieri mentioned.

My proposal is to add functions which convert to and from fixed size byte buffers in big and little endian byte order, using the existing ReprBytes type.

@str4d
Copy link
Member Author

str4d commented May 21, 2021

Can we please just have from_repr_be and from_repr_le in addition to current from_repr, as well as associated to_ functions?

We used to have (the equivalent of) these, and removed them. The reason we removed them is because they almost caused us to deploy a mixed-endianness protocol (different endianness at byte- and bit-level) to a ~$2B ecosystem.

IMHO it is a mistake to attempt to encode field elements generically with a specific endianness. Putting aside the potential for accidental misuse (per above), it enables different users to serialize field elements in potentially-undetectably incompatible ways on disk, which is a nightmare comparable to ASN.1 encodings of curve points. I don't believe that we should enable this kind of unsafety in generic protocols. If you need to know the endianness of an encoding, I think it is perfectly fine to require that you know concretely the field you are operating on (at which point you can do what you want).

The main reason that to_le_bits exists is because we need to decompose scalars into booleans inside arithmetic circuits (in bellman and halo2). This exposes a low level of detail that isn't going to be necessary for pretty much any generic protocol outside of a circuit (as they can just use impl Mul<Scalar> for Group). Now that it has been extracted into the ff::PrimeFieldBits trait, I'd be happy to document it as a more advanced / careful-usage trait.

@str4d
Copy link
Member Author

str4d commented May 21, 2021

  • The naming of to_le_bits seems misleading, it has nothing to do with endianness as far as I can tell.

This confusion is part of my above reasoning: this API has everything to do with endianness! The types that bitvec uses are about how it builds its backing stores, and ideally would be completely opaque to users of this API (which might be possible once bitvec starts using const generics).

Bit endianness is a terrible thing to need to care about, but unfortunately we do need to care about it inside arithmetic circuits (where the native word is a field element rather than a byte).

@str4d
Copy link
Member Author

str4d commented May 21, 2021

And to cap, the main reason that to_le_bits depends on bitvec is because previously ff had a custom BitIterator struct that provided this same logic, but it had a weird API that took a little-endian byte array and exposed a big-endian bit stream, and it required caring about byte endianness generically, and wasn't particularly optimised. I looked at the available crates, and it seemed that bitvec was the most actively-developed crate that provided a reasonable replacement, and meant that we didn't need to care too much about how to get that bitstream, and could move the bitstream generation out of the generic API (per my concerns above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants