Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lenngren-based X25519 for non-alt ARM code #108

Merged
merged 4 commits into from
Feb 15, 2024
Merged

Lenngren-based X25519 for non-alt ARM code #108

merged 4 commits into from
Feb 15, 2024

Conversation

jargh
Copy link
Contributor

@jargh jargh commented Feb 9, 2024

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Instead of having the postcondition guarded by an assumption of
coprimality, there is now a precise case analysis in the specification
asserting that it returns zero in the degenerate cases. For now, this
is worked around in other proofs, but it gives scope for actually
avoiding the special-case reasoning about zero in many such
applications.
The ABI only requires preservation of the low 64-bit parts of
Q8,...,Q15, and MAYCHANGE_REGS_AND_FLAGS_PERMITTED_BY_ABI now
reflects that with new entries Q8 :> tophalf; ...; Q15 :> tophalf.
The subroutine boilerplate tactic ARM_ADD_RETURN_STACK_TAC is now
updated (somewhat crudely) to accept save/restore of D8,...,D15
in the prologue and epilogue of functions.
@aqjune-aws
Copy link
Collaborator

Hi, since the updated functions now use Neon, should we update test/benchmark.c to run them only when neon is enabled?

Or should we simply assume that Arm always has Neon? I am not sure whether AWS-LC can rely on this assumption, though.

@Emill
Copy link

Emill commented Feb 12, 2024

Reference: https://developer.arm.com/documentation/102525/0100/Compiling-for-Neon-with-Arm-Compiler-6:

Neon is required in all standard Armv8-A implementations, so targeting any Armv8-A architecture or processor will allow the generation of Neon code.

Not sure about Armv9-A though.

This completely changes the implementation of ARM curve25519_x25519
and curve25519_x25519_byte (not the _alt forms, which remain faster
on their target microarchitectures) to a base-25.5 unsaturated version
with interleaved integer and SIMD operations, the inner loop closely
following Emil Lenngren's implementation described in the paper

  https://github.com/Emill/X25519-AArch64/blob/master/X25519_AArch64.pdf

and available here:

  https://github.com/Emill/X25519-AArch64

A version of this code was generated by SLOTHY from the reorganized
implementation by Abdulrahman, Becker, Kannwischer and Klein here:

 https://github.com/slothy-optimizer/slothy/blob/main/paper/clean/neon/X25519-AArch64-simple.s

as described in the associated paper

  https://eprint.iacr.org/2022/1303.pdf

with some additional annotations for use in the formal proof. The
final modular inverse computation reverts to the usual saturated
representation and s2n-bignum's divstep-based inverse function.
@aqjune-aws
Copy link
Collaborator

I checked these things:

  • The updates to the main specifications CURVE25519_X25519_SUBROUTINE_CORRECT and CURVE25519_X25519_BYTE_SUBROUTINE_CORRECT contained minor changes such as used stack size/code size/modifiable register set, and I thought these were okay
  • The update to the precise definition of callee-save registers regarding Q8~Q15 (thanks.)
  • The theorems printed in the proofs CI are okay (did not change compared to the master branch)
  • The update to the postcondition of BIGNUM_INV_P25519_SUBROUTINE_CORRECT which looks correct to me.
(\s. read PC s = returnaddress /\
                  (coprime(p_25519,n)
                   ==> bignum_from_memory(z,4) s = inverse_mod p_25519 n))

=>

(\s. read PC s = returnaddress /\
                  bignum_from_memory(z,4) s =
                  (if p_25519 divides n then 0 else inverse_mod p_25519 n))

I will create a follow-up pull request that removes the existing Neon check condition from test.c/benchmark.c .

I think the lemmas SIMD_SPLIT_JOIN_CLAUSES, EXTRA_SPLIT_JOIN_CLAUSES can probably be used in other places using SIMD registers in s2n-bignum as well.

@aqjune-aws
Copy link
Collaborator

If this is merged, AWS-LC might want to link these assembly files only when Neon is supported. In theory this might be not necessary, but wanted to let people know :)

// https://github.com/Emill/X25519-AArch64/blob/master/X25519_AArch64.pdf
// https://github.com/Emill/X25519-AArch64
//
// and the SLOTHY-based re-engineering of that code by Hanno Becker:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change this to "Abdulrahman, Becker, Kannwischer, Klein"?

// In particular, the basic dataflow and the organization between integer
// and SIMD units is identical, with only a few minor changes to some
// individual instructions (for miscellaneous reasons). The scheduling
// was redone from scratch by SLOTHY starting from Hanno Becker's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we say "... starting from the un-interleaved developed by Becker et. al in the SLOTHY paper", or just "... developed in the SLOTHY paper [ref]"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you June for the additional followup work and the careful review!

I will fix those attributions as Hanno suggests now before we merge.

@hanno-becker
Copy link
Contributor

This is amazing, thank you @jargh.

@jargh
Copy link
Contributor Author

jargh commented Feb 14, 2024

I've tweaked the attributions, which will trigger a re-run of all checks.
Assuming nothing untoward happens, I'll merge after that.

@jargh jargh merged commit 57eb68a into awslabs:main Feb 15, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants