This is the source code accompanying the paper "Fast polynomial multiplication using matrix multiplication accelerators with applications to NTRU on the Apple M1 SoC". The folders and their contents are:

- `amx`: AMX macros from Peter Cawley (`aarch64.h`), plus helper macros of our own (`amx.h`) and implementations of the routines described in our paper;
- `ct_results_M1_dit`: results and histogram plots for the constant-time experiments on the M1, with the DIT (data-independent timing) bit set;
- `ct_results_M3_dit`: results and histogram plots for the constant-time experiments on the M3, also with the DIT bit set;
- `googletest`: a copy of the Google Test library;
- `jupyter`: Jupyter notebooks to generate some of the calculations and plots of the "Experimental results" section;
- `PQC_NEON`: relevant files from the paper "Optimized Software Implementations of CRYSTALS-Kyber, NTRU, and Saber Using NEON-Based Special Instructions of ARMv8", obtained from the associated GitHub repository, with some modifications as described in our paper;
- `reference`: relevant files from NTRU's reference implementation, based on the Round 3 submission package to the NIST PQC standardization effort;
- `rng_opt`: an optimized implementation of the NIST `randombytes` routines, based on AES-256 CTR-DRBG, implemented using the AES instructions available in the ARMv8-A Cryptographic Extensions;
- `speed`: benchmarking harnesses for our implementation, the constant-time experiments, our custom RNG, and some BLAS routines found in Apple's Accelerate framework;
- `speed_results_M1_dit`: raw benchmark results on the M1, with the DIT bit set, and an Excel spreadsheet compiling them, generated using the Python script discussed below;
- `speed_results_M3_dit`: raw benchmark results on the M3, with the DIT bit set, and an Excel spreadsheet compiling them, generated using the Python script discussed below;
- `test`: tests (using the Google Test library) to validate various aspects of the implementation as well as our optimized RNG;
- `vector-polymul-ntru-ntrup`: relevant files from the paper "Algorithmic Views of Vectorized Polynomial Multipliers -- NTRU", obtained from the associated GitHub repository, with some modifications as described in our paper.
The root folder also includes some files of note:
- `run_benchmarks_dit.sh` and `run_benchmarks_no_dit.sh`: helper scripts to run benchmarks (see instructions below);
- `run_ct_experiments_dit.sh` and `run_ct_experiments_no_dit.sh`: helper scripts to run constant-time experiments (see instructions below);
- `run_all.sh`: a helper script to run both benchmarks and constant-time experiments;
- `consolidate_benchmarks.py`: a Python 3 script to consolidate benchmark results from different systems into a single Microsoft Excel file.
We use CMake as our build system. It can be installed using Homebrew with the command `brew install cmake`. A typical sequence of commands to build the code, starting from the root folder of the repository, is:

```sh
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
```
NOTE: for the tested compilers, there is a register allocation issue when the optimized `randombytes` routine is compiled in Debug mode (i.e. passing `-DCMAKE_BUILD_TYPE=Debug` to CMake), and the build fails. However, in RelWithDebInfo and Release modes there is no issue.
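If a build with debug symbols is needed, the RelWithDebInfo configuration sidesteps the Debug-mode issue. A minimal sketch, assuming the same `build` folder as above:

```sh
# Alternative configuration with debug symbols but optimizations enabled
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
make
```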
Compilation produces many test binaries in the build folder (`build/test_*` if using the directions in "Building the code" above). While it is possible to run each binary directly, we recommend using the `ctest` utility from CMake to run all available tests with a single invocation. `ctest` also runs additional tests that automate the process of comparing KATs using the `PQCgenKAT_kem_*` binaries.
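For example, a typical invocation from the `build` folder created above (the `--output-on-failure` flag is optional and simply prints the output of any failing test):

```sh
cd build
ctest --output-on-failure
```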
Compilation also produces many benchmarking binaries in the build folder (`build/speed_*` if using the directions in "Building the code" above). Each binary may be run directly, or a full benchmark set can be run automatically using the helper scripts described in "Benchmarking helper scripts" below. Note that the binaries must be run with `sudo` to allow access to the cycle counter.
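As an illustration, a direct run looks like the following; the binary name below is hypothetical, so substitute one of the actual `speed_*` binaries present in your build folder:

```sh
cd build
sudo ./speed_example   # hypothetical name; use any of the speed_* binaries actually built
```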
We provide helper scripts to automatically run all available benchmarks (except for those related to the RNG): `run_benchmarks_dit.sh` and `run_benchmarks_no_dit.sh`, which run with the DIT (data-independent timing) bit set or unset, respectively. For the paper's results, only the DIT version was used, due to the GoFetch microarchitectural attack. Each script must be run from the root folder of the repository, and places its results in a folder called `speed_results_Mx_dit`, where `Mx` is replaced by the CPU name of the machine where the script is run, e.g. `M1`, `M2` or `M3`; this is obtained from `sysctl -n machdep.cpu.brand_string`. Each executable file that is run creates an associated text file containing the benchmark results, with a self-explanatory naming scheme.
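A typical session is sketched below, assuming the repository root as the working directory and an M1 machine (so the output folder would be `speed_results_M1_dit`); `sudo` may be required, since the underlying benchmark binaries need access to the cycle counter:

```sh
sudo ./run_benchmarks_dit.sh   # sudo assumed necessary for cycle-counter access
ls speed_results_M1_dit        # one text file per benchmarking binary
```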
A Python 3 script, `consolidate_benchmarks.py`, can be run afterwards (also from the root folder of the repository). It requires the `pandas` and `xlsxwriter` packages, which can be installed using `pip`.
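For example (the exact `pip` command may differ depending on how your Python environment is set up, and the script is assumed here to take no arguments):

```sh
pip install pandas xlsxwriter
python3 consolidate_benchmarks.py
```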
This script reads all results from the `speed_results_Mx_dit` folder and generates a Microsoft Excel file displaying them in a tabular form, similar to the format of the results presented in our paper.
Similarly, we provide helper scripts to run the constant-time experiments: `run_ct_experiments_dit.sh` and `run_ct_experiments_no_dit.sh`. The same considerations as above regarding the DIT bit apply. Each script must be run from the root folder of the repository, and places its results in a folder called `ct_results_Mx_dit` (see the earlier discussion regarding the CPU name). Text files are created with the same naming scheme as in the benchmark scripts. A Jupyter notebook, `jupyter/histogram_heat_maps.ipynb`, is supplied to generate histogram plots, and an R script, `jupyter/equivalence_tests.R`, performs statistical hypothesis tests for equivalence.
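A possible workflow, assuming Jupyter and R are installed; as above, `sudo` is assumed necessary for cycle-counter access, and the exact invocation of the R script (including any arguments it may expect) is an assumption here:

```sh
sudo ./run_ct_experiments_dit.sh
jupyter notebook jupyter/histogram_heat_maps.ipynb   # inspect the generated histogram plots
Rscript jupyter/equivalence_tests.R                  # assumed to run without extra arguments
```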
Our work builds upon many other libraries and implementations, each with its own license. Any modifications that we make to an existing work are released under the same license as that work. As for our original code, we release it under the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.