docker build -t epfl_solution .
docker run -it epfl_solution
Samples are pre-processed by first applying an FCGR mapping, followed by a 2D DCT-II [Lichtblau2019]. The top-left h x h block of the DCT-II output (the lowest frequencies) is extracted and used as the hash of the genome.
$ go run model/main.go
processes the samples in data/Challenge.fa and writes the output to model/X.binary (the processed samples) and model/Y.binary (the labels).
The Python script model/training.py loads model/X.binary and model/Y.binary and uses them to train the model.
The script outputs the weights in both .npy and .binary formats, as well as a .png image of the weights/features with gradient color coding.
$ make debug NBGENOMES=2000
compiles and runs DebugTest.go, which processes, encrypts, predicts, and decrypts the first 2000 samples located in data/Challenge.fa.
$ make key : generates the secret key and stores it in key/.
$ make pro NBGENOMES=2000 : processes the first 2000 samples located in data/Challenge.fa and returns the result in temps/.
$ make enc : encrypts the processed samples and returns the encrypted processed samples in temp/.
$ make pred : unmarshals the encrypted samples in temp/, evaluates the homomorphic prediction, and marshals the result back into temp/.
$ make dec : unmarshals the encrypted prediction in temp/, decrypts it, and outputs the result in results/prediction.csv.
Processing and crypto parameters are located in lib/params.
$ make clean : cleans all files in keys/, temps/, results/ and all compiled binary files. Does not clean files in model/.
The security of the HE evaluation relies on the hardness of R-LWE. The parameters used are log(N)=10 and log(Q)=29. Both the secret and the error are sampled from a truncated discrete Gaussian distribution with standard deviation 3.19 and bound 19. The security is estimated at 128 bits according to https://homomorphicencryption.org/.
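To make the sampling parameters concrete, the following is a minimal sketch of drawing N = 2^10 coefficients from a discrete Gaussian with standard deviation 3.19 truncated to [-19, 19]. It is illustrative only and is not taken from the repository: the function name is hypothetical, and it uses Python's non-cryptographic `random` module, which would not be acceptable for real key or error generation.

```python
import math
import random

SIGMA = 3.19   # standard deviation stated in the README
BOUND = 19     # truncation bound stated in the README (about 6 * SIGMA)
N = 1 << 10    # ring degree, from log(N) = 10


def truncated_discrete_gaussian(n, sigma=SIGMA, bound=BOUND, rng=None):
    """Draw n integers in [-bound, bound] with weight exp(-x^2 / (2 sigma^2)).

    Hypothetical helper for illustration; NOT cryptographically secure.
    """
    rng = rng or random.Random()
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in support]
    return rng.choices(support, weights=weights, k=n)


# A fixed seed is used only to make the illustration reproducible.
coeffs = truncated_discrete_gaussian(N, rng=random.Random(0))
```

Every sampled coefficient lies within the bound by construction, since the support itself is restricted to [-19, 19].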
[Lichtblau2019] : Lichtblau, Daniel. “Alignment-free genomic sequence comparison using FCGR and signal processing”, 2019.