This GitHub page contains projects related to my research, as well as some older projects that I have archived for reference.
I strongly believe that the future of genomics will depend on our ability to go beyond traditional approaches and adopt a more holistic view of complex problems. Current challenges include the privacy of genomic data and the lack of flexible tools for studying complex genetic effects. To address these challenges, I have developed several Python libraries that allow researchers to simulate realistic variant call data and study complex genetic effects using novel analytical techniques from post-modern algebra.
This is a cloud-native software framework for quickly generating simulated genomic data. I am currently working on making the framework composable with users' own models for specific mutation profiles or genetic positions.
So far, users have control over:
- population-specific LD decay profiles
- bottleneck parameters
- allele frequency spectra (and/or mutation rate profiles)
- LD block lengths
- genetic position values
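
To give a feel for how parameters of this kind interact, here is a minimal, self-contained sketch in plain Python. It does not use the HaploDynamics API: the function names `simulate_haplotypes` and `correlation`, and the parameters `positions`, `allele_freq` and `decay_rate`, are hypothetical and chosen only for illustration. The sketch assumes a deliberately simple model in which every site shares one allele frequency and the correlation between neighbouring sites decays exponentially with their distance.

```python
import math
import random


def simulate_haplotypes(positions, allele_freq, decay_rate, n_haplotypes, seed=0):
    """Simulate 0/1 haplotypes with exponentially decaying correlation.

    Illustrative model only (an assumption, not the HaploDynamics method):
    each site copies the previous site's allele with probability
    exp(-decay_rate * distance), otherwise it is redrawn at allele_freq.
    """
    rng = random.Random(seed)
    haplotypes = []
    for _ in range(n_haplotypes):
        # First site: a plain Bernoulli draw at the shared allele frequency.
        hap = [1 if rng.random() < allele_freq else 0]
        for prev_pos, pos in zip(positions, positions[1:]):
            keep = math.exp(-decay_rate * (pos - prev_pos))
            if rng.random() < keep:
                hap.append(hap[-1])  # stay linked to the previous site
            else:
                hap.append(1 if rng.random() < allele_freq else 0)
        haplotypes.append(hap)
    return haplotypes


def correlation(haplotypes, i, j):
    """Pearson correlation between the alleles at sites i and j."""
    n = len(haplotypes)
    mean_i = sum(h[i] for h in haplotypes) / n
    mean_j = sum(h[j] for h in haplotypes) / n
    cov = sum(h[i] * h[j] for h in haplotypes) / n - mean_i * mean_j
    var = mean_i * (1 - mean_i) * mean_j * (1 - mean_j)
    return cov / math.sqrt(var) if var > 0 else 0.0


# Three sites: two close together (100 bp apart) and one far away.
positions = [0, 100, 100_000]
haps = simulate_haplotypes(positions, allele_freq=0.3, decay_rate=1e-4,
                           n_haplotypes=2000, seed=1)
print("correlation, nearby sites: ", correlation(haps, 0, 1))
print("correlation, distant sites:", correlation(haps, 0, 2))
```

In this toy model, `decay_rate` plays the role of an LD decay profile and the spacing of `positions` the role of genetic position values; a realistic simulator would also need per-site allele frequencies and demographic effects such as bottlenecks.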
In the near future, the HaploDynamics.Framework module will include an arithmetic to manipulate the generated VCF files (or in fact any VCF file).
I have been working on this software for the last five years to fully implement the framework developed in CTG, CTGI & CTGII. I intend to publish a new version of this framework soon, including new semigroup-based linear algebra features for finding combinatorial genetic effects in variant call and phenotype data. Overall, the Pedigrad library will provide a new paradigm for doing machine learning on genomic data and learning combinatorial relationships.
In the long term, the Pedigrad library will also be used to complete the simulations of the HaploDynamics library with the generation of complex genomic architectures (e.g. variant interactions, complex phenotypes, environment modeling).
This project contains the code developed for a paper detailing the design of a fully homomorphic encryption scheme. The outcomes of this work bear significance in the domains of machine learning and privacy computing.