This repository contains a complete, open-source, end-to-end re-implementation of the Church Lab's eUniRep in silico protein engineering pipeline as presented in Biswas et al. Details on our implementation can be read here, with the supplementary information here. The original Church lab paper can be read here, with their repository here. The JAX-unirep re-implementation we use in our implementation can be found here.
Each in silico step in the protein engineering pipeline has a Jupyter notebook that executes that step, along with its own README file. The pipeline steps are broken down as follows:
- Training UniRep: either use the weights provided by the Church lab or re-train from scratch with the JAX-unirep re-implementation, which is well documented here
- Generate input file of characterized mutants: seq_mutator
- Curating pre-training set for evotuning: pre-evotuning
- Evotuning: we pushed an example script to the jax-unirep repo.
- Top model selection and hyperparameter tuning: top-model
- Markov Chain Monte Carlo (MCMC) directed evolution: directed-evo
- Additional: scripts for further analysis, such as PCA and epistasis evaluation: analysis
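
To give a feel for the MCMC directed evolution step listed above, here is a minimal, self-contained sketch of a Metropolis-style random walk over sequences. It is not the code from the directed-evo notebook. The function names (`propose_mutant`, `toy_fitness`, `metropolis_walk`), the single-substitution proposal, the temperature value, and the toy fitness function are all assumptions made for illustration. In the real pipeline, the fitness would come from the trained top model applied to eUniRep representations.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def propose_mutant(seq, rng):
    """Return a copy of seq with one randomly chosen position substituted."""
    pos = rng.randrange(len(seq))
    choices = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def toy_fitness(seq):
    """Hypothetical stand-in for a trained top model: fraction of alanines."""
    return seq.count("A") / len(seq)


def metropolis_walk(start, fitness_fn, n_steps=200, temperature=0.05, seed=0):
    """Run a Metropolis random walk and return the best (sequence, fitness) seen."""
    rng = random.Random(seed)
    current, current_fit = start, fitness_fn(start)
    best, best_fit = current, current_fit
    for _ in range(n_steps):
        candidate = propose_mutant(current, rng)
        cand_fit = fitness_fn(candidate)
        delta = cand_fit - current_fit
        # Always accept improvements; accept worse candidates
        # with probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = current, current_fit
    return best, best_fit


if __name__ == "__main__":
    best_seq, best_score = metropolis_walk("MKTVRQERLK", toy_fitness)
    print(best_seq, best_score)
```

A lower `temperature` makes the walk greedier, while a higher one accepts more deleterious substitutions and explores more broadly. Swapping `toy_fitness` for any callable that maps a sequence to a predicted score keeps the rest of the loop unchanged.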
If you would like to request modifications or additions, or want to collaborate, feel free to open an issue or PR!
All the model weights are licensed under the terms of Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
All other code in this repository is licensed under the terms of GPL v3.