In-Pero: Prediction of sub-peroxisomal localisation using deep learning embeddings and support vector machines.
To use the In-Pero.py script, you first need to download and install the following:
1) SeqVec. Installation instructions are available at https://github.com/Rostlab/SeqVec. Alternatively, install it from PyPI (https://pypi.org/project/seqvec/):
pip install seqvec
2) UniRep, available at https://github.com/churchlab/UniRep.
Make sure you download the 1900_weights directory and place it together with the other files in this repository.
For example, you can first install awscli:
pip install awscli
Then download the weights with:
aws s3 sync --no-sign-request --quiet s3://unirep-public/1900_weights/ 1900_weights
3) A pre-computed model, either 'LR_model2.sav' or 'SVM_model.sav', is also required (uncomment the line for the selected model in the In-Pero.py script).
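The prerequisites above can be checked before running the script. The following is a minimal sketch, not part of In-Pero itself: the function name `missing_prerequisites` is hypothetical, and it assumes that the 1900_weights directory and the selected model file sit in the same directory as In-Pero.py. Adjust the paths if your layout differs.

```python
from pathlib import Path


def missing_prerequisites(repo_dir, model_name="SVM_model.sav"):
    """Return a list of problems found; an empty list means nothing is missing.

    Assumption: weights and model live directly inside `repo_dir`.
    """
    repo = Path(repo_dir)
    problems = []

    weights = repo / "1900_weights"
    # The weights directory must exist and contain at least one entry.
    if not weights.is_dir():
        problems.append(f"missing directory: {weights}")
    elif not any(weights.iterdir()):
        problems.append(f"empty directory: {weights}")

    model = repo / model_name
    # The pre-computed model must be present as a regular file.
    if not model.is_file():
        problems.append(f"missing model file: {model}")

    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites("."):
        print(problem)
```

Pass model_name="LR_model2.sav" if you uncommented the LR model in the script.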
Suggested packages:
numpy 1.17.2
biopython 1.77
tensorflow 1.14
pandas 0.25.1
scikit-learn 0.22
seqvec 0.4.1
scipy 1.4.1
overrides 3.1.0
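To compare an existing environment against the versions listed above, a short check like the following may help. This is only a sketch: the helper names `installed_version` and `version_mismatches` are hypothetical, and the distribution names are assumed to match the package names in the list (for instance, a GPU build of TensorFlow may be installed under a different name).

```python
def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        import pkg_resources  # fallback for older interpreters (setuptools)
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Versions copied from the "Suggested packages" list.
SUGGESTED = {
    "numpy": "1.17.2",
    "biopython": "1.77",
    "tensorflow": "1.14",
    "pandas": "0.25.1",
    "scikit-learn": "0.22",
    "seqvec": "0.4.1",
    "scipy": "1.4.1",
    "overrides": "3.1.0",
}


def version_mismatches(suggested=SUGGESTED, lookup=installed_version):
    """Map each package that differs to a (suggested, installed) tuple."""
    mismatches = {}
    for package, wanted in suggested.items():
        found = lookup(package)
        # Accept an exact match or a longer version such as 1.14.0 for 1.14.
        matches = found is not None and (
            found == wanted or found.startswith(wanted + ".")
        )
        if not matches:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: suggested {wanted}, installed {found}")
```

A mismatch does not necessarily mean the script will fail; the listed versions are suggestions.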
Usage:
./In-Pero.py <filename>.fasta
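Before passing a file to the script, it can be useful to confirm that it parses as FASTA and that no entry is empty. The sketch below uses only the standard library and assumes plain multi-line FASTA input; `read_fasta` and `empty_entries` are hypothetical helpers, not functions provided by In-Pero.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples read from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there is one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is None:
                raise ValueError("sequence data found before the first '>' header")
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def empty_entries(records):
    """Return the headers of entries that have no sequence."""
    return [header for header, sequence in records if not sequence]
```

For example, `empty_entries(read_fasta("<filename>.fasta"))` returns an empty list when every entry has a sequence.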
Outputs:
- Log file ('<filename>_output.txt') containing the entries subdivided into matrix and membrane proteins.
- The UniRep encoding
- The seqvec encoding
Files in this repository:
- In-Pero_models : directory containing all the pre-computed models
- Dataset : directory containing the training and validation fasta files
- LR_model2.sav : the LR pre-computed model
- SVM_model.sav : the SVM pre-computed model
- PP_matrix.fasta : fasta file containing the matrix proteins used for building the dataset
- PP_membrane.fasta : fasta file containing the membrane proteins used for building the dataset
- Useful scripts retrieved from the UniRep GitHub repository (see above):
  - data_utils.py
  - unirep.py
  - utils.py
The code in this repository is licensed under the terms of GPL v3 as specified by the LICENSE file.