
MarcoAnteghini/In-Pero


In-Pero: Prediction of sub-peroxisomal localisation using deep learning embeddings and support vector machines.

To use the In-Pero.py script, you first need to download and install:

1) seqvec

Instructions are available at https://github.com/Rostlab/SeqVec

Alternatively, install it from PyPI (https://pypi.org/project/seqvec/):

pip install seqvec

2) UniRep and the related weight files (In-Pero uses the 1900_weights).

https://github.com/churchlab/UniRep

Make sure you download the 1900_weights directory and place it together with the other files in this repository.

For example, you can first install awscli:

pip install awscli

Then download the weights with:

aws s3 sync --no-sign-request --quiet s3://unirep-public/1900_weights/ 1900_weights
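Before running In-Pero.py it can help to confirm that the weights ended up where the script expects them. The sketch below is a hypothetical helper (`find_unirep_weights` is not part of this repository); it only assumes, as stated above, that the `1900_weights` directory sits alongside the other repository files, and checks that it exists and is not empty.

```python
from pathlib import Path


def find_unirep_weights(repo_dir, name="1900_weights"):
    """Return the UniRep weights directory inside repo_dir (hypothetical helper).

    Raises FileNotFoundError if the directory is missing or empty, for
    example because the `aws s3 sync` step was skipped or failed.
    """
    weights = Path(repo_dir) / name
    if not weights.is_dir():
        raise FileNotFoundError(f"{weights} not found; download the UniRep weights first")
    if not any(weights.iterdir()):
        raise FileNotFoundError(f"{weights} is empty; the download may have failed")
    return weights
```

Calling `find_unirep_weights(".")` from the repository root returns the path when the download succeeded and raises a descriptive error otherwise.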

3) A pre-computed model, either 'LR_model2.sav' or 'SVM_model.sav', is also required (uncomment the selected model in the In-Pero.py script).
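The README does not say how the `.sav` files were written. A minimal sketch, assuming they are plain pickles of scikit-learn estimators (a common convention for `.sav` files; if they were saved with joblib, use `joblib.load` instead), of selecting one model by name. `MODEL_FILES` and `load_model` are hypothetical names, not code from In-Pero.py.

```python
import pickle
from pathlib import Path

# Hypothetical switch mirroring the "uncomment the selected model" step.
MODEL_FILES = {"LR": "LR_model2.sav", "SVM": "SVM_model.sav"}


def load_model(kind="SVM", repo_dir="."):
    """Load a pre-computed model, assuming the .sav file is a plain pickle.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    path = Path(repo_dir) / MODEL_FILES[kind]
    with path.open("rb") as fh:
        return pickle.load(fh)
```

Unpickling a scikit-learn estimator generally works reliably only with the scikit-learn version it was saved with, which is one reason to stick to the suggested versions listed below.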

4) Additional requirements

Suggested packages:

  • numpy 1.17.2
  • biopython 1.77
  • tensorflow 1.14
  • pandas 0.25.1
  • scikit-learn 0.22
  • seqvec 0.4.1
  • scipy 1.4.1
  • overrides 3.1.0

Usage of In-Pero.py

Usage:

./In-Pero.py <filename>.fasta
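Because the script takes a single FASTA file, a quick pre-flight check of the input can save a long embedding run. The sketch below is a standard-library FASTA reader plus a hypothetical `check_fasta` helper; neither is part of In-Pero.py. Record identifiers are taken as the first whitespace-separated token of each header, which mirrors how Biopython's `SeqIO` fills `record.id`.

```python
def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA text stream."""
    ident, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            header = line[1:].strip()
            ident, chunks = (header.split()[0] if header else ""), []
        elif ident is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if ident is not None:
        yield ident, "".join(chunks)


def check_fasta(path):
    """Hypothetical pre-flight check: return the record count or raise ValueError."""
    with open(path) as fh:
        records = list(read_fasta(fh))
    if not records:
        raise ValueError(f"{path} contains no FASTA records")
    empty = [ident for ident, seq in records if not seq]
    if empty:
        raise ValueError(f"records with empty sequences: {empty}")
    return len(records)
```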

Outputs:

  • Log file ('<filename>_output.txt') containing the entries subdivided into matrix and membrane proteins.
  • The UniRep encoding
  • The seqvec encoding
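The naming rule for the log file and the idea of grouping entries by predicted class can be illustrated with a short sketch. It assumes the log is written next to the input file, and the layout it writes (one header per class, one ID per line) is purely illustrative, not the exact format In-Pero.py produces. Both function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def output_log_name(fasta_path):
    """Map '<filename>.fasta' to '<filename>_output.txt' in the same directory (assumed)."""
    p = Path(fasta_path)
    return p.with_name(p.stem + "_output.txt")


def write_grouped_log(predictions, log_path, classes=("matrix", "membrane")):
    """Write entry IDs grouped by predicted class (illustrative layout only).

    `predictions` maps entry ID -> predicted class name.
    """
    groups = defaultdict(list)
    for ident, label in predictions.items():
        if label not in classes:
            raise ValueError(f"unexpected class {label!r} for {ident}")
        groups[label].append(ident)
    with open(log_path, "w") as fh:
        for label in classes:
            fh.write(f"# {label} proteins ({len(groups[label])})\n")
            for ident in groups[label]:
                fh.write(ident + "\n")
```

For example, `output_log_name("data/proteins.fasta")` gives `data/proteins_output.txt`.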

This repository contains:

  • In-Pero_models : directory containing all the pre-computed models

  • Dataset : directory containing the training and validation fasta files

  • LR_model2.sav : The LR pre-computed model

  • SVM_model.sav : The SVM pre-computed model

  • PP_matrix.fasta : fasta file containing the matrix proteins used for building the dataset

  • PP_membrane.fasta : fasta file containing the membrane proteins used for building the dataset

  • utility scripts retrieved from the UniRep GitHub repository (see above):

    • data_utils.py
    • unirep.py
    • utils.py
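Since PP_matrix.fasta and PP_membrane.fasta hold the two classes, a labelled list of entry IDs can be assembled by reading only their header lines, which is handy for checking class balance. This is a minimal sketch: the integer encoding (matrix = 0, membrane = 1) is an assumption, because the README does not say which encoding In-Pero uses, and the helper names are hypothetical.

```python
from pathlib import Path

# Hypothetical label encoding; not specified by the README.
LABELS = {"PP_matrix.fasta": 0, "PP_membrane.fasta": 1}


def fasta_ids(path):
    """Return the IDs (first token after '>') of all records in a FASTA file."""
    with open(path) as fh:
        return [line[1:].split()[0] for line in fh
                if line.startswith(">") and line[1:].strip()]


def labelled_ids(repo_dir=".", labels=LABELS):
    """Return (entry ID, label) pairs drawn from the two class FASTA files."""
    pairs = []
    for name, label in labels.items():
        pairs.extend((ident, label) for ident in fasta_ids(Path(repo_dir) / name))
    return pairs
```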

The code in this repository is licensed under the terms of GPL v3 as specified by the LICENSE file.
