diff --git a/.gitattributes b/.gitattributes
deleted file mode 100644
index dfe0770..0000000
--- a/.gitattributes
+++ /dev/null
@@ -1,2 +0,0 @@
-# Auto detect text files and perform LF normalization
-* text=auto
diff --git a/Dockerfile b/Dockerfile
deleted file mode 100644
index 45bae3b..0000000
--- a/Dockerfile
+++ /dev/null
@@ -1,21 +0,0 @@
-# Set the base image to anaconda python 3.8
-FROM continuumio/miniconda3
-
-# File Author / Maintainer
-MAINTAINER Samuele Cancelleri
-
-ENV SHELL bash
-
-#update conda channel with bioconda and conda-forge
-RUN conda config --add channels defaults
-RUN conda config --add channels conda-forge
-RUN conda config --add channels bioconda
-
-#update packages of the docker system
-RUN apt-get update && apt-get install gsl-bin libgsl0-dev -y && apt-get install libgomp1 -y && apt-get clean
-
-#Install crisprme package (change the default version of python to 3.8)
-RUN conda update -n base -c defaults conda
-RUN conda install python=3.8 -y
-RUN conda install crisprme -y && conda clean --all -y
-RUN conda update crisprme -y
diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 456a677..0000000
--- a/LICENSE
+++ /dev/null
@@ -1,2 +0,0 @@
-CRISPRme has a dual license. It is made available for free to academic researchers under the Affero License (https://www.gnu.org/licenses/agpl-3.0.en.html).
-If you plan to use CRISPRme for profit, you will need to purchase a license. Please contact rosalba.giugno@univr.it and lpinello@mgh.harvard.edu for more information.
diff --git a/README.md b/README.md
deleted file mode 100644
index 0ebec71..0000000
--- a/README.md
+++ /dev/null
@@ -1,494 +0,0 @@
-# CRISPRme
-[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/crispritz/README.html)
-
-CRISPRme is a web-based tool developed to perform genome-wide CRISPR/Cas predictive analysis and result assessment supporting genetic variant and personal genomes.
-
-
-
-The search engine integrated in CRISPRme is based on CRISPRitz (Cancellieri, Samuele, et al. "CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing." Bioinformatics 36.7 (2020): 2001-2008.) to exploit the powerful and fast search methods in the package, integrated with a user friendly and comprehensive GUI allowing the user to inspect and analyze results with ease.
-
-# CRISPRme Installation and Usage
-The two fastest ways to use CRISPRme are through Docker or Conda.
-Here we summarize the steps to install CRISPRme with each of them.
-
-## Installation (Phase 1)
-**Conda installation (Linux and MacOS):**
-- Open a terminal window
-- Paste this command into the terminal (Linux):
- ```
- curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh --output Miniconda3-latest-Linux-x86_64.sh
- ```
-- Paste this command into the terminal (MacOS):
- ```
- curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh --output Miniconda3-latest-MacOSX-x86_64.sh
- ```
-- If the file is correctly downloaded you now need to execute it to complete the installation, so paste this command into the terminal:
- - Linux
- ```
- bash Miniconda3-latest-Linux-x86_64.sh
- ```
- - MacOS
- ```
- bash Miniconda3-latest-MacOSX-x86_64.sh
- ```
-- Press ENTER when prompted and answer yes to each question; this lets conda set up its directories in your HOME path for easy use.
-- When the installation completes, the message “Thank you for installing Miniconda3!” confirms that it succeeded.
-- Now close the terminal window you are using and open a new one, so that the system can start conda.
-- In the new terminal window you should see something like this:
- ```
- (base) user@nameofPC:~$
- ```
- If you see "(base)" at the start of the prompt, conda is loaded correctly and you can start using it.
-- Now you need to set the channels to allow conda to access different repositories and set the default version of python to version 3.8, so paste these commands into the terminal you just opened:
- ```
- conda config --add channels defaults
- conda config --add channels bioconda
- conda config --add channels conda-forge
- conda install python=3.8
- ```
-- Now, you can install CRISPRitz by typing the command:
- ```
- conda install crispritz
- ```
-- To test your installation, type the command:
- ```
- crispritz.py
- ```
-- After the execution of the command you should see a list of CRISPRitz tools.
-![crispritz.py_help](https://user-images.githubusercontent.com/40895152/63214203-8452be80-c115-11e9-88e2-4613ba8c3718.png)
-Now the software is installed and ready to be used.
-
-**Docker installation:
-Note: if you are using MacOS or Windows, you just need to download the installer file
-and follow the on screen instructions.
-https://docs.docker.com/docker-for-windows/install/ (Windows)
-https://docs.docker.com/docker-for-mac/install/ (MacOS)**
-
-**Ubuntu installation guide:**
-- Open a terminal window
-- Paste this command to update the index repository:
- ```
- sudo apt-get update
- ```
-- Paste this command to allow package installation over HTTPS:
- ```
- sudo apt-get install \
- apt-transport-https \
- ca-certificates \
- curl \
- gnupg-agent \
- software-properties-common
- ```
-- Paste this command to add the docker key:
- ```
- curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
- ```
-- Paste this command to set the correct version of docker for your system:
- ```
- sudo add-apt-repository \
- "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
- $(lsb_release -cs) \
- stable"
- ```
-- Paste this command to update the index repository another time, to make sure everything is ready and set to install docker:
- ```
- sudo apt-get update
- ```
-- Then paste this command to finally install docker:
- ```
- sudo apt-get install docker-ce docker-ce-cli containerd.io
- ```
-- Paste this last command to check if the installation is complete and functional:
- ```
- sudo docker run hello-world
- ```
-- If this message is printed, everything is perfectly installed
-![docker hello world](https://user-images.githubusercontent.com/40895152/63214349-769e3880-c117-11e9-8ee2-d754096b3aca.png)
-- Now, we need to do some more steps to complete the settings. Paste this command to create a user group for docker user:
- ```
- sudo groupadd docker
- ```
-- Paste this command to add your current user to the created group:
- ```
- sudo usermod -aG docker $USER
- ```
-- Now you need to restart your machine or the virtual environment, to re-evaluate the user groups.
-- One last command to test if the group is well configured. Paste this command:
- ```
- docker run hello-world
- ```
-- If the previous “hello from docker” message is printed, everything is perfectly set.
-
-## Post installation test (Phase 2):
-**Conda:**
-- Download and run this script if you have installed CRISPRitz with Conda:
- ```
- curl https://raw.githubusercontent.com/pinellolab/CRISPRitz/master/test_scripts/auto_test_crispritz_conda.sh --output auto_test_crispritz_conda.sh
- ```
-- Write this command to execute the script:
- ```
- bash auto_test_crispritz_conda.sh
- ```
-- Wait until this confirmation message appears:
-“EVERY TEST PASSED!!! ENJOY CRISPRitz”
-
-**Docker:**
-- Download and run this script if you have installed CRISPRitz with Docker:
- ```
- curl https://raw.githubusercontent.com/pinellolab/CRISPRitz/master/test_scripts/auto_test_crispritz_docker.sh --output auto_test_crispritz_docker.sh
- ```
-- Write this command to execute the script:
- ```
- bash auto_test_crispritz_docker.sh
- ```
-- Wait until this confirmation message appears:
-“EVERY TEST PASSED!!! ENJOY CRISPRitz”
-
-## Usage (Phase 3):
-Here is a brief guide to using CRISPRitz. **If you have already run the post-installation test
-(Phase [2](#phase2)) and it passed, you have all the necessary files in the
-test_crispritz directory and you can skip this list of steps.**
-If you did not run the test, follow these few steps to download the files needed to try
-CRISPRitz.
-Download test files (ONE TIME STEP):
-- The script will download chr22 from UCSC (hg19), the corresponding VCF file from
-the 1000 Genomes Project, a directory containing some test guides, a directory
-containing some PAM sequences and a directory of pre-computed genomic annotations
-for the hg19 genome.
-- Download the script with this command:
- ```
- curl https://raw.githubusercontent.com/pinellolab/CRISPRitz/master/test_scripts/download_test_files.sh --output download_test_files.sh
- ```
-- Write this command to execute the script:
- ```
- bash download_test_files.sh
- ```
-- The script downloads every file needed to test the software. To save time, it downloads only one
-chromosome and one VCF file. All the examples can also be run on an entire
-genome: to use the entire hg19 genome, you only need to add the other chromosomes
-into the `hg19_ref` directory.
-- Write this command to enter the test directory:
- ```
- cd test_crispritz/
- ```
-- Now you are ready to execute the following example functions.
-
-**3.1 CRISPRitz Add-Variant Tool**
-This tool inserts variants into a fasta genome.
-Input:
-- Directory containing a genome in fasta format, which must be separated into single
-chromosome files.
-- Directory containing VCF files, which must be separated into single chromosome files
-(multi-sample files will be collapsed into one fake individual).
-
-Output:
-- Directory containing a duplicate of the original genome in fasta format, separated into
-single chromosome files with added SNPs in IUPAC notation
-- Directory containing a duplicate of the original genome in fasta format, separated into
-single chromosome files with added INDELs.
-
-Example call:
-- Conda
- ```
- crispritz.py add-variants hg19_1000genomeproject_vcf/ hg19_ref/
- ```
-- Docker
- ```
- docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispritz crispritz.py add-variants hg19_1000genomeproject_vcf/ hg19_ref/
- ```
-
-Detailed input:
-`hg19_1000genomeproject_vcf/` is the directory containing the vcf files.
-`hg19_ref/` is the directory containing the fasta files.
-
-
-**3.2 CRISPRitz Index-Genome Tool**
-This tool generates an index genome (similar to the bwa-index step). This step is
-time consuming (from 30 to 60 minutes) but saves a lot of execution time when
-searching with many guides and with bulge support (DNA and RNA). If you do not need to
-search with bulges, skip this step.
-Input:
-- Name of the genome to create (e.g. `hg19_ref`).
-- Directory containing a genome in fasta format, which must be separated into single
-chromosome files.
-- Text file containing the PAM (including a number of Ns equal to the guide length) and a
-space separated number indicating the length of the PAM sequence (e.g. Cas9 PAM is
-NNNNNNNNNNNNNNNNNNNNNGG 3). The sequence is composed of 20 Ns and
-NGG, followed by the number 3, representing the length of the PAM sequence.
-- Number of bulges to include in the database for subsequent searches (i.e. the max
-number of bulges allowed for DNA and RNA when searching on the database)
-- Number of threads to use for the analysis (Optional)
-
-Output:
-- Directory containing an index genome in .bin format, separated into single chromosome
-files, containing all the candidate targets for a selected PAM, adding also characters to
-perform bulge search.
-
-Example call:
-- Conda
- ```
- crispritz.py index-genome hg19_ref hg19_ref/ pam/pamNGG.txt -bMax 2 -th 4
- ```
-- Docker
- ```
- docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispritz crispritz.py index-genome hg19_ref hg19_ref/ pam/pamNGG.txt -bMax 2 -th 4
- ```
-Detailed input:
-`hg19_ref` is the name of the output directory containing the index genome.
-`hg19_ref/` is the directory containing the fasta files.
-`pam/pamNGG.txt` is a text file containing the PAM sequence.
-`-bMax 2` is the max number of bulges to allow for following searches when creating the
-database (e.g if -bMax 2, all the following searches with the created index can be performed
-with max 2 RNA bulges and 2 DNA bulges)
-`-th 4` is the number of threads to use (Optional)
-
-
-**3.3 CRISPRitz Search Tool**
-This tool searches a fasta genome or an index genome.
-CRISPRitz supports two kinds of search.
-The first and simplest one uses a plain fasta genome and is designed for fast,
-on-the-fly searches with mismatches only.
-The second uses the previously generated index genome (Phase [3.2](#Index-Genome)) to perform
-searches with many guides and with bulge support.
-
-**3.3.1 Mismatches only search:**
-Input:
-- Directory containing a genome in fasta format, which must be separated into single
-chromosome files.
-- Text file containing the PAM sequence (including a number of Ns equal to the guide
-length) and a space separated number indicating the length of the PAM sequence (e.g.
-Cas9 PAM is NNNNNNNNNNNNNNNNNNNNNGG 3). The sequence is composed of
-20 Ns and NGG, followed by 3, representing the length of the PAM sequence.
-- Text file containing one or more guides (including a number of Ns equal to the length of
-the PAM sequence) (e.g. TCACCCAGGCTGGAATACAGNNN, the last 3 Ns represent
-the space occupied by the PAM in the real sequence)
-- Name of the output file (e.g. `emx1.hg19`)
-- Number of allowed mismatches (e.g. `-mm 4`)
-- Output type (-r off-targets list only, -p profile only, -t everything) (e.g `-t`)
-- Scores (-scores followed by the directory of the fasta genome, to compute scores after
-the search based on Doench 2016 and CFD; the two scoring
-methods support only NGG PAM and guides 23 nucleotides long) (e.g `-scores hg19_ref/`)
-- Number of threads to use for the analysis (Optional)
-
-Output:
-- Set of result files, including:
- - Targets file, containing all genomic targets for the guide set
- - Profile file, containing a matrix-like representation of guides behaviour (bp/mm, total on-/off- target, targets per mismatch threshold)
- - Extended profile file, containing the motif matrix for each guide and each mismatch threshold, useful to create visual analysis of the guides behaviour
- - Targets file with associated CFD score
-
-Example call:
-- Conda
- ```
- crispritz.py search hg19_ref/ pam/pamNGG.txt guides/EMX1.txt emx1.hg19 -mm 4 -t -scores hg19_ref/
- ```
-- Docker
- ```
- docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispritz crispritz.py search hg19_ref/ pam/pamNGG.txt guides/EMX1.txt emx1.hg19 -mm 4 -t -scores hg19_ref/
- ```
-
-Detailed input:
-`hg19_ref/` is the directory containing the fasta files.
-`pam/pamNGG.txt` is a text file containing the PAM sequence.
-`guides/EMX1.txt` is a text file containing the EMX1 guide
-`emx1.hg19` is the output file name
-`-mm 4` to select the mismatch threshold
-`-t` to select the output type
-`-scores hg19_ref/` to activate the calculation of score (Doench 2016 and CFD)
-`-th 4` is the number of threads to use (Optional)
-
-**3.3.2 Mismatches + Bulges search:**
-Input:
-- Directory containing an index genome in .bin format, separated into single chromosome
-files (Phase [3.2](#Index-Genome)).
-- Text file containing the PAM sequence (including a number of Ns equal to the guide
-length) and a space separated number indicating the length of the PAM sequence (e.g.
-Cas9 PAM is NNNNNNNNNNNNNNNNNNNNNGG 3). The sequence is composed of
-20 Ns and NGG, followed by 3, representing the length of the PAM sequence.
-- Text file containing one or more guides (including a number of Ns equal to the length of
-the PAM sequence) (e.g. TCACCCAGGCTGGAATACAGNNN, the last 3 Ns represent
-the space occupied by the PAM in the real sequence)
-- Name of output file (e.g. `emx1.hg19`)
-- Tag to activate index search (`-index`)
-- Number of allowed mismatches (e.g. `-mm 4`)
-- Size of DNA bulges and/or RNA bulges (e.g. `-bDNA 1 -bRNA 1`)
-- Output type (-r off-targets list only, -p profile only, -t everything) (e.g `-t`)
-- Scores (-scores followed by the directory of the fasta genome, to compute scores after
-the search based on Doench 2016 and CFD; the two scoring
-methods support only NGG PAM and guides 23 nucleotides long) (e.g `-scores hg19_ref/`)
-- Number of threads to use for the analysis (Optional)
-
-Output:
-- Set of result files, including:
- - Targets file, containing all genomic targets for the guides set
- - Profile file, containing a matrix-like representation of guides behaviour (bp/mm, total on-/off- target, targets per mismatch threshold)
- - Extended profile file, containing the motif matrix for each guide and each mismatch threshold, useful to create visual analysis of the guides behaviour
- - Targets file with associated CFD score
-
-Example call:
-- Conda
- ```
- crispritz.py search genome_library/NGG_hg19_ref/ pam/pamNGG.txt guides/EMX1.txt emx1.hg19 -index -mm 4 -bDNA 1 -bRNA 1 -t -scores hg19_ref/
- ```
-- Docker
- ```
- docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispritz crispritz.py search genome_library/NGG_hg19_ref/ pam/pamNGG.txt guides/EMX1.txt emx1.hg19 -index -mm 4 -bDNA 1 -bRNA 1 -t -scores hg19_ref/
- ```
-
-Detailed input:
-`genome_library/NGG_hg19_ref/` is the directory containing the index genome (.bin files).
-`pam/pamNGG.txt` is a text file containing the PAM sequence.
-`guides/EMX1.txt` is a text file containing the EMX1 guide
-`emx1.hg19` is the output file name
-`-index` tag to activate the index search
-`-bDNA 1` DNA bulges threshold
-`-bRNA 1` RNA bulges threshold
-`-mm 4` Mismatches threshold
-`-t` to select the output type
-`-scores hg19_ref/` to activate the calculation of score (Doench 2016 and CFD)
-`-th 4` is the number of threads to use (Optional)
-
-
-**3.4 CRISPRitz Annotation Tool:**
-This tool is created to perform genomic annotation on results obtained during the search phase.
-Input:
-- Targets file, containing all genomic targets for the guides set (Phase [3.3.1](#Search_mm) / [3.3.2](#Search_mm_bul))
-- Bed file containing the annotations
-- Name of output file
-- Samples ID file, containing the list of samples with their associated Population and Superpopulation (Optional)
-
-Output:
-- Set of files, including:
- - Targets file with annotation (identical to the input targets file, with an added column containing the annotations).
- - One summary file, counting all the annotations per mismatch number.
-
-Example call:
-- Conda
- ```
- crispritz.py annotate-results emx1.hg19.targets.txt annotations.bed emx1.hg19.annotated
- ```
-- Docker
- ```
- docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispritz crispritz.py annotate-results emx1.hg19.targets.txt annotations.bed emx1.hg19.annotated
- ```
-
-Detailed input:
-`emx1.hg19.targets.txt` is the text file containing targets from previous search
-`annotations.bed` is the text file containing the genomic annotations
-`emx1.hg19.annotated` name of the output file
-`--change-ID samples_1000genomeproject.txt` file containing the samples and their associated Population and Superpopulation
-
-
-**3.5 CRISPRitz Generate-Report Tool**
-This tool is created to generate a visual representation of guide behaviour such as on-/off- target
-activity in specific genomic regions, total number of on-/off- targets in reference and
-variant genome and so on.
-Input:
-- A guide present in the analyzed set (Phase [3.3.1](#Search_mm) / [3.3.2](#Search_mm_bul))
-(e.g. `GAGTCCGAGCAGAAGAAGAANNN`)
-- Number of mismatches to analyze (e.g. `-mm 4`)
-- Annotation summary file, containing the counting of all the annotations per mismatch number (Phase [3.4](#Annotation))
-- Extended profile file (Phase [3.3.1](#Search_mm) / [3.3.2](#Search_mm_bul))
-- Tag to activate gecko dataset comparison (e.g. `-gecko`)
-- Annotation reference summary file, containing the count of all the annotations per mismatch number (see Post-Process Phase for more information)
-
-Output:
-- PDF file containing the radar chart and motif logo for a guide. The radar chart shows how
-similar the guide is, in terms of number of targets found, to all guides in its dataset
-(or the gecko dataset if selected).
-- Barplot with a distribution of on-/off- targets in each annotation and a comparison
-between variant and reference genome, in terms of total targets found.
-
-
-Example call:
-- Conda
- ```
- crispritz.py generate-report GAGTCCGAGCAGAAGAAGAANNN -mm 4 -annotation emx1.hg19.annotated.Annotation.summary.txt -extprofile emx1.hg19.extended_profile.xls -gecko
- ```
-
-- Docker
- ```
- docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispritz crispritz.py generate-report GAGTCCGAGCAGAAGAAGAANNN -mm 4 -annotation emx1.hg19.annotated.Annotation.summary.txt -extprofile emx1.hg19.extended_profile.xls -gecko
- ```
-
-Detailed input:
-`GAGTCCGAGCAGAAGAAGAANNN` is a guide present in the result file: the one you want to analyze and generate visualization files for
-`-mm 4` Mismatches threshold
-`-annotation emx1.hg19.annotated.Annotation.summary.txt` is the file containing the count of all the annotations per mismatch number
-`-extprofile emx1.hg19.extended_profile.xls` is the xls file containing detailed information about the guides, used to construct the motif logo
-`-gecko` tag to activate the gecko dataset comparison; the results of your test guide will be
-compared with results from a previously computed analysis on the gecko library.
-
-
-**3.6 CRISPRitz Process-Data Tool**
-This tool processes and compares the results obtained in the reference and enriched genomes, providing a more complete overview of the activity of
-the input guides. The first step aggregates similar off-targets based on their position on the chromosome; then a list of sample IDs is associated with each off-target haplotype. Finally, summaries of the guide activity at the general, population and sample level are produced.
-
-Input:
-- Targets file of the reference genome, containing all genomic targets for the guides set (Phase [3.3.1](#Search_mm) / [3.3.2](#Search_mm_bul))
-- Targets file of the enriched genome, containing all genomic targets for the guides set (Phase [3.3.1](#Search_mm) / [3.3.2](#Search_mm_bul))
-- Text file containing the PAM sequence (including a number of Ns equal to the guide
-length) and a space separated number indicating the length of the PAM sequence (e.g.
-Cas9 PAM is NNNNNNNNNNNNNNNNNNNNNGG 3). The sequence is composed of
-20 Ns and NGG, followed by 3, representing the length of the PAM sequence.
-- Text file containing one or more guides (including a number of Ns equal to the length of
-the PAM sequence) (e.g. TCACCCAGGCTGGAATACAGNNN, the last 3 Ns represent
-the space occupied by the PAM in the real sequence)
-- Bed file containing the annotations
-- Name of output file
-- Directory containing `.json` files (dictionaries), used for sample ID association. The directory is generated using the `--sample-create` option, which takes as input the directory containing the VCF files.
-- Directory containing a genome in fasta format, which must be separated into single
-chromosome files.
-- Samples ID file, containing the list of samples with their associated Population and Superpopulation (Optional)
-
-Output:
-- Targets file, containing all post-processed genomic targets for each guide in the guides set (`targets.GUIDE.txt` files)
-- Summary at Superpopulation, Population and Sample level counting all the annotations per mismatch number (`sample_annotation` files)
-- Summary at guide level counting all the annotations per mismatch number (`sumref.Annotation.summary.txt` for reference genome, `Annotation.summary.txt` for variant genome)
-- Count of targets, for each mismatch + bulge value, divided by Population (`PopulationDistribution.txt`), and represented by a series of barplots (`populations_distribution_GUIDE.png` files)
-- Summaries of the guide activity based on Genomic Position (`summary_by_position.txt`), Samples (`summary_by_samples.txt`), Genome (`summary_by_guide`), and Guide (`general_target_count.txt`)
-
-
-Example Call:
-- Conda
- ```
- crispritz.py process-data -reftarget emx1.hg19.ref.targets.txt -vartarget emx1.hg19.var.targets.txt pam/pamNGG.txt guides/EMX1.txt annotations.bed emx1.hg19.final -sample dictionaryDirectory/ -refgenome hg19_ref/
- ```
-- Docker
- ```
- docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispritz crispritz.py process-data -reftarget emx1.hg19.ref.targets.txt -vartarget emx1.hg19.var.targets.txt pam/pamNGG.txt guides/EMX1.txt annotations.bed emx1.hg19.final -sample dictionaryDirectory/ -refgenome hg19_ref/
- ```
-
-Detailed Input:
-
-`emx1.hg19.ref.targets.txt` is the text file containing targets from previous search on the reference genome
-`emx1.hg19.var.targets.txt` is the text file containing targets from previous search on the enriched genome
-`pam/pamNGG.txt` is a text file containing the PAM sequence.
-`guides/EMX1.txt` is a text file containing the EMX1 guide
-`annotations.bed` is the text file containing the genomic annotations
-`emx1.hg19.final` name of the output file
-`dictionaryDirectory/` directory containing the `.json` files used for sample ID association. If no directory is available, change the option `-sample dictionaryDirectory/` to `--sample-create ` in order to create the `.json` files. In this case, the `vcfFileDirectory` is the directory containing the vcf files used for genome enrichment (Phase [3.1](#Add-Variant)).
-`hg19_ref/` is the directory containing the fasta files
-
-
-**Output examples:**
-- Targets file, containing all genomic targets for the guides set
-![example_targets](https://user-images.githubusercontent.com/32717860/53101471-19d81180-352a-11e9-9ee3-69de580c5e3f.PNG)
-- Profile file, containing a matrix-like representation of guides behaviour (bp/mm, total on-/off- target, targets per mismatch threshold)
-![profile](https://user-images.githubusercontent.com/40895152/63215013-866e4a80-c120-11e9-8855-c63c2a6e991a.png)
-- Extended profile file, containing the motif matrix for each guide and each mismatch
-threshold, useful to create visual analysis of the guides behaviour
-![ex_profile](https://user-images.githubusercontent.com/40895152/63215020-a30a8280-c120-11e9-98ed-10a6ab47bc54.png)
-
-
-
-
-
-- PDF file containing the radar chart and motif logo for a guide. The radar chart shows how
-similar the guide is, in terms of number of targets found, to all guides in its dataset
-(or the gecko dataset if selected).
-![fig_medium_guide-1](https://user-images.githubusercontent.com/32717860/53101072-5b1bf180-3529-11e9-8c9a-cb5895f2c6c0.png)
-
-- Barplot with a distribution of on-/off- targets in each annotation and a comparison
-between variant and reference genome, in terms of total targets found.
-
diff --git a/auto_search_corrected.zip b/auto_search_corrected.zip
deleted file mode 100644
index eac6b6a..0000000
Binary files a/auto_search_corrected.zip and /dev/null differ
diff --git a/crisprme.py b/crisprme.py
deleted file mode 100644
index 518e26c..0000000
--- a/crisprme.py
+++ /dev/null
@@ -1,821 +0,0 @@
-#!/usr/bin/env python
-
-import sys
-import os
-
-script_path = os.path.dirname(os.path.abspath(__file__))
-
-# path where this file is located
-# origin_path = os.path.dirname(os.path.realpath(__file__))
-# conda path
-conda_path = "opt/crisprme/auto_search_corrected/"
-# path corrected to use with conda
-corrected_origin_path = script_path[:-3]+conda_path
-
-script_path = corrected_origin_path
-
-input_args = sys.argv
-
-
-def post_analysis_only():
- variant = True
-
- if "--help" in input_args:
- print("This is the post-analysis process that goes from targets generated by the search operation and generates the final results.")
- print("These are the flags that must be used in order to run this function:")
- print("\t--targetdir, used to specify the directory containing the results of the search step (also post-analysis results will be stored in this folder)")
- print("\t--genome, used to specify the reference genome folder")
- print("\t--vcf, used to specify the VCF folder [OPTIONAL!]")
- print("\t--guide, used to specify the file that contains guides used for the search")
- print("\t--pam, used to specify the file that contains the pam")
- print("\t--annotation, used to specify the file that contains some annotations of the reference genome")
- print("\t--samplesID, used to specify the file that contains the information about samples present in VCF files [OPTIONAL!]")
- print("\t--bMax, used to specify the number of bulges for the indexing of the genome(s)")
- print("\t--mm, used to specify the number of mismatches permitted in the search phase")
- print("\t--bDNA, used to specify the number of DNA bulges permitted in the search phase [OPTIONAL!]")
- print("\t--bRNA, used to specify the number of RNA bulges permitted in the search phase [OPTIONAL!]")
- print("\t--merge, used to specify the threshold used to merge close targets [OPTIONAL!]")
- print("\t--thread, used to set the number of threads used in the process (default is ALL available minus 2)")
- # print("\t--output, used to specify the output folder for the results")
- exit(0)
-
- if "--targetdir" not in input_args:
- print("--targetdir must be contained in the input")
- exit(1)
- else:
- try:
- targetdir = os.path.abspath(
- input_args[input_args.index("--targetdir")+1])
- except IndexError:
- print("Please input some parameter for flag --targetdir")
- exit(1)
- if not os.path.isdir(targetdir):
- print("The folder specified for --targetdir does not exist")
- exit(1)
-
- if "--genome" not in input_args:
- print("--genome must be contained in the input")
- exit(1)
- else:
- try:
- genomedir = os.path.abspath(
- input_args[input_args.index("--genome")+1])
- except IndexError:
- print("Please input some parameter for flag --genome")
- exit(1)
- if not os.path.isdir(genomedir):
- print("The folder specified for --genome does not exist")
- exit(1)
-
- if "--vcf" not in input_args:
- variant = False
- else:
- try:
- vcfdir = os.path.abspath(input_args[input_args.index("--vcf")+1])
- except IndexError:
- print("Please input some parameter for flag --vcf")
- exit(1)
- if not os.path.isdir(vcfdir):
- print("The folder specified for --vcf does not exist")
- exit(1)
-
- if "--guide" not in input_args:
- print("--guide must be contained in the input")
- exit(1)
- else:
- try:
- guidefile = os.path.abspath(
- input_args[input_args.index("--guide")+1])
- except IndexError:
- print("Please input some parameter for flag --guide")
- exit(1)
- if not os.path.isfile(guidefile):
- print("The file specified for --guide does not exist")
- exit(1)
-
- if "--pam" not in input_args:
- print("--pam must be contained in the input")
- exit(1)
- else:
- try:
- pamfile = os.path.abspath(input_args[input_args.index("--pam")+1])
- except IndexError:
- print("Please input some parameter for flag --pam")
- exit(1)
- if not os.path.isfile(pamfile):
- print("The file specified for --pam does not exist")
- exit(1)
-
- if "--annotation" not in input_args:
- print("--annotation must be contained in the input")
- exit(1)
- else:
- try:
- annotationfile = os.path.abspath(
- input_args[input_args.index("--annotation")+1])
- except IndexError:
- print("Please input some parameter for flag --annotation")
- exit(1)
- if not os.path.isfile(annotationfile):
- print("The file specified for --annotation does not exist")
- exit(1)
-
- if variant and "--samplesID" not in input_args:
- print("--samplesID must be contained in the input")
- exit(1)
- elif not variant and "--samplesID" in input_args:
- print("--samplesID was in the input but no VCF directory was specified")
- exit(1)
- elif "--samplesID" in input_args:
- try:
- samplefile = os.path.abspath(
- input_args[input_args.index("--samplesID")+1])
- except IndexError:
- print("Please input some parameter for flag --samplesID")
- exit(1)
- if not os.path.isfile(samplefile):
- print("The file specified for --samplesID does not exist")
- exit(1)
-
- if "--bMax" not in input_args:
- print("--bMax must be contained in the input")
- exit(1)
- else:
- try:
- bMax = input_args[input_args.index("--bMax")+1]
- except IndexError:
- print("Please input some parameter for flag --bMax")
- exit(1)
- try:
- bMax = int(bMax)
- except:
- print("Please input a number for flag bMax")
- exit(1)
- if bMax < 0 or bMax > 2:
- print("The range for bMax is from 0 to 2")
- exit(1)
-
- if "--thread" not in input_args:
- # print("--thread must be contained in the input")
- # exit(1)
- thread = len(os.sched_getaffinity(0))-2
- else:
- try:
- thread = input_args[input_args.index("--thread")+1]
- except IndexError:
- print("Please input some parameter for flag --thread")
- exit(1)
- try:
- thread = int(thread)
- except:
- print("Please input a number for flag thread")
- exit(1)
- if thread <= 0 or thread > len(os.sched_getaffinity(0))-2:
- print("thread is set to default (ALL available minus 2)")
- thread = len(os.sched_getaffinity(0))-2
- # exit(1)
-
- if "--mm" not in input_args:
- print("--mm must be contained in the input")
- exit(1)
- else:
- try:
- mm = input_args[input_args.index("--mm")+1]
- except IndexError:
- print("Please input some parameter for flag --mm")
- exit(1)
- try:
- mm = int(mm)
- except:
- print("Please input a number for flag mm")
- exit(1)
-
- if "--bDNA" not in input_args:
- #print("--bDNA must be contained in the input")
- #exit(1)
- bDNA = 0
- else:
- try:
- bDNA = input_args[input_args.index("--bDNA")+1]
- except IndexError:
- print("Please input some parameter for flag --bDNA")
- exit(1)
- try:
- bDNA = int(bDNA)
- except:
- print("Please input a number for flag bDNA")
- exit(1)
- if bDNA > bMax:
- print("The number of bDNA must be equal or less than bMax")
- exit(1)
- elif bDNA < 0 or bDNA > 2:
- print("The range for bDNA is from 0 to", bMax)
- exit(1)
-
- if "--bRNA" not in input_args:
- #print("--bRNA must be contained in the input")
- #exit(1)
- bRNA = 0
- else:
- try:
- bRNA = input_args[input_args.index("--bRNA")+1]
- except IndexError:
- print("Please input some parameter for flag --bRNA")
- exit(1)
- try:
- bRNA = int(bRNA)
- except:
- print("Please input a number for flag bRNA")
- exit(1)
- if bRNA > bMax:
- print("The number of bRNA must be equal or less than bMax")
- exit(1)
- elif bRNA < 0 or bRNA > 2:
- print("The range for bRNA is from 0 to", bMax)
- exit(1)
-
- if "--merge" not in input_args:
- #print("--merge must be contained in the input")
- #exit(1)
- merge_t = 0
- else:
- try:
- merge_t = input_args[input_args.index("--merge")+1]
- except IndexError:
- print("Please input some parameter for flag --merge")
- exit(1)
- try:
- merge_t = int(merge_t)
- except:
- print("Please input a number for flag merge")
- exit(1)
- if merge_t < 0:
- print("Please specify a non-negative number for --merge")
- exit(1)
-
- os.chdir(script_path)
- if variant:
- os.system("./post_analysis_only.sh "+genomedir+" "+vcfdir+" "+guidefile+" "+pamfile+" "+annotationfile+" "+samplefile+" "+str(bMax) +
- " "+str(mm)+" "+str(bDNA)+" "+str(bRNA)+" "+str(merge_t)+" "+targetdir+" "+script_path+" "+str(thread))
- else:
- os.system("./post_analysis_only.sh "+genomedir+" _ "+guidefile+" "+pamfile+" "+annotationfile+" _ "+str(bMax)+" "+str(mm) +
- " "+str(bDNA)+" "+str(bRNA)+" "+str(merge_t)+" "+targetdir+" "+script_path+" "+str(thread))
-
-
-def search_only():
- variant = True
-
- if "--help" in input_args:
- print("This is the search process that goes from raw input up to the generation of targets.")
- print("These are the flags that must be used in order to run this function:")
- print("\t--genome, used to specify the reference genome folder")
- print("\t--vcf, used to specify the VCF folder [OPTIONAL!]")
- print("\t--guide, used to specify the file that contains guides used for the search")
- print("\t--pam, used to specify the file that contains the pam")
- print("\t--bMax, used to specify the number of bulges for the indexing of the genome(s)")
- print("\t--mm, used to specify the number of mismatches permitted in the search phase")
- print("\t--bDNA, used to specify the number of DNA bulges permitted in the search phase [OPTIONAL!]")
- print("\t--bRNA, used to specify the number of RNA bulges permitted in the search phase [OPTIONAL!]")
- print("\t--output, used to specify the output folder for the results")
- print("\t--thread, used to set the number of thread used in the process (default is ALL available minus 2)")
- exit(0)
-
- if "--genome" not in input_args:
- print("--genome must be contained in the input")
- exit(1)
- else:
- try:
- genomedir = os.path.abspath(
- input_args[input_args.index("--genome")+1])
- except IndexError:
- print("Please input some parameter for flag --genome")
- exit(1)
- if not os.path.isdir(genomedir):
- print("The folder specified for --genome does not exist")
- exit(1)
-
- if "--thread" not in input_args:
- # print("--thread must be contained in the input")
- # exit(1)
- thread = len(os.sched_getaffinity(0))-2
- else:
- try:
- thread = input_args[input_args.index("--thread")+1]
- except IndexError:
- print("Please input some parameter for flag --thread")
- exit(1)
- try:
- thread = int(thread)
- except:
- print("Please input a number for flag thread")
- exit(1)
- if thread <= 0 or thread > len(os.sched_getaffinity(0))-2:
- print("thread is set to default (ALL available minus 2)")
- thread = len(os.sched_getaffinity(0))-2
- # exit(1)
-
- if "--vcf" not in input_args:
- variant = False
- else:
- try:
- vcfdir = os.path.abspath(input_args[input_args.index("--vcf")+1])
- except IndexError:
- print("Please input some parameter for flag --vcf")
- exit(1)
- if not os.path.isdir(vcfdir):
- print("The folder specified for --vcf does not exist")
- exit(1)
-
- if "--guide" not in input_args:
- print("--guide must be contained in the input")
- exit(1)
- else:
- try:
- guidefile = os.path.abspath(
- input_args[input_args.index("--guide")+1])
- except IndexError:
- print("Please input some parameter for flag --guide")
- exit(1)
- if not os.path.isfile(guidefile):
- print("The file specified for --guide does not exist")
- exit(1)
-
- if "--pam" not in input_args:
- print("--pam must be contained in the input")
- exit(1)
- else:
- try:
- pamfile = os.path.abspath(input_args[input_args.index("--pam")+1])
- except IndexError:
- print("Please input some parameter for flag --pam")
- exit(1)
- if not os.path.isfile(pamfile):
- print("The file specified for --pam does not exist")
- exit(1)
-
- if "--bMax" not in input_args:
- print("--bMax must be contained in the input")
- exit(1)
- else:
- try:
- bMax = input_args[input_args.index("--bMax")+1]
- except IndexError:
- print("Please input some parameter for flag --bMax")
- exit(1)
- try:
- bMax = int(bMax)
- except:
- print("Please input a number for flag bMax")
- exit(1)
- if bMax < 0 or bMax > 2:
- print("The range for bMax is from 0 to 2")
- exit(1)
-
- if "--mm" not in input_args:
- print("--mm must be contained in the input")
- exit(1)
- else:
- try:
- mm = input_args[input_args.index("--mm")+1]
- except IndexError:
- print("Please input some parameter for flag --mm")
- exit(1)
- try:
- mm = int(mm)
- except:
- print("Please input a number for flag mm")
- exit(1)
-
- if "--bDNA" not in input_args:
- #print("--bDNA must be contained in the input")
- #exit(1)
- bDNA = 0
- else:
- try:
- bDNA = input_args[input_args.index("--bDNA")+1]
- except IndexError:
- print("Please input some parameter for flag --bDNA")
- exit(1)
- try:
- bDNA = int(bDNA)
- except:
- print("Please input a number for flag bDNA")
- exit(1)
- if bDNA > bMax:
- print("The number of bDNA must be equal or less than bMax")
- exit(1)
- elif bDNA < 0 or bDNA > 2:
- print("The range for bDNA is from 0 to", bMax)
- exit(1)
-
- if "--bRNA" not in input_args:
- #print("--bRNA must be contained in the input")
- #exit(1)
- bRNA = 0
- else:
- try:
- bRNA = input_args[input_args.index("--bRNA")+1]
- except IndexError:
- print("Please input some parameter for flag --bRNA")
- exit(1)
- try:
- bRNA = int(bRNA)
- except:
- print("Please input a number for flag bRNA")
- exit(1)
- if bRNA > bMax:
- print("The number of bRNA must be equal or less than bMax")
- exit(1)
- elif bRNA < 0 or bRNA > 2:
- print("The range for bRNA is from 0 to", bMax)
- exit(1)
-
- if "--output" not in input_args:
- print("--output must be contained in the input")
- exit(1)
- else:
- try:
- outputfolder = os.path.abspath(
- input_args[input_args.index("--output")+1])
- except IndexError:
- print("Please input some parameter for flag --output")
- exit(1)
- if not os.path.isdir(outputfolder):
- print("The folder specified for --output does not exist")
- exit(1)
-
- os.chdir(script_path)
- if variant:
- os.system("./search_only.sh "+genomedir+" "+vcfdir+" "+guidefile+" "+pamfile+" "+str(bMax)+" "+str(mm) +
- " "+str(bDNA)+" "+str(bRNA)+" "+outputfolder+" "+script_path+" "+str(thread))
- else:
- os.system("./search_only.sh "+genomedir+" _ "+guidefile+" "+pamfile+" "+str(bMax)+" "+str(mm)+" " +
- str(bDNA)+" "+str(bRNA)+" "+outputfolder+" "+script_path+" "+str(thread))
-
-
-def complete_search():
- variant = True
- if "--help" in input_args:
- print("This is the automated search process that goes from raw input up to the post-analysis of results.")
- print("These are the flags that must be used in order to run this function:")
- print("\t--genome, used to specify the reference genome folder")
- print("\t--vcf, used to specify the VCF folder [OPTIONAL!]")
- print("\t--guide, used to specify the file that contains guides used for the search")
- print("\t--pam, used to specify the file that contains the pam")
- print("\t--annotation, used to specify the file that contains some annotations of the reference genome")
- print("\t--samplesID, used to specify the file that contains the information about samples present in VCF files [OPTIONAL!]")
- print("\t--bMax, used to specify the number of bulges for the indexing of the genome(s)")
- print("\t--mm, used to specify the number of mismatches permitted in the search phase")
- print("\t--bDNA, used to specify the number of DNA bulges permitted in the search phase [OPTIONAL!]")
- print("\t--bRNA, used to specify the number of RNA bulges permitted in the search phase [OPTIONAL!]")
- print("\t--merge, used to specify the threshold used to merge close targets (based on genetic position), use target with highest CFD as pivot [default 0 (ZERO)]")
- print("\t--output, used to specify the output folder for the results")
- print("\t--thread, used to set the number of thread used in the process (default is ALL available minus 2)")
- exit(0)
-
- if "--genome" not in input_args:
- print("--genome must be contained in the input")
- exit(1)
- else:
- try:
- genomedir = os.path.abspath(
- input_args[input_args.index("--genome")+1])
- except IndexError:
- print("Please input some parameter for flag --genome")
- exit(1)
- if not os.path.isdir(genomedir):
- print("The folder specified for --genome does not exist")
- exit(1)
-
- if "--thread" not in input_args:
- # print("--thread must be contained in the input")
- # exit(1)
- thread = len(os.sched_getaffinity(0))-2
- else:
- try:
- thread = input_args[input_args.index("--thread")+1]
- except IndexError:
- print("Please input some parameter for flag --thread")
- exit(1)
- try:
- thread = int(thread)
- except:
- print("Please input a number for flag thread")
- exit(1)
- if thread <= 0 or thread > len(os.sched_getaffinity(0))-2:
- print("thread is set to default (ALL available minus 2)")
- thread = len(os.sched_getaffinity(0))-2
- # exit(1)
-
- if "--vcf" not in input_args:
- variant = False
- else:
- try:
- vcfdir = os.path.abspath(input_args[input_args.index("--vcf")+1])
- except IndexError:
- print("Please input some parameter for flag --vcf")
- exit(1)
- if not os.path.isdir(vcfdir):
- print("The folder specified for --vcf does not exist")
- exit(1)
-
- if "--guide" not in input_args:
- print("--guide must be contained in the input")
- exit(1)
- else:
- try:
- guidefile = os.path.abspath(
- input_args[input_args.index("--guide")+1])
- except IndexError:
- print("Please input some parameter for flag --guide")
- exit(1)
- if not os.path.isfile(guidefile):
- print("The file specified for --guide does not exist")
- exit(1)
-
- if "--pam" not in input_args:
- print("--pam must be contained in the input")
- exit(1)
- else:
- try:
- pamfile = os.path.abspath(input_args[input_args.index("--pam")+1])
- except IndexError:
- print("Please input some parameter for flag --pam")
- exit(1)
- if not os.path.isfile(pamfile):
- print("The file specified for --pam does not exist")
- exit(1)
-
- if "--annotation" not in input_args:
- print("--annotation must be contained in the input")
- exit(1)
- else:
- try:
- annotationfile = os.path.abspath(
- input_args[input_args.index("--annotation")+1])
- except IndexError:
- print("Please input some parameter for flag --annotation")
- exit(1)
- if not os.path.isfile(annotationfile):
- print("The file specified for --annotation does not exist")
- exit(1)
-
- if variant and "--samplesID" not in input_args:
- print("--samplesID must be contained in the input")
- exit(1)
- elif not variant and "--samplesID" in input_args:
- print("--samplesID was in the input but no VCF directory was specified")
- exit(1)
- elif "--samplesID" in input_args:
- try:
- samplefile = os.path.abspath(
- input_args[input_args.index("--samplesID")+1])
- except IndexError:
- print("Please input some parameter for flag --samplesID")
- exit(1)
- if not os.path.isfile(samplefile):
- print("The file specified for --samplesID does not exist")
- exit(1)
-
- if "--bMax" not in input_args:
- print("--bMax must be contained in the input")
- exit(1)
- else:
- try:
- bMax = input_args[input_args.index("--bMax")+1]
- except IndexError:
- print("Please input some parameter for flag --bMax")
- exit(1)
- try:
- bMax = int(bMax)
- except:
- print("Please input a number for flag bMax")
- exit(1)
- if bMax < 0 or bMax > 2:
- print("The range for bMax is from 0 to 2")
- exit(1)
-
- if "--mm" not in input_args:
- print("--mm must be contained in the input")
- exit(1)
- else:
- try:
- mm = input_args[input_args.index("--mm")+1]
- except IndexError:
- print("Please input some parameter for flag --mm")
- exit(1)
- try:
- mm = int(mm)
- except:
- print("Please input a number for flag mm")
- exit(1)
-
- if "--bDNA" not in input_args:
- #print("--bDNA must be contained in the input")
- #exit(1)
- bDNA = 0
- else:
- try:
- bDNA = input_args[input_args.index("--bDNA")+1]
- except IndexError:
- print("Please input some parameter for flag --bDNA")
- exit(1)
- try:
- bDNA = int(bDNA)
- except:
- print("Please input a number for flag bDNA")
- exit(1)
- if bDNA > bMax:
- print("The number of bDNA must be equal or less than bMax")
- exit(1)
- elif bDNA < 0 or bDNA > 2:
- print("The range for bDNA is from 0 to", bMax)
- exit(1)
-
- if "--bRNA" not in input_args:
- #print("--bRNA must be contained in the input")
- #exit(1)
- bRNA = 0
- else:
- try:
- bRNA = input_args[input_args.index("--bRNA")+1]
- except IndexError:
- print("Please input some parameter for flag --bRNA")
- exit(1)
- try:
- bRNA = int(bRNA)
- except:
- print("Please input a number for flag bRNA")
- exit(1)
- if bRNA > bMax:
- print("The number of bRNA must be equal or less than bMax")
- exit(1)
- elif bRNA < 0 or bRNA > 2:
- print("The range for bRNA is from 0 to", bMax)
- exit(1)
-
- if "--merge" not in input_args:
- # print("--merge must be contained in the input")
- # exit(1)
- merge_t = 0
- else:
- try:
- merge_t = input_args[input_args.index("--merge")+1]
- except IndexError:
- print("Please input some parameter for flag --merge")
- exit(1)
- try:
- merge_t = int(merge_t)
- except:
- print("Please input a number for flag merge")
- exit(1)
- if merge_t < 0:
- print("Please specify a non-negative number for --merge")
- exit(1)
-
- if "--output" not in input_args:
- print("--output must be contained in the input")
- exit(1)
- else:
- try:
- outputfolder = os.path.abspath(
- input_args[input_args.index("--output")+1])
- except IndexError:
- print("Please input some parameter for flag --output")
- exit(1)
- if not os.path.isdir(outputfolder):
- print("The folder specified for --output does not exist")
- exit(1)
-
- os.chdir(script_path)
- if variant:
- os.system("./automated_search_good_parallel_v2.sh "+genomedir+" "+vcfdir+" "+guidefile+" "+pamfile+" "+annotationfile+" "+samplefile+" " +
- str(bMax)+" "+str(mm)+" "+str(bDNA)+" "+str(bRNA)+" "+str(merge_t)+" "+outputfolder+" "+script_path+" "+str(thread))
- else:
- os.system("./automated_search_good_parallel_v2.sh "+genomedir+" _ "+guidefile+" "+pamfile+" "+annotationfile+" _ "+str(bMax)+" " +
- str(mm)+" "+str(bDNA)+" "+str(bRNA)+" "+str(merge_t)+" "+outputfolder+" "+script_path+" "+str(thread))
-
-
-def target_integration():
- if "--help" in input_args:
- print("This is the automated integration process that processes the final result file to generate a usable target panel.")
- print("These are the flags that must be used in order to run this function:")
- print("\t--targets, used to specify the final result file to use in the panel creation process")
- print("\t--genome_version, used to specify the genome version used in the search phase (e.g. hg38)")
- print(
- "\t--guide, used to specify the file that contains guides used for the search")
- print("\t--gencode, used to specify the file that contains gencode annotation to find nearest gene to any target")
- print("\t--empirical_data, used to specify the file that contains the empirical data to integrate with the in-silico targets")
- print("\t--output, used to specify the output folder for the results")
- exit(0)
-
- if "--targets" not in input_args:
- print("--targets must be contained in the input")
- exit(1)
- else:
- try:
- target_file = os.path.abspath(
- input_args[input_args.index("--targets")+1])
- except IndexError:
- print("Please input some parameter for flag --targets")
- exit(1)
- if not os.path.isfile(target_file):
- print("The file specified for --targets does not exist")
- exit(1)
-
- if "--genome_version" not in input_args:
- print("--genome_version must be contained in the input")
- exit(1)
- else:
- try:
- genome_version = input_args[input_args.index(
- "--genome_version")+1]
- except IndexError:
- print("Please input some parameter for flag --genome_version")
- exit(1)
-
- if "--guide" not in input_args:
- print("--guide must be contained in the input")
- exit(1)
- else:
- try:
- guidefile = os.path.abspath(
- input_args[input_args.index("--guide")+1])
- except IndexError:
- print("Please input some parameter for flag --guide")
- exit(1)
- if not os.path.isfile(guidefile):
- print("The file specified for --guide does not exist")
- exit(1)
-
- if "--empirical_data" not in input_args:
- print("--empirical_data must be contained in the input")
- exit(1)
- else:
- try:
- empiricalfile = os.path.abspath(
- input_args[input_args.index("--empirical_data")+1])
- except IndexError:
- print("Please input some parameter for flag --empirical_data")
- exit(1)
- if not os.path.isfile(empiricalfile):
- print("The file specified for --empirical_data does not exist")
- exit(1)
-
- if "--gencode" not in input_args:
- print("--gencode must be contained in the input")
- exit(1)
- else:
- try:
- gencode_file = os.path.abspath(
- input_args[input_args.index("--gencode")+1])
- except IndexError:
- print("Please input some parameter for flag --gencode")
- exit(1)
- if not os.path.isfile(gencode_file):
- print("The file specified for --gencode does not exist")
- exit(1)
-
- if "--output" not in input_args:
- print("--output must be contained in the input")
- exit(1)
- else:
- try:
- outputfolder = os.path.abspath(
- input_args[input_args.index("--output")+1])
- except IndexError:
- print("Please input some parameter for flag --output")
- exit(1)
- if not os.path.isdir(outputfolder):
- print("The folder specified for --output does not exist")
- exit(1)
-
- os.chdir(script_path)
- os.system("./post_process.sh "+target_file+" "+gencode_file +
- " "+empiricalfile+" "+guidefile+" "+str(genome_version)+" "+outputfolder+" "+script_path)
-
-# HELP FUNCTION
-
-
-def callHelp():
- print("help:\n",
- "\nALL FASTA FILEs USED BY THE SOFTWARE MUST BE UNZIPPED AND CHROMOSOME SEPARATED, ALL VCFs USED BY THE SOFTWARE MUST BE ZIPPED AND CHROMOSOME SEPARATED",
- "\ncrisprime complete-search FUNCTION SEARCHING THE WHOLE GENOME (REFERENCE AND VARIANT IF REQUESTED) AND PERFORM CFD ANALYSIS AND TARGET SELECTION",
- "\ncrisprime search-only FUNCTION SEARCHING THE WHOLE GENOME (REFERENCE AND VARIANT IF REQUESTED) PRODUCING RESULTS FOR POST-ANALYSIS",
- "\ncrisprime post-analysis-only FUNCTION THAT PERFORMS CFD ANALYSIS AND TARGET SELECTION STARTING FROM SEARCH RESULTS",
- "\ncrisprime targets-integration FUNCTION THAT INTEGRATES IN-SILICO TARGETS WITH EMPIRICAL DATA GENERATING A USABLE PANEL",
- "\n\nADD --help TO ANY FUNCTION TO VISUALIZE A BRIEF HELP PAGE (example: crisprime complete-search --help)\n")
-
-
-if len(sys.argv) < 2:
- callHelp()
-elif sys.argv[1] == 'complete-search':
- complete_search()
-elif sys.argv[1] == 'search-only':
- search_only()
-elif sys.argv[1] == 'post-analysis-only':
- post_analysis_only()
-elif sys.argv[1] == 'targets-integration':
- target_integration()
-else:
- print("ERROR! \"" + sys.argv[1] + "\" is not an allowed function!")