Merge branch 'master' of https://github.com/CRG-CNAG/LehnerLab

lehner-lab · Jan 5, 2021 · 6f2bcc4
2 parents d32c2a4 + 358b16e
commit 6f2bcc4

Showing 4 changed files with 15 additions and 15 deletions.
diff --git a/docs/ARGUMENTS.md b/docs/ARGUMENTS.md
@@ -45,7 +45,6 @@
 
 * **_--vsearchMinQual_** Minimum Phred base quality score required to retain read or read pair (default:30)
 * **_--vsearchMaxee_** Maximum number of expected errors tolerated to retain read or read pair (default:0.5)
-* **_--vsearchMinlen_** Discard read (or read pair) if its length is shorter than this (default:64)
 * **_--vsearchMinovlen_** Discard read pair if the alignment length is shorter than this (default:10)
 
 ## PROCESS Arguments
@@ -55,6 +54,7 @@
 * **_--permittedSequences_** Nucleotide sequence of IUPAC ambiguity codes (A/C/G/T/R/Y/S/W/K/M/B/D/H/V/N) with length matching the number of mutated positions (i.e. upper-case letters) in '_--wildtypeSequence_' (default:N i.e. any substitution mutation allowed)
 * **_--sequenceType_** Coding potential of sequence: either 'noncoding', 'coding' or 'auto'. If the specified wild-type nucleotide sequence ('_--wildtypeSequence_') has a valid translation without a premature STOP codon, it is assumed to be 'coding' (default:'auto')
 * **_--mutagenesisType_** Whether mutagenesis was performed at the nucleotide or codon/amino acid level; either 'random' or 'codon' (default:'random')
+* **_--indels_** Indel variants to be retained: either 'all', 'none' or a comma-separated list of sequence lengths (default:'none')
 * **_--maxSubstitutions_** Maximum number of nucleotide or amino acid substitutions for coding or non-coding sequences respectively (default:2)
 * **_--mixedSubstitutions_** For coding sequences, are nonsynonymous variants with silent/synonymous substitutions in other codons allowed? (default:F)
 

diff --git a/docs/FILEFORMATS.md b/docs/FILEFORMATS.md
@@ -71,11 +71,11 @@ Primary output files:
 Additional output files:
 
 * **fitness_wildtype.txt** Wild-type fitness score and associated error.
-* **fitness_singles.txt** Single amino acid or nucleotide variant fitness scores and associated errors.
-* **fitness_doubles.txt** Double amino acid or nucleotide variant fitness scores and associated errors.
-* **fitness_silent.txt** Silent (synonymous) variant fitness scores and associated errors (for coding sequences only).
-* **fitness_singles_MaveDB.csv** [MaveDB](https://www.mavedb.org/) compatible .csv file with single amino acid or nucleotide variant fitness scores and associated errors.
+* **fitness_singles.txt** Single amino acid or nucleotide substitution variant fitness scores and associated errors.
+* **fitness_doubles.txt** Double amino acid or nucleotide substitution variant fitness scores and associated errors.
+* **fitness_silent.txt** Silent (synonymous) substitution variant fitness scores and associated errors (for coding sequences only).
+* **fitness_singles_MaveDB.csv** [MaveDB](https://www.mavedb.org/) compatible .csv file with single amino acid or nucleotide substitution variant fitness scores and associated errors.
 * **DiMSum_Project_variant_data_merge.tsv** Tab-separated plain text file with variant counts and statistics.
 * **DiMSum_Project_nobarcode_variant_data_merge.tsv** Tab-separated plain text file with sequenced barcodes that were not found in the variant identity file.
-* **DiMSum_Project_indel_variant_data_merge.tsv** Tab-separated plain text file with indel variants.
-* **DiMSum_Project_rejected_variant_data_merge.tsv** Tab-separated plain text file with rejected variants (internal constant region mutants, mutations inconsistent with the library design or variants with too many substitutions).
+* **DiMSum_Project_indel_variant_data_merge.tsv** Tab-separated plain text file with rejected indel variants.
+* **DiMSum_Project_rejected_variant_data_merge.tsv** Tab-separated plain text file with remaining rejected variants (internal constant region mutants, mutations inconsistent with the library design or variants with too many substitutions).
diff --git a/docs/INSTALLATION.md b/docs/INSTALLATION.md
@@ -26,7 +26,7 @@ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
 sh Miniconda3-latest-Linux-x86_64.sh
 ```
 
-**IMPORTANT:** If in doubt, respond with "yes" when prompted during installation.
+**IMPORTANT:** If in doubt, respond with "yes" to the following question during installation: "Do you wish the installer to initialize Miniconda3 by running conda init?". In this case Conda will modify your shell scripts (*~/.bashrc* or *~/.bash_profile*) to initialize Miniconda3 on startup. Ensure that any future modifications to your *$PATH* variable in your shell scripts occur **before** this code to initialize Miniconda3.
 
 After installing Conda you will need to add the bioconda channel as well as the other channels bioconda depends on. Start a new console session (e.g. by closing the current window and opening a new one) and run the following:
 ```
@@ -35,7 +35,7 @@ conda config --add channels bioconda
 conda config --add channels conda-forge
 ```
 
-Next, optionally, create a dedicated environment for DiMSum and its dependencies. This is recommended if you already have _R_ and/or _Python_ installations that you need to maintain separately.
+Next, optionally, create a dedicated environment for DiMSum and its dependencies. This is recommended if you already have _R_ and/or _Python_ installations that you would like to maintain in a separate environment.
 ```
 conda create --name dimsum
 conda activate dimsum

diff --git a/docs/PIPELINE.md b/docs/PIPELINE.md
@@ -32,15 +32,15 @@ Align overlapping read pairs using *[VSEARCH](INSTALLATION.md)* and filter resul
 
 Combine sample-wise variant counts and statistics to produce a unified results data.table. After aggregating counts across technical replicates, variants are processed and filtered according to user specifications (see [stage-specific arguments](ARGUMENTS.md#process-arguments)):
 * **4.1** For [Barcoded library designs](ARGUMENTS.md#barcoded-library-design), read counts are aggregated at the variant level for barcode/variant mappings specified in the [Variant Identity File](FILEFORMATS.md#variant-identity-file). Undefined/misread barcodes are ignored.
-* **4.2** Indel variants (defined as those not matching the wild-type nucleotide sequence length) are removed.
-* **4.3** If internal constant region(s) are specified, these are excised from all variants if a perfect match is found (see ['_--wildtypeSequence_' argument](ARGUMENTS.md#process-arguments)).
-* **4.4** Variants with mutations inconsistent with the library design are removed (see ['_--permittedSequences_' argument](ARGUMENTS.md#process-arguments)).
-* **4.5** Variants with more substitutions than desired are also removed (see ['_--maxSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
-* **4.6** Finally, nonsynonymous variants with synonymous substitutions in other codons are removed if necessary (see ['_--mixedSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
+* **4.2** Indel variants (defined as those not matching the wild-type nucleotide sequence length) are removed if necessary (see ['_--indels_' argument](ARGUMENTS.md#process-arguments)).
+* **4.3** If internal constant region(s) are specified, these are excised from all substitution variants if a perfect match is found (see ['_--wildtypeSequence_' argument](ARGUMENTS.md#process-arguments)).
+* **4.4** Substitution variants with mutations inconsistent with the library design are removed (see ['_--permittedSequences_' argument](ARGUMENTS.md#process-arguments)).
+* **4.5** Substitution variants with more substitutions than desired are also removed (see ['_--maxSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
+* **4.6** Finally, nonsynonymous substitution variants with synonymous substitutions in other codons are removed if necessary (see ['_--mixedSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
 
 ## Stage 5: **ANALYSE** counts (_STEAM_)
 
-Calculate fitness and error estimates for a user-specified subset of substitution variants (see [stage-specific arguments](ARGUMENTS.md#analyse-arguments)):
+Calculate fitness and error estimates for a user-specified subset of variants (see [stage-specific arguments](ARGUMENTS.md#analyse-arguments)):
 * **5.1** Optionally remove low count variants according to user-specified soft/hard thresholds to minimise the impact of "fictional" variants from sequencing errors.
 * **5.2** Calculate replicate normalisation parameters (scale and shift) to minimise inter-replicate fitness differences.
 * **5.3** Fit the error model to a high confidence subset of variants to determine additive and multiplicative error terms.
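
The revised step 4.2 makes indel removal conditional on the new '_--indels_' argument ('all', 'none' or a comma-separated list of sequence lengths). The following is a minimal Python sketch of how such a rule could be applied, not DiMSum's actual implementation. The function names `indel_lengths_to_keep` and `keep_variant` are hypothetical, and the sketch assumes that the listed lengths refer to the total nucleotide length of the variant sequence.

```python
def indel_lengths_to_keep(indels):
    """Interpret an '--indels'-style value (hypothetical helper).

    Returns the string 'all' or 'none', or a set of permitted sequence lengths.
    """
    value = indels.strip().lower()
    if value in ("all", "none"):
        return value
    # Assumption: remaining values are comma-separated integer lengths.
    return {int(token) for token in value.split(",") if token.strip()}


def keep_variant(nt_seq, wildtype_seq, indels="none"):
    """Decide whether a variant survives the indel filter of step 4.2."""
    # Per the docs, an indel is any variant whose length differs from wild type.
    if len(nt_seq) == len(wildtype_seq):
        return True
    rule = indel_lengths_to_keep(indels)
    if rule == "all":
        return True
    if rule == "none":
        return False
    # Assumption: listed lengths are total variant sequence lengths.
    return len(nt_seq) in rule
```

With the default 'none', every variant whose length differs from the wild type is dropped, which matches the behaviour documented before this commit.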