Commit

andrefaure committed Jan 5, 2021
2 parents d32c2a4 + 358b16e commit 6f2bcc4
Showing 4 changed files with 15 additions and 15 deletions.
2 changes: 1 addition & 1 deletion docs/ARGUMENTS.md
@@ -45,7 +45,6 @@

* **_--vsearchMinQual_** Minimum Phred base quality score required to retain read or read pair (default:30)
* **_--vsearchMaxee_** Maximum number of expected errors tolerated to retain read or read pair (default:0.5)
* **_--vsearchMinlen_** Discard read (or read pair) if its length is shorter than this (default:64)
* **_--vsearchMinovlen_** Discard read pair if the alignment length is shorter than this (default:10)
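
The three single-read filters above can be illustrated with a minimal Python sketch. It assumes Phred+33 quality strings and reads '_--vsearchMinQual_' as a per-base minimum; both are assumptions, and `phred_scores`, `expected_errors` and `passes_filters` are hypothetical helpers rather than DiMSum or VSEARCH functions. The expected number of errors in a read is the sum of its per-base error probabilities, 10^(-Q/10) for a base of Phred quality Q. The read-pair alignment length check ('_--vsearchMinovlen_') is not modelled.

```python
"""Hypothetical read filters resembling --vsearchMinQual, --vsearchMaxee
and --vsearchMinlen (a sketch, not DiMSum's or VSEARCH's implementation)."""


def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode an ASCII quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality, offset))


def passes_filters(quality: str, min_qual: int = 30, max_ee: float = 0.5,
                   min_len: int = 64) -> bool:
    """Keep a read only if it is long enough, every base reaches min_qual
    (an assumed per-base rule) and its expected errors do not exceed max_ee."""
    scores = phred_scores(quality)
    return (
        len(scores) >= min_len
        and min(scores, default=0) >= min_qual
        and expected_errors(quality) <= max_ee
    )
```

For example, a 100-base read in which every base is Q30 (`'?'` in Phred+33) has an expected error total of about 0.1 and passes all three defaults, whereas the same read at Q20 (`'5'`) fails both the quality and the expected-error thresholds.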

## PROCESS Arguments
@@ -55,6 +54,7 @@
* **_--permittedSequences_** Nucleotide sequence of IUPAC ambiguity codes (A/C/G/T/R/Y/S/W/K/M/B/D/H/V/N) with length matching the number of mutated positions (i.e. upper-case letters) in '_--wildtypeSequence_' (default:N i.e. any substitution mutation allowed)
* **_--sequenceType_** Coding potential of sequence: either 'noncoding', 'coding' or 'auto'. With 'auto', the sequence is assumed to be 'coding' if the specified wild-type nucleotide sequence ('_--wildtypeSequence_') has a valid translation without a premature STOP codon (default:'auto')
* **_--mutagenesisType_** Whether mutagenesis was performed at the nucleotide or codon/amino acid level; either 'random' or 'codon' (default:'random')
* **_--indels_** Indel variants to be retained: either 'all', 'none' or a comma-separated list of sequence lengths (default:'none')
* **_--maxSubstitutions_** Maximum number of amino acid or nucleotide substitutions for coding or non-coding sequences respectively (default:2)
* **_--mixedSubstitutions_** For coding sequences, are nonsynonymous variants with silent/synonymous substitutions in other codons allowed? (default:F)
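
A minimal Python sketch of how '_--indels_', '_--permittedSequences_' and '_--maxSubstitutions_' could be applied to a single variant of a non-coding sequence. Upper-case letters in '_--wildtypeSequence_' mark mutated positions, as stated above; the remaining rules are assumptions: lower-case letters mark constant positions, an IUPAC code only constrains positions where the variant differs from the wild type, and the '_--indels_' lengths refer to variant nucleotide sequence lengths. The function `classify` is hypothetical.

```python
"""Hypothetical library-design checks resembling --indels,
--permittedSequences and --maxSubstitutions for a non-coding sequence.
Assumptions: lower-case wild-type letters are constant positions, and an
IUPAC code only constrains positions where the variant differs."""
from typing import Optional

# IUPAC nucleotide ambiguity codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def classify(variant: str, wildtype: str, permitted: Optional[str] = None,
             max_substitutions: int = 2, indels: str = "none") -> str:
    """Return 'retained' or 'rejected' for a variant nucleotide sequence."""
    if len(variant) != len(wildtype):
        # Indel: keep 'all', drop 'none', else keep only the listed lengths.
        if indels == "all":
            return "retained"
        if indels == "none":
            return "rejected"
        allowed_lengths = {int(n) for n in indels.split(",")}
        return "retained" if len(variant) in allowed_lengths else "rejected"

    mutated = [i for i, base in enumerate(wildtype) if base.isupper()]
    if permitted is None:
        permitted = "N" * len(mutated)  # default: any substitution allowed
    if len(permitted) != len(mutated):
        raise ValueError("permitted length must equal the number of mutated positions")
    code_at = dict(zip(mutated, permitted.upper()))  # position -> IUPAC code

    substitutions = 0
    for i, (wt, var) in enumerate(zip(wildtype.upper(), variant.upper())):
        if wt == var:
            continue
        if i not in code_at:
            return "rejected"  # mutation in a constant position
        if var not in IUPAC[code_at[i]]:
            return "rejected"  # inconsistent with the library design
        substitutions += 1
    return "retained" if substitutions <= max_substitutions else "rejected"
```

With wild type `ttACGTtt` and permitted sequence `RNNN`, for instance, an A-to-G change at the first mutated position is retained, an A-to-C change there is rejected, and any change in the lower-case flanks is rejected. For coding sequences the substitution limit would instead count amino acid changes, which this sketch does not cover.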

12 changes: 6 additions & 6 deletions docs/FILEFORMATS.md
@@ -71,11 +71,11 @@ Primary output files:
Additional output files:

* **fitness_wildtype.txt** Wild-type fitness score and associated error.
* **fitness_singles.txt** Single amino acid or nucleotide variant fitness scores and associated errors.
* **fitness_doubles.txt** Double amino acid or nucleotide variant fitness scores and associated errors.
* **fitness_silent.txt** Silent (synonymous) variant fitness scores and associated errors (for coding sequences only).
* **fitness_singles_MaveDB.csv** [MaveDB](https://www.mavedb.org/) compatible .csv file with single amino acid or nucleotide variant fitness scores and associated errors.
* **fitness_singles.txt** Single amino acid or nucleotide substitution variant fitness scores and associated errors.
* **fitness_doubles.txt** Double amino acid or nucleotide substitution variant fitness scores and associated errors.
* **fitness_silent.txt** Silent (synonymous) substitution variant fitness scores and associated errors (for coding sequences only).
* **fitness_singles_MaveDB.csv** [MaveDB](https://www.mavedb.org/) compatible .csv file with single amino acid or nucleotide substitution variant fitness scores and associated errors.
* **DiMSum_Project_variant_data_merge.tsv** Tab-separated plain text file with variant counts and statistics.
* **DiMSum_Project_nobarcode_variant_data_merge.tsv** Tab-separated plain text file with sequenced barcodes that were not found in the variant identity file.
* **DiMSum_Project_indel_variant_data_merge.tsv** Tab-separated plain text file with indel variants.
* **DiMSum_Project_rejected_variant_data_merge.tsv** Tab-separated plain text file with rejected variants (internal constant region mutants, mutations inconsistent with the library design or variants with too many substitutions).
* **DiMSum_Project_indel_variant_data_merge.tsv** Tab-separated plain text file with rejected indel variants.
* **DiMSum_Project_rejected_variant_data_merge.tsv** Tab-separated plain text file with remaining rejected variants (internal constant region mutants, mutations inconsistent with the library design or variants with too many substitutions).
4 changes: 2 additions & 2 deletions docs/INSTALLATION.md
@@ -26,7 +26,7 @@ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
```

**IMPORTANT:** If in doubt, respond with "yes" when prompted during installation.
**IMPORTANT:** If in doubt, respond with "yes" to the following question during installation: "Do you wish the installer to initialize Miniconda3 by running conda init?". In this case Conda will modify your shell scripts (*~/.bashrc* or *~/.bash_profile*) to initialize Miniconda3 on startup. Ensure that any future modifications to your *$PATH* variable in your shell scripts occur **before** this code to initialize Miniconda3.

After installing Conda you will need to add the bioconda channel as well as the other channels bioconda depends on. Start a new console session (e.g. by closing the current window and opening a new one) and run the following:
```
@@ -35,7 +35,7 @@ conda config --add channels bioconda
conda config --add channels conda-forge
```

Next, optionally, create a dedicated environment for DiMSum and its dependencies. This is recommended if you already have _R_ and/or _Python_ installations that you need to maintain separately.
Next, optionally, create a dedicated environment for DiMSum and its dependencies. This is recommended if you already have _R_ and/or _Python_ installations that you would like to maintain in a separate environment.
```
conda create --name dimsum
conda activate dimsum
12 changes: 6 additions & 6 deletions docs/PIPELINE.md
@@ -32,15 +32,15 @@ Align overlapping read pairs using *[VSEARCH](INSTALLATION.md)* and filter resul

Combine sample-wise variant counts and statistics to produce a unified results data.table. After aggregating counts across technical replicates, variants are processed and filtered according to user specifications (see [stage-specific arguments](ARGUMENTS.md#process-arguments)):
* **4.1** For [Barcoded library designs](ARGUMENTS.md#barcoded-library-design), read counts are aggregated at the variant level for barcode/variant mappings specified in the [Variant Identity File](FILEFORMATS.md#variant-identity-file). Undefined/misread barcodes are ignored.
* **4.2** Indel variants (defined as those not matching the wild-type nucleotide sequence length) are removed.
* **4.3** If internal constant region(s) are specified, these are excised from all variants if a perfect match is found (see ['_--wildtypeSequence_' argument](ARGUMENTS.md#process-arguments)).
* **4.4** Variants with mutations inconsistent with the library design are removed (see ['_--permittedSequences_' argument](ARGUMENTS.md#process-arguments)).
* **4.5** Variants with more substitutions than desired are also removed (see ['_--maxSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
* **4.6** Finally, nonsynonymous variants with synonymous substitutions in other codons are removed if necessary (see ['_--mixedSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
* **4.2** Indel variants (defined as those not matching the wild-type nucleotide sequence length) are removed if necessary (see ['_--indels_' argument](ARGUMENTS.md#process-arguments)).
* **4.3** If internal constant region(s) are specified, these are excised from all substitution variants if a perfect match is found (see ['_--wildtypeSequence_' argument](ARGUMENTS.md#process-arguments)).
* **4.4** Substitution variants with mutations inconsistent with the library design are removed (see ['_--permittedSequences_' argument](ARGUMENTS.md#process-arguments)).
* **4.5** Substitution variants with more substitutions than desired are also removed (see ['_--maxSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
* **4.6** Finally, nonsynonymous substitution variants with synonymous substitutions in other codons are removed if necessary (see ['_--mixedSubstitutions_' argument](ARGUMENTS.md#process-arguments)).
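
For coding sequences, the '_--sequenceType_' 'auto' rule, the amino acid count used by '_--maxSubstitutions_' and the mixed-substitution check in step 4.6 can be sketched in Python with the standard genetic code. The helpers `translate`, `looks_coding`, `aa_substitutions` and `is_mixed` are hypothetical, and treating a STOP in the final codon as acceptable is an assumption about what "premature" means.

```python
"""Hypothetical coding-sequence helpers resembling --sequenceType 'auto',
--maxSubstitutions (coding case) and --mixedSubstitutions. Uses the
standard genetic code; a sketch, not DiMSum's implementation."""
from typing import Optional

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard codon table, e.g. CODON_TABLE["ATG"] == "M"; "*" marks STOP.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq: str) -> list:
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> Optional[str]:
    """Protein sequence, or None if seq is not whole A/C/G/T codons."""
    if len(seq) % 3 != 0 or any(c not in CODON_TABLE for c in codons(seq)):
        return None
    return "".join(CODON_TABLE[c] for c in codons(seq))


def looks_coding(wildtype: str) -> bool:
    """'auto' rule: valid translation with no premature STOP
    (a STOP in the final codon is assumed to be allowed)."""
    protein = translate(wildtype)
    return bool(protein) and "*" not in protein[:-1]


def _check_pair(variant: str, wildtype: str) -> None:
    if translate(variant) is None or translate(wildtype) is None \
            or len(variant) != len(wildtype):
        raise ValueError("expected two same-length coding sequences")


def aa_substitutions(variant: str, wildtype: str) -> int:
    """Number of amino acid differences between two coding sequences."""
    _check_pair(variant, wildtype)
    return sum(a != b for a, b in zip(translate(variant), translate(wildtype)))


def is_mixed(variant: str, wildtype: str) -> bool:
    """True for a nonsynonymous variant that also carries a synonymous
    codon change elsewhere (removed unless mixed substitutions are allowed)."""
    _check_pair(variant, wildtype)
    pairs = list(zip(codons(variant), codons(wildtype)))
    nonsynonymous = any(CODON_TABLE[v] != CODON_TABLE[w] for v, w in pairs)
    synonymous = any(v != w and CODON_TABLE[v] == CODON_TABLE[w] for v, w in pairs)
    return nonsynonymous and synonymous
```

For example, with wild type `ATGAAATTT` (MKF), the variant `ATGGAATTC` changes K to E and also carries the synonymous change TTT to TTC, so `is_mixed` returns `True` and the variant would be removed under the default '_--mixedSubstitutions_' setting (F).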

## Stage 5: **ANALYSE** counts (_STEAM_)

Calculate fitness and error estimates for a user-specified subset of substitution variants (see [stage-specific arguments](ARGUMENTS.md#analyse-arguments)):
Calculate fitness and error estimates for a user-specified subset of variants (see [stage-specific arguments](ARGUMENTS.md#analyse-arguments)):
* **5.1** Optionally remove low count variants according to user-specified soft/hard thresholds to minimise the impact of "fictional" variants from sequencing errors.
* **5.2** Calculate replicate normalisation parameters (scale and shift) to minimise inter-replicate fitness differences.
* **5.3** Fit the error model to a high confidence subset of variants to determine additive and multiplicative error terms.
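
Step 5.2 does not spell out how the scale and shift are estimated. As an illustration only, the Python sketch below fits each replicate to the across-replicate mean fitness by ordinary least squares; this is a stand-in technique, not DiMSum's method, and `scale_shift` and `normalise` are hypothetical names. It assumes every replicate lists the same variants in the same order and that no replicate has constant fitness (which would make the least-squares slope undefined).

```python
"""Illustrative replicate normalisation: each replicate gets a scale and
shift chosen by ordinary least squares so that it best matches the
across-replicate mean fitness. A stand-in, not DiMSum's procedure."""
from statistics import fmean


def scale_shift(fitness: list, consensus: list) -> tuple:
    """Least-squares fit of consensus ~ scale * fitness + shift."""
    mx, my = fmean(fitness), fmean(consensus)
    sxx = sum((x - mx) ** 2 for x in fitness)
    sxy = sum((x - mx) * (y - my) for x, y in zip(fitness, consensus))
    scale = sxy / sxx
    return scale, my - scale * mx


def normalise(replicates: list) -> list:
    """Return normalised copies of the replicate fitness vectors
    (assumes the same variant order in every replicate)."""
    consensus = [fmean(values) for values in zip(*replicates)]
    normalised = []
    for fitness in replicates:
        scale, shift = scale_shift(fitness, consensus)
        normalised.append([scale * x + shift for x in fitness])
    return normalised
```

If one replicate is exactly an affine transform of another (for example `[1, 3, 5, 7]` versus `[0, 1, 2, 3]`), both are mapped onto the same normalised values, so the inter-replicate differences vanish.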