Command-line Arguments

General

  • --runDemo Run the DiMSum Demo (default:F)
  • --projectName Project name and directory where results are to be saved (default:'DiMSum_Project')
  • --experimentDesignPath Path to Experimental Design File (required if '--runDemo'=F)
  • --outputPath Path to directory to use for output files (default:'./' i.e. current working directory)
  • --retainIntermediateFiles Should intermediate files be retained? Intermediate files can be many gigabytes, but are required to rerun DiMSum starting at intermediate pipeline stages (default:F)
  • --startStage (Re-)Start DiMSum at a specific pipeline stage (default:0)
  • --stopStage Stop DiMSum at a specific pipeline stage (default:5)
  • --numCores Number of available CPU cores. All pipeline stages make use of parallel computing to decrease runtime if multiple cores are available (default:1)

TRIM Arguments

  • --cutadapt5First Sequence of 5' constant region to be trimmed from first (or only) read (optional). Alternatively, both 5' and 3' optional/required constant region sequences can be specified with this argument, e.g. '--cutadapt5First'='ACGT;optional...GGCC;required'.
  • --cutadapt5Second Sequence of 5' constant region to be trimmed from second read in pair (optional). Alternatively, both 5' and 3' optional/required constant region sequences can be specified with this argument, e.g. '--cutadapt5Second'='ACGT;optional...GGCC;required'.
  • --cutadapt3First Sequence of 3' constant region to be trimmed from first (or only) read (default: reverse complement of '--cutadapt5Second')
  • --cutadapt3Second Sequence of 3' constant region to be trimmed from second read in pair (default: reverse complement of '--cutadapt5First')
  • --cutadaptMinLength Discard reads shorter than this length (in nucleotides) after trimming (default:50)
  • --cutadaptErrorRate Maximum allowed error rate for trimming constant regions (default:0.2)
  • --cutadaptOverlap Minimum overlap between read and constant region for trimming (default:3)
  • --cutadaptCut5First Remove fixed number of bases from start (5') of first (or only) read before constant region trimming (optional)
  • --cutadaptCut5Second Remove fixed number of bases from start (5') of second read in pair before constant region trimming (optional)
  • --cutadaptCut3First Remove fixed number of bases from end (3') of first (or only) read before constant region trimming (optional)
  • --cutadaptCut3Second Remove fixed number of bases from end (3') of second read in pair before constant region trimming (optional)
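The 3' defaults above reflect amplicon geometry. When reads extend past the variable region, read 1 runs into the reverse complement of the region where read 2 starts, and the reverse is also true. The minimal Python sketch below shows that relationship. It also shows one way to split the combined 'ACGT;optional...GGCC;required' syntax. The function names are hypothetical. This is not DiMSum's own parser, and it does not call cutadapt.

```python
from typing import Dict, Optional, Tuple

# Translation table for complementing DNA bases (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_constant_regions(value: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Split e.g. 'ACGT;optional...GGCC;required' into 5' and 3' parts.

    Hypothetical helper: '...' separates the 5' and 3' regions, and ';'
    separates each sequence from its 'optional'/'required' flag.
    """
    regions = {}
    for end, part in zip(("five_prime", "three_prime"), value.split("...")):
        seq, _, mode = part.partition(";")
        if seq:  # skip an empty side, e.g. '...GGCC;required'
            regions[end] = (seq, mode or None)
    return regions


# Hypothetical 5' constant regions for a read pair.
cutadapt5_first = "ACGTTG"
cutadapt5_second = "GGCCAA"

# Default 3' regions: each is the reverse complement of the other read's 5' region.
cutadapt3_first = reverse_complement(cutadapt5_second)   # "TTGGCC"
cutadapt3_second = reverse_complement(cutadapt5_first)   # "CAACGT"
```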

ALIGN Arguments

  • --vsearchMinQual Minimum Phred base quality score required to retain read or read pair (default:30)
  • --vsearchMaxQual Maximum Phred base quality score accepted when reading (and used when writing) FASTQ files; cannot be greater than 93 (default:41)
  • --vsearchMaxee Maximum number of expected errors tolerated to retain read or read pair (default:0.5)
  • --vsearchMinovlen Discard read pair if the alignment length is shorter than this (default:10)
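Expected errors, as used by vsearch's maximum-expected-error filtering, are the sum of per-base error probabilities 10^(-Q/10). The sketch below computes this value from a FASTQ quality string and combines it with a per-base quality check. It rests on two assumptions: Phred+33 encoding, and reading '--vsearchMinQual' as a floor that every base must reach. The helper names are hypothetical, and this is not how DiMSum itself runs vsearch.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities 10^(-Q/10) for a FASTQ quality string."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quality)


def passes_quality_filters(quality: str, min_qual: int = 30,
                           max_ee: float = 0.5, offset: int = 33) -> bool:
    """Hypothetical filter mirroring the documented defaults of
    --vsearchMinQual (30) and --vsearchMaxee (0.5).

    Assumes every base must reach min_qual and that the read's total
    expected errors must not exceed max_ee.
    """
    scores = [ord(c) - offset for c in quality]
    if not scores or min(scores) < min_qual:
        return False
    return expected_errors(quality, offset) <= max_ee
```

For example, 100 bases at Q30 ('?' in Phred+33) give about 0.1 expected errors and pass. 600 bases at Q30 give about 0.6 and exceed the 0.5 default.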

PROCESS Arguments

  • --reverseComplement Reverse complement sequences before variant processing? (default:F)
  • --wildtypeSequence Wild-type nucleotide sequence (A/C/G/T). Lower-case bases (a/c/g/t) indicate internal constant regions to be removed (required if '--runDemo'=F)
  • --permittedSequences Nucleotide sequence of IUPAC ambiguity codes (A/C/G/T/R/Y/S/W/K/M/B/D/H/V/N) with length matching the number of mutated positions (i.e. upper-case letters) in '--wildtypeSequence' (default:N i.e. any substitution mutation allowed)
  • --sequenceType Coding potential of sequence: either 'noncoding', 'coding' or 'auto'. With 'auto', the wild-type nucleotide sequence ('--wildtypeSequence') is treated as 'coding' if it has a valid translation without a premature STOP codon (default:'auto')
  • --mutagenesisType Whether mutagenesis was performed at the nucleotide or codon/amino acid level; either 'random' or 'codon' (default:'random')
  • --indels Indel variants to be retained: either 'all', 'none' or a comma-separated list of sequence lengths (default:'none')
  • --maxSubstitutions Maximum number of amino acid or nucleotide substitutions for coding or non-coding sequences respectively (default:2)
  • --mixedSubstitutions For coding sequences, are nonsynonymous variants with silent/synonymous substitutions in other codons allowed? (default:F)
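The sketch below illustrates how '--wildtypeSequence' and '--permittedSequences' interact. Lower-case wild-type positions are dropped, and each remaining base must be allowed by the IUPAC code at the same position. It assumes substitution-only variants of the same length as the wild type (no indels). The helper names are hypothetical and are not DiMSum functions.

```python
from typing import Optional

# IUPAC nucleotide ambiguity codes and the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def core_sequence(seq: str, wildtype: str) -> str:
    """Keep bases at upper-case (mutated) wild-type positions; drop lower-case constant regions."""
    if len(seq) != len(wildtype):
        raise ValueError("sketch assumes substitution-only variants of wild-type length")
    return "".join(base for base, wt in zip(seq.upper(), wildtype) if wt.isupper())


def substitutions(variant: str, wildtype: str) -> int:
    """Count nucleotide substitutions relative to wild type, ignoring internal constant regions."""
    return sum(a != b for a, b in zip(core_sequence(variant, wildtype),
                                      core_sequence(wildtype, wildtype)))


def is_permitted(variant: str, wildtype: str, permitted: Optional[str] = None) -> bool:
    """Check each mutated position against the IUPAC code at the same position."""
    core = core_sequence(variant, wildtype)
    if permitted is None:
        permitted = "N" * len(core)  # default: any substitution allowed
    if len(permitted) != len(core):
        raise ValueError("permitted sequence length must match the number of mutated positions")
    return all(base in IUPAC[code] for base, code in zip(core, permitted.upper()))
```

With the hypothetical wild type "ACgtGA", the mutated positions read "ACGA". The variant "ACGTTA" has one substitution (G to T). It is rejected by "NNSN" (S allows only C/G) but accepted by "NNKN" (K allows G/T).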

ANALYSE Arguments

  • --fitnessMinInputCountAll Minimum input read count (in all replicates) required for a variant to be retained during fitness calculations (default:0). Alternatively, thresholds can be applied to variants with specific numbers of nucleotide substitutions as follows: 'edit_distance:threshold' e.g. '--fitnessMinInputCountAll'='1:100,2:10,3:10' (variants with unspecified numbers of substitutions are discarded).
  • --fitnessMinInputCountAny Minimum input read count (in any replicate) required for a variant to be retained during fitness calculations (default:0). Alternatively, thresholds can be applied to variants with specific numbers of nucleotide substitutions as follows: 'edit_distance:threshold' e.g. '--fitnessMinInputCountAny'='1:100,2:10,3:10' (variants with unspecified numbers of substitutions are discarded).
  • --fitnessMinOutputCountAll Minimum output read count (in all replicates) required for a variant to be retained during fitness calculations (default:0). Alternatively, thresholds can be applied to variants with specific numbers of nucleotide substitutions as follows: 'edit_distance:threshold' e.g. '--fitnessMinOutputCountAll'='1:100,2:10,3:10' (variants with unspecified numbers of substitutions are discarded).
  • --fitnessMinOutputCountAny Minimum output read count (in any replicate) required for a variant to be retained during fitness calculations (default:0). Alternatively, thresholds can be applied to variants with specific numbers of nucleotide substitutions as follows: 'edit_distance:threshold' e.g. '--fitnessMinOutputCountAny'='1:100,2:10,3:10' (variants with unspecified numbers of substitutions are discarded).
  • --fitnessNormalise Normalise fitness values to minimise inter-replicate differences (default:T)
  • --fitnessErrorModel Fit fitness error model (default:T)
  • --fitnessDropoutPseudocount Pseudocount added to output replicates with dropout i.e. variants present in input but absent from output (default:0)
  • --retainedReplicates Comma-separated list of (integer) experiment replicates to retain or 'all' (default:'all')
  • --fastqFileDir Path to directory containing input FASTQ files (required for WRAP)
  • --fastqFileExtension FASTQ file extension (default:'.fastq')
  • --gzipped Are FASTQ files gzipped? (default:T)
  • --stranded Is the library design stranded? (default:T)
  • --paired Is the library design paired-end? (default:T)
  • --experimentDesignPairDuplicates Are multiple instances of FASTQ files in the Experimental Design File permitted? (default:F)
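The 'edit_distance:threshold' syntax of the '--fitnessMin*Count*' arguments can be read as a lookup from number of nucleotide substitutions to minimum read count. The minimal sketch below makes three assumptions: counts at or above the threshold pass, a plain number applies to every variant, and "All"/"Any" map to requiring all or any replicate counts to pass. The function names are hypothetical, and the sketch does not special-case the wild type.

```python
from typing import Dict, Optional, Sequence


def parse_count_threshold(value: str) -> Dict[Optional[int], int]:
    """Parse '30' or '1:100,2:10,3:10' into {edit_distance: threshold}.

    A plain number applies to every variant and is stored under the key None.
    """
    if ":" not in value:
        return {None: int(value)}
    thresholds = {}
    for item in value.split(","):
        distance, threshold = item.split(":")
        thresholds[int(distance)] = int(threshold)
    return thresholds


def passes_threshold(counts: Sequence[int], edit_distance: int,
                     thresholds: Dict[Optional[int], int], mode: str = "all") -> bool:
    """Check one variant's per-replicate counts against the parsed thresholds."""
    if None in thresholds:
        minimum = thresholds[None]
    elif edit_distance in thresholds:
        minimum = thresholds[edit_distance]
    else:
        return False  # unspecified edit distances are discarded
    check = all if mode == "all" else any
    return check(c >= minimum for c in counts)
```

Under '1:100,2:10,3:10', a single mutant with replicate counts [150, 50] fails the "All" variant of the filter. It passes the "Any" variant. A variant with four substitutions is discarded regardless of its counts.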

Multiplexed FASTQ Files

  • --barcodeDesignPath Path to Barcode Design File (tab-separated plain text file with barcode design)
  • --barcodeErrorRate Maximum allowed error rate for barcode to be matched (default:0.25)
  • --countPath Path to Variant Count File for analysis with STEAM only (tab-separated plain text file with sample counts for all variants)
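As a rough illustration of '--barcodeErrorRate', the sketch below allows up to floor(rate * barcode length) mismatches. At the default of 0.25, an 8-nt barcode therefore tolerates 2 mismatches. This is a simplification: it counts substitutions only, assumes the barcode sits at the 5' end of the read, and rejects ties. It is not cutadapt's alignment algorithm, and the sample names and barcodes are hypothetical.

```python
import math
from typing import Dict, Optional


def match_barcode(read: str, barcodes: Dict[str, str],
                  error_rate: float = 0.25) -> Optional[str]:
    """Return the sample whose barcode best matches the start of `read`, or None.

    Simplified, mismatch-only matching; ambiguous best matches return None.
    """
    best_sample: Optional[str] = None
    best_mismatches: Optional[int] = None
    ambiguous = False
    for sample, barcode in barcodes.items():
        prefix = read[: len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read too short to contain this barcode
        mismatches = sum(a != b for a, b in zip(prefix, barcode))
        if mismatches > math.floor(error_rate * len(barcode)):
            continue  # exceeds the allowed error rate
        if best_mismatches is None or mismatches < best_mismatches:
            best_sample, best_mismatches, ambiguous = sample, mismatches, False
        elif mismatches == best_mismatches:
            ambiguous = True
    return None if ambiguous else best_sample
```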

Barcoded Library Design

  • --barcodeIdentityPath Path to Variant Identity File (tab-separated plain text file mapping barcodes to variants)

Alternative Reference Sequences

  • --synonymSequencePath Path to Synonym Sequences File (plain text file with one coding nucleotide sequence per line)

Trans Library Design

  • --transLibrary Paired-end reads correspond to distinct molecules? (default:F)
  • --transLibraryReverseComplement Reverse complement second read in pair? (default:F)