
Commit

#46 #47 update documentation for custom references
priesgo committed Jun 15, 2023
1 parent ed34a17 commit 3e26e98
Showing 1 changed file with 21 additions and 8 deletions.
29 changes: 21 additions & 8 deletions docs/source/03_pipeline.md
@@ -167,25 +167,38 @@ No additional parameter needs to be provided to use the default SARS-CoV-2 refer
These references can be customised to use a different SARS-CoV-2 reference or to analyse a different virus.
Two files need to be provided:
- Reference genome in FASTA format `--reference your.fasta`.
- Gene annotation file in GFFv3 format `--gff your.gff`. This is only required to run iVar
- Gene annotation file in GFFv3 format `--gff your.gff`.

Additionally, the FASTA needs bwa indexes, .fai index and a .dict index.
Additionally, the FASTA needs bwa-mem2 indexes, a .fai index and a .dict index.
These indexes can be generated with the following three commands:
```
bwa index reference.fasta
bwa-mem2 index reference.fasta
samtools faidx reference.fasta
gatk CreateSequenceDictionary --REFERENCE reference.fasta
```
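
Before launching the pipeline it can help to confirm that the indexes are actually in place. The following is a minimal sketch, not part of CoVigator: the function name `check_reference_indexes` is hypothetical, and it assumes the `.fai` sits at `<fasta>.fai` and the `.dict` at either `<basename>.dict` or `<fasta>.dict`. The bwa-mem2 index files are not checked here.

```shell
# Hypothetical helper: report which of the .fai / .dict indexes are missing.
# Assumed locations: <fasta>.fai, and <basename>.dict or <fasta>.dict.
check_reference_indexes() {
    fasta="$1"
    missing=0
    if [ ! -f "${fasta}.fai" ]; then
        echo "missing: ${fasta}.fai"
        missing=1
    fi
    # ${fasta%.*} strips the last extension, e.g. reference.fasta -> reference
    if [ ! -f "${fasta%.*}.dict" ] && [ ! -f "${fasta}.dict" ]; then
        echo "missing: ${fasta%.*}.dict"
        missing=1
    fi
    # Exit status 0 when both indexes were found, 1 otherwise
    return "$missing"
}
```

Calling `check_reference_indexes /abs/path/to/reference.fasta` prints one line per missing index and returns a non-zero status if any is absent.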

**NOTE**: beware that for Nextflow to find these indices the reference needs to be passed as an absolute path.

The SARS-CoV-2 specific annotations will be skipped when using a custom genome.
In order to have SnpEff functional annotations available you will need to prepare the new reference with SnpEff.
- Step 1. Create a file `snpEff.config` or edit an existing one and add the line `your_genome_name.genome : your_genome_name`.
- Step 2. Create the folder `your_genome_name` and copy the FASTA and GFF files there renaming them to `sequences.fa` and `genes.gff`.
- Step 3. Run `snpEff build -gff3 -v your_genome_name` to build the SnpEff index `your_genome_name/snpEffectPredictor.bin`.
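
Steps 1 and 2 above can be scripted. This is only a sketch under stated assumptions: the function name `prepare_snpeff_reference` and its argument order are hypothetical, and it assumes the genome folder lives inside the SnpEff data folder that is later passed as `--snpeff_data`.

```shell
# Hypothetical helper wrapping steps 1 and 2; argument names are illustrative.
# Usage: prepare_snpeff_reference your.fasta your.gff your_genome_name data_dir snpEff.config
prepare_snpeff_reference() {
    fasta="$1"; gff="$2"; genome="$3"; data_dir="$4"; config="$5"

    # Step 1: register the genome in the config unless the exact line is already there
    entry="${genome}.genome : ${genome}"
    touch "$config"
    grep -qxF "$entry" "$config" || echo "$entry" >> "$config"

    # Step 2: copy the files into the genome folder under the names SnpEff expects
    mkdir -p "${data_dir}/${genome}"
    cp "$fasta" "${data_dir}/${genome}/sequences.fa"
    cp "$gff" "${data_dir}/${genome}/genes.gff"
}
```

The function is safe to run twice, since the config line is only appended when missing. Step 3 (`snpEff build -gff3 -v your_genome_name`) still has to be run afterwards so that SnpEff picks up this config and data folder.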

In order to have SnpEff functional annotations available you will also need to provide three parameters:
- `--snpeff_organism`: organism to annotate with SnpEff (i.e. as registered in SnpEff)
When running CoVigator you will also need to provide three parameters:
- `--snpeff_organism`: organism to annotate with SnpEff (e.g. `your_genome_name`)
- `--snpeff_data`: path to the SnpEff data folder
- `--snpeff_config`: path to the SnpEff config file

**NOTE**: beware that for Nextflow to find these indices the reference needs to be passed as an absolute path.
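
One way to avoid passing a relative path by accident is to resolve the paths before building the command line. The helper names `abs_path` and `custom_reference_args` below are hypothetical, and the sketch assumes the files exist and that their paths contain no spaces.

```shell
# Hypothetical helpers: print the custom-reference parameters with absolute paths.
abs_path() {
    # Resolve a path to an absolute one using only POSIX tools
    echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
}

custom_reference_args() {
    # $1 = reference FASTA, $2 = GFFv3 annotation
    echo "--reference $(abs_path "$1") --gff $(abs_path "$2")"
}
```

The output of `custom_reference_args your.fasta your.gff` can then be appended to the usual Nextflow command line together with the SnpEff parameters listed above.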

**Limitations**

- The SARS-CoV-2 specific annotations (i.e. ConsHMM conservation and SARS-CoV-2 protein domains) will be skipped when
using a custom genome.
- Pangolin lineage assignment will still be available, but it will return no results for non-SARS-CoV-2 references; it is
therefore advisable to disable it with `--skip_pangolin` unless you are using an alternative SARS-CoV-2 reference.
- Custom references are supported for RNA or DNA viruses, single- or double-stranded, but not for segmented viruses.
- Double-stranded viruses with overlapping genes may pose problems for the phasing of the mutations.


### Intrahost mutations

Some mutations may be observed in only a subset of the virus sample; this may arise through intrahost virus evolution or
