BrentLab/callvariants is a bioinformatics pipeline for general variant calling. It runs Freebayes, TIDDIT and CNVpytor for SNP/INDEL and structural variant calling.
This workflow has been developed with the following specific functionality in mind:
- Checking the genotype of KN99alpha samples
  - This is performed by providing additional sequences, which are appended to the genome prior to alignment, on a per-sample basis
- Processing C. neoformans samples for bulk segregant analysis
  - The Freebayes step can optionally be used to jointly call variants on groups which are identified in the input samplesheet
However, the pipeline is not limited to these applications.
Overall, the pipeline runs the following processes:
- Prepare the Genome
  - Concatenate additional sequences provided in the input samplesheet, if there are any
  - Create indices
    - samtools faidx
    - bwamem2 index
    - bwa index -- this is for TIDDIT
  - Create sequence maps
    - build and create intervals. Both of these are from sarek
    - GATK CreateSequenceDictionary
- Read QC
- Align reads
  - bwamem2
  - picard MarkDuplicates
  - picard AddOrReplaceReadGroups
  - samtools index, sort, stats, flagstat, idxstats
- Call Variants
- Collect and present QC
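As a rough illustration of the genome preparation steps listed above, the indexing commands could be run by hand as sketched below. This is not the pipeline's own code: the run and prepare_genome helpers, the DRY_RUN variable and the file name are hypothetical, and the pipeline's modules may pass additional options to each tool.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch of the indexing steps for a single FASTA file (assumed to already
# contain any additional sequences concatenated onto the genome).
prepare_genome() {
    fasta=$1
    run samtools faidx "$fasta"                     # FASTA index (.fai)
    run bwa-mem2 index "$fasta"                     # index for bwa-mem2 alignment
    run bwa index "$fasta"                          # bwa index, used by TIDDIT
    run gatk CreateSequenceDictionary -R "$fasta"   # sequence dictionary (.dict)
}
```

With DRY_RUN set to any non-empty value, calling prepare_genome on a FASTA path prints the four commands instead of running them, which is a cheap way to see what would be executed.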
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
If you are running this test on WUSTL HTCF or RIS, use one of the built-in profiles, either htcf or ris. If you are running the test on a different host, then you may consider including one of the dependency manager profiles, e.g. singularity or docker.
A test run for ris, for example, would look like this:
nextflow run BrentLab/callvariants -r main -profile ris,test
You will need to submit this appropriately for your cluster, but no other input is necessary to run the tests; all input is taken care of by the test profile.
For detailed instructions on running your own data, please see the usage documentation.
For a description of the output directory, please see the output documentation.
A pernicious source of errors is the character that marks a "new line" in a file. We never see these characters, but they of course exist -- how else would the computer know where the new line is?
Unfortunately, operating systems do not agree on line endings: Linux and current macOS use a line feed (LF), Windows uses a carriage return followed by a line feed (CRLF), and classic Mac OS used a lone carriage return (CR). If you're using a cluster to process your data, you need to make sure that the files are Linux-compliant. All of the files you download from NCBI or FungiDB, for instance, will be, as will your fastq files from the sequencing centers. But if you create an additional fasta file, you need to make sure that it hasn't been altered by Mac or Windows software. I would expect SnapGene to output a Linux-compliant fasta file by default. But if you were to open the file in something like Word, it would probably convert the characters.
One tool you can use to ensure that your input files are Linux-compliant is dos2unix. Using dos2unix looks like this:
dos2unix </path/to/file.<ext>>
and the file will be changed in-place.
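Before converting, it can help to check whether a file actually contains carriage returns. The following is a minimal sketch, assuming a POSIX shell with grep available; count_cr_lines is a hypothetical helper name, not part of this pipeline or of dos2unix.

```shell
# Print the number of lines in a file that contain a carriage return (CR).
# 0 means no Windows-style (CRLF) or classic Mac (CR) line endings were found.
count_cr_lines() {
    cr=$(printf '\r')
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status
    grep -c "$cr" "$1" || true
}
```

A result of 0 suggests the file already uses Linux line endings; any other value suggests running dos2unix on it. Note that a file with classic Mac (CR-only) endings is seen as a single long line, so it reports 1 rather than the number of records.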
BrentLab/callvariants was originally written by Chase Mateusiak. It is based on the BSA processing steps of Daniel Agustinhno.
If you would like to contribute to this pipeline, please see the contributing guidelines.
If you use BrentLab/callvariants for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.