Generating error and qscore models
Ryan Wick edited this page Feb 26, 2019 · 3 revisions
Badread comes with two error/qscore models: one that I built with Oxford Nanopore reads (MinION, R9.4 flowcell) and one that I built with PacBio reads (PacBio RS II, CLR). If you'd like to build your own model, keep reading!
Requirements:
- Long reads (at least a Gbp would be good)
- A high-quality reference FASTA (ideally an Illumina-polished assembly of the same genome the reads came from)
- minimap2 (my favourite long-read aligner)
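To see whether a read set meets the rough one Gbp suggestion above, you can total its bases before starting. This is a minimal sketch, not part of Badread: `count_bases` is a hypothetical helper name, and it assumes a gzipped FASTQ with plain four-line records (no wrapped sequence lines).

```shell
# Hypothetical helper (not a Badread command): sum the sequence lengths
# in a gzipped FASTQ file and print the total number of bases.
# Assumes four-line FASTQ records, so every 2nd line of 4 is sequence.
count_bases() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2 { total += length($0) }
             END { printf "%.0f\n", total }'
}

# Example: count_bases reads.fastq.gz
```

A result of 1000000000 or more corresponds to at least one Gbp.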
First, align your long reads to your reference. Make sure to use minimap2's -c option so that it includes the CIGAR string in the output. The map-ont preset shown here is for Oxford Nanopore reads; for PacBio CLR reads, use map-pb instead:
minimap2 -c -x map-ont reference.fasta.gz reads.fastq.gz | gzip > alignments.paf.gz
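Since the models depend on the CIGAR strings, it can be worth confirming that they made it into the PAF file before starting a long run. minimap2 writes the CIGAR as a `cg:Z:` tag in the optional columns after the 12 mandatory PAF fields. The sketch below counts alignment lines that lack that tag; `count_missing_cigar` is a hypothetical helper name, not a minimap2 or Badread command.

```shell
# Hypothetical helper: print how many alignments in a gzipped PAF file
# have no cg:Z: (CIGAR) tag. PAF has 12 mandatory tab-separated columns,
# so optional tags start at column 13.
count_missing_cigar() {
    gzip -dc "$1" |
        awk -F'\t' '{
                 found = 0
                 for (i = 13; i <= NF; i++)
                     if ($i ~ /^cg:Z:/) found = 1
                 if (!found) missing++
             }
             END { printf "%d\n", missing }'
}

# Example: count_missing_cigar alignments.paf.gz
```

If this prints anything other than 0, the alignment was probably run without -c.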
Now build the models with Badread (this can take a long time, especially for large read sets):
badread error_model --reference reference.fasta.gz --reads reads.fastq.gz --alignment alignments.paf.gz > new_error_model
badread qscore_model --reference reference.fasta.gz --reads reads.fastq.gz --alignment alignments.paf.gz > new_qscore_model
If it's taking too long or running out of RAM, try limiting the number of alignments used with the --max_alignments option.
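As a rough illustration, the two model-building commands can be wrapped in one function with an optional alignment cap. This is a sketch under assumptions: `build_models` is a hypothetical wrapper, and it assumes that --max_alignments takes an integer count and is accepted by both subcommands. Check `badread error_model --help` and `badread qscore_model --help` for your version.

```shell
# Hypothetical wrapper (not part of Badread): build both models from the
# same inputs, optionally capping the number of alignments used.
# Usage: build_models REFERENCE READS ALIGNMENTS OUT_PREFIX [MAX_ALIGNMENTS]
# The body runs in a subshell so its variables do not leak to the caller.
build_models() (
    ref=$1; reads=$2; paf=$3; prefix=$4; max=${5:-}
    for kind in error qscore; do
        if [ -n "$max" ]; then
            # Assumption: --max_alignments expects an integer count.
            badread "${kind}_model" --reference "$ref" --reads "$reads" \
                --alignment "$paf" --max_alignments "$max" \
                > "${prefix}_${kind}_model" || exit 1
        else
            badread "${kind}_model" --reference "$ref" --reads "$reads" \
                --alignment "$paf" > "${prefix}_${kind}_model" || exit 1
        fi
    done
)

# Example (100000 is an arbitrary illustrative cap):
# build_models reference.fasta.gz reads.fastq.gz alignments.paf.gz new 100000
```

With the prefix `new`, this writes new_error_model and new_qscore_model, matching the file names used above.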