Parallelize BWA across FASTQs #157
Hopefully something similar can be done for MOSAIK, but we need to make sure that aligning FASTQs separately won't give different results: wanpinglee/MOSAIK#24
@smondet I think I'd rather create the workflow nodes for parallelization explicitly than rely on a compiler optimization (unless there's some way to enforce that this optimization has to run). I think that for larger sequencing runs it can't be thought of as optional (especially when we're under a tight time budget).
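To make "explicit workflow nodes" concrete, here is a minimal Python sketch. The `Node` type, the `align_NNN`/`part_NNN.bam` names and the `bwa mem ... | samtools sort` commands are hypothetical illustrations, not the project's actual workflow API. Each FASTQ pair gets its own alignment node and a single merge node depends on all of them, so the parallelism is visible in the graph instead of depending on an optimization pass. An empty input fails loudly rather than silently producing nothing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical workflow node: a name, a shell command, and the nodes it depends on."""
    name: str
    command: str
    deps: tuple[Node, ...] = ()


def alignment_workflow(reference: str, pairs: list[tuple[str, str]], output: str) -> Node:
    """One explicit alignment node per FASTQ pair, all feeding a single merge node."""
    if not pairs:
        raise ValueError("no FASTQ pairs: refusing to build an empty alignment workflow")
    parts = [f"part_{i:03d}.bam" for i in range(len(pairs))]
    aligned = tuple(
        Node(f"align_{i:03d}", f"bwa mem {reference} {r1} {r2} | samtools sort -o {part} -")
        for i, ((r1, r2), part) in enumerate(zip(pairs, parts))
    )
    return Node("merge", f"samtools merge {output} {' '.join(parts)}", aligned)


def execution_order(root: Node) -> list[str]:
    """Post-order walk: every node is listed after all of its dependencies."""
    seen: set[str] = set()
    order: list[str] = []

    def visit(node: Node) -> None:
        if node.name in seen:
            return
        seen.add(node.name)
        for dep in node.deps:
            visit(dep)
        order.append(node.name)

    visit(root)
    return order
```

A scheduler is then free to run all `align_*` nodes concurrently, since none depends on another.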
@iskandr What I'm doing there is adding the option to fail loudly if the compiler pass does not happen:
For BWA-MEM the sequence of commands for multiple tumor FASTQs should be:
Followed by merging into a
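The commands that followed here did not survive. As a hedged illustration only (not the stripped original), the sketch below pairs R1/R2 fragment files by name and emits one paired-end `bwa mem` call per pair, so mates are always aligned together, followed by a single `samtools merge`. The `_R1_`/`_R2_` file-naming convention, the `part_NNN.bam` names and the `| samtools sort -o ... -` step are assumptions.

```python
import re
import shlex

# Hypothetical naming convention: "<sample>_R1_<chunk>.fastq.gz" / "<sample>_R2_<chunk>.fastq.gz".
MATE_TAG = re.compile(r"_R([12])_")


def pair_fastqs(paths: list[str]) -> list[tuple[str, str]]:
    """Match each R1 fragment with its R2 mate by swapping the mate tag in the file name."""
    available = set(paths)
    pairs = []
    for path in sorted(available):
        match = MATE_TAG.search(path)
        if match is None:
            raise ValueError(f"no _R1_/_R2_ tag in {path!r}")
        if match.group(1) == "2":
            continue  # R2 files are picked up through their R1 mate
        mate = f"{path[:match.start()]}_R2_{path[match.end():]}"
        if mate not in available:
            raise ValueError(f"missing R2 mate for {path!r}")
        pairs.append((path, mate))
    if 2 * len(pairs) != len(available):
        raise ValueError("some R2 files have no matching R1 file")
    return pairs


def bwa_mem_plan(reference: str, paths: list[str], output: str, threads: int = 4) -> list[str]:
    """One paired-end `bwa mem` per fragment pair (R1 and R2 aligned together), then one merge."""
    q = shlex.quote
    parts, commands = [], []
    for i, (r1, r2) in enumerate(pair_fastqs(paths)):
        part = f"part_{i:03d}.bam"
        parts.append(part)
        commands.append(
            f"bwa mem -t {threads} {q(reference)} {q(r1)} {q(r2)} | samtools sort -o {part} -"
        )
    commands.append(f"samtools merge {q(output)} " + " ".join(parts))
    return commands
```

Because each `bwa mem` call still receives both mates, this keeps paired-end information (mate rescue, pairing) within each fragment pair.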
Does BWA-MEM try two-pass alignment (e.g. rescuing the alignment of an unaligned R1 using the aligned R2, or vice versa) in a manner similar to the gapped aligners TopHat2/HISAT and STAR? I imagine this would fail if R1 and R2 were aligned independently. Relevant portion of the HISAT paper:
Link: http://www.nature.com/nmeth/journal/v12/n4/full/nmeth.3317.html?WT.ec_id=NMETH-201504
@JPFinnigan BWA-MEM does take mate pairs into account, so every call to
I don't think I clearly communicated my question/concern. BWA-MEM run in paired-end mode uses information about the successful alignment of one read in a mate pair (e.g. R1) to adjust or potentially rescue the alignment of the other read (R2), as is the case with gapped aligners (STAR/HISAT). I don't think you produce the same alignment running a single instance of
Relevant section from the BWA-MEM paper (http://arxiv.org/pdf/1303.3997v2.pdf):
@JPFinnigan We're not going to run parallel
Sure, no problem. From your original example it seemed as though you were proposing to use `bwa aln` for paired reads.
Sorry, that was confusing! (Replying to JPFinnigan's comment of Thu, Mar 10, 2016 at 2:23 PM.)
Anyway, this is now implemented as an option. For any aligner, we can run it on a single pair of FASTQs or parallelize it over pairs of FASTQ fragments, and we can choose just before submitting a given pipeline (it's a
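The thread doesn't say how the "pairs of FASTQ fragments" are produced. One way to make them from a single large R1/R2 pair is sketched below, assuming plain 4-line FASTQ records; the function names and the chunk size parameter are hypothetical. The important property is that R1 and R2 are split in lockstep, so every read lands in the same chunk index as its mate, and the read names are checked to catch files that have drifted out of sync.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator

# One FASTQ record: header, sequence, "+" separator line, qualities.
Record = tuple[str, str, str, str]


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from any iterable of lines (e.g. an open file)."""
    it = iter(lines)
    while True:
        block = list(islice(it, 4))
        if not block:
            return
        if len(block) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = (line.rstrip("\n") for line in block)
        yield header, seq, plus, qual


def read_name(header: str) -> str:
    """Read name without '@', any trailing comment, or an old-style /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def chunk_pairs(r1: Iterable[str], r2: Iterable[str], pairs_per_chunk: int) -> Iterator[list[tuple[Record, Record]]]:
    """Split an R1/R2 FASTQ pair into chunks, keeping every read in the same chunk as its mate."""
    if pairs_per_chunk < 1:
        raise ValueError("pairs_per_chunk must be positive")
    chunk: list[tuple[Record, Record]] = []
    for rec1, rec2 in zip_longest(read_records(r1), read_records(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if read_name(rec1[0]) != read_name(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]!r} vs {rec2[0]!r}")
        chunk.append((rec1, rec2))
        if len(chunk) == pairs_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def format_record(record: Record) -> str:
    """Serialize a record back to FASTQ text."""
    return "\n".join(record) + "\n"
```

Writing each chunk's R1 and R2 records to their own pair of files (via `format_record`) gives inputs that can be aligned independently.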
The alignments produced by BWA are independent across reads.
Each FASTQ can be aligned separately, for example:
(see: https://wikis.utexas.edu/display/CoreNGSTools/Alignment#Alignment-Performingthebwaalignment)
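Since each FASTQ is aligned independently with `bwa aln`, the per-file jobs can simply run side by side. A minimal sketch, assuming `bwa` is on the `PATH` and that `.sai` files are named after their FASTQ (the `sai_path` naming rule is an assumption). Threads are enough here because the real work happens in the `bwa` subprocesses; `bwa aln` also has its own `-t` option, which is complementary to running several files at once.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sai_path(fastq: str) -> str:
    """tumor_R1.fastq.gz -> tumor_R1.sai (drops .gz and the .fastq/.fq extension)."""
    name = Path(fastq).name
    for ext in (".gz", ".fastq", ".fq"):
        if name.endswith(ext):
            name = name[: -len(ext)]
    return str(Path(fastq).with_name(name + ".sai"))


def aln_job(reference: str, fastq: str) -> tuple[list[str], str]:
    """argv for `bwa aln` on one FASTQ, plus the .sai file its stdout is written to."""
    return ["bwa", "aln", reference, fastq], sai_path(fastq)


def run_job(argv: list[str], out_path: str) -> str:
    with open(out_path, "wb") as out:
        subprocess.run(argv, stdout=out, check=True)  # raises CalledProcessError on failure
    return out_path


def align_each(reference: str, fastqs: list[str], max_workers: int = 4, dry_run: bool = False) -> list[str]:
    """Run one independent `bwa aln` per FASTQ; with dry_run, only return the shell lines."""
    jobs = [aln_job(reference, fq) for fq in fastqs]
    if dry_run:
        return [f"{shlex.join(argv)} > {shlex.quote(out)}" for argv, out in jobs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda job: run_job(*job), jobs))
```

For example, `align_each("hg19.fa", ["tumor_R1.fastq.gz", "tumor_R2.fastq.gz"], dry_run=True)` prints the two `bwa aln` lines without running anything.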
Once we have a collection of aligned `sai` files, each R1/R2 `sai` pair can be converted to a SAM file:
The `sam` files for each R1/R2 pair can be converted to `bam` files (with `samtools view`) and then merged into a single `bam` file with e.g. `samtools merge tumor.bam tumor*.bam`.
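The per-pair steps described above can be sketched as command builders. `AlignedPair` and its `name` field are hypothetical labels for one R1/R2 fragment pair and its `.sai` files. The `samtools sort` step is an addition not in the description above: `samtools merge` expects its inputs to be sorted the same way, so each per-pair BAM is coordinate-sorted before merging.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One R1/R2 fragment pair plus the .sai files `bwa aln` produced for it."""
    name: str  # hypothetical label used for output files, e.g. "tumor_001"
    r1: str
    r2: str
    sai1: str
    sai2: str


def pair_to_bam(reference: str, pair: AlignedPair) -> list[str]:
    """Shell lines turning one aligned pair into a coordinate-sorted BAM named <name>.bam."""
    q = shlex.quote
    sam, unsorted = f"{pair.name}.sam", f"{pair.name}.unsorted.bam"
    return [
        # bwa sampe pairs the two .sai alignments and writes SAM to stdout
        f"bwa sampe {q(reference)} {q(pair.sai1)} {q(pair.sai2)} {q(pair.r1)} {q(pair.r2)} > {q(sam)}",
        f"samtools view -b -o {q(unsorted)} {q(sam)}",  # SAM -> BAM
        f"samtools sort -o {q(pair.name + '.bam')} {q(unsorted)}",  # merge expects sorted inputs
    ]


def merge_command(output: str, pairs: list[AlignedPair]) -> str:
    """Single `samtools merge` over the per-pair sorted BAMs."""
    if not pairs:
        raise ValueError("nothing to merge")
    return shlex.join(["samtools", "merge", output] + [f"{p.name}.bam" for p in pairs])
```

The three `pair_to_bam` lines for different pairs are independent of each other, so only the final merge has to wait for everything else.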
I'm surprised that there isn't much documentation online about doing this, but it does seem that, aside from combining mate pairs, nothing done by `bwa` involves interactions between reads.
A similar pipeline seems to be implemented in Snakemake: https://github.com/inodb/snakemake-parallel-bwa/blob/master/Snakefile