Module Order of DeDup Qualimap DamageProfiler #227
Comments
That's a very good question; I would have followed the same logic. @apeltzer would be best placed to answer this!
Actually, that is something we did differently in EAGERv1, but I agree that it probably makes more sense to compute this AFTER dedup, for the reasons mentioned above by @ktmeaton. It wouldn't be a huge change. We would, however, need a switch so that when people turn off dedup, the non-deduped output is still used for qualimap/damageprofiler. If that is something we tend to agree on anyway, I guess I could draft a PR for this.
From the technical side: I'm writing a GATK genotyping section, and I encountered a similar issue that may be worth considering here. How would the switch work without duplicating processes (and thus code)? As far as I know (or can tell), you can't have optional/conditional input channels. So how would you tell a given process to use non-dedup vs dedup output?
You can assume that you have either non-dedup or deduplicated files that you want to take downstream for analysis. I believe there are some options using Channel Operators in Nextflow that we can apply here. Could even be a simple
The issue with your second suggestion is that, if you're mixing, you would be running e.g. DamageProfiler on both the pre-dedup and post-dedup files. How would you exclude the pre-dedup channel files? That is basically my main question.
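To make the concern concrete, here is a minimal sketch of the "pick one source up front" idea in plain Python. It is not Nextflow and not the pipeline's actual implementation; the function name, the `skip_dedup` flag, and the file names are all hypothetical. The point is only that downstream steps receive exactly one set of files, never a mix of both.

```python
from typing import List, Optional


def select_downstream_bams(
    filtered_bams: List[str],
    dedup_bams: Optional[List[str]],
    skip_dedup: bool,
) -> List[str]:
    """Return exactly one set of BAMs for downstream QC steps.

    Hypothetical helper: the source is chosen once, instead of mixing
    both sets, so no file is analysed both pre- and post-dedup.
    """
    if skip_dedup:
        # Deduplication is off: fall back to the filtered (non-dedup) files.
        return list(filtered_bams)
    if dedup_bams is None:
        raise ValueError("dedup is enabled but no dedup output was provided")
    # Deduplication is on: pass on only the deduplicated files.
    return list(dedup_bams)


# Hypothetical file names, for illustration only.
filtered = ["sample1.filtered.bam"]
deduped = ["sample1.dedup.bam"]

with_dedup = select_downstream_bams(filtered, deduped, skip_dedup=False)
without_dedup = select_downstream_bams(filtered, None, skip_dedup=True)
```

In a workflow language the same decision would have to be expressed on channels rather than lists, which is exactly the open question in this thread.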
An attempt to address this is #236. @ktmeaton, if you're feeling adventurous, could you try it out to check that it does what you want? Run the nextflow command with
EDIT: please hold off on this for the moment. Alex pointed out that forcing everything through the BAM filtering process is not nice in terms of computational resources when someone doesn't want to run that step. I'm trying to rewrite that now.
@jfy133 Thanks for the update! I'm happy to help test out the new implementation, but I will hold off for now while it's being rewritten.
@ktmeaton OK, done. I've already done extensive tests myself, but it would be great if someone else tested this independently as well, in case I've missed a certain use case. My branch is here:
Describe the Question
Hi Eager Team, this isn't a bug so much as it is a question. I'm curious why the qualimap and damageprofiler modules run on the output of samtools_filter rather than dedup. Perhaps it's a matter of personal preference, but I'd rather my coverage/depth estimates and allele frequency/substitution rates be calculated independently of PCR duplicates. For depth, this avoids overinflated confidence; for damage calculation, it avoids counting duplicate molecules, which are not independent allele observations. However, I'm not very familiar with either of these particular tools, so any clarity you can provide is greatly appreciated!
Test Data
An ancient mitochondrial genome enrichment (low abundance, high duplication). By default, the pipeline output reports I have a ~1800x genome with very messy damage signatures. This sample was chosen because it's an extreme example to highlight the difference.
"Expected" behavior
Again, this may be inexperience on my part, since I'm not familiar with these modules. I'd rather these ran on the dedup output, which reports that I have a ~18X genome with the expected terminal damage to match my library prep method.
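For scale, the two reported depths (~1800x before dedup, ~18X after) can be turned into a rough duplication estimate. This is a back-of-the-envelope sketch only: it assumes mean depth scales with the number of retained reads, and the function name is hypothetical.

```python
def implied_duplication(pre_dedup_depth: float, post_dedup_depth: float) -> float:
    """Fraction of coverage removed by deduplication, estimated from mean depths.

    Assumes depth is proportional to the number of retained reads.
    """
    if pre_dedup_depth <= 0:
        raise ValueError("pre_dedup_depth must be positive")
    return 1.0 - post_dedup_depth / pre_dedup_depth


# Approximate depths reported in this issue.
fold_inflation = 1800 / 18                    # about a 100-fold difference
dup_fraction = implied_duplication(1800, 18)  # about 0.99
```

Under that assumption, roughly 99% of the pre-dedup coverage comes from duplicates, which is why the two sets of plots look so different.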
To Reproduce
I attached my pipeline command here:
command.txt
Running revision: ace20a0 [dev]
DamageProfiler Comparison
Default EAGER Pipeline: DamagePlot_Default.pdf
Plotted from DeDup Output: DamagePlot_DeDup.pdf
Qualimap Comparison
Default EAGER Pipeline:
Plotted from DeDup Output:
Additional context
Runlog: nextflow.log
Thank you!