Module Order of DeDup Qualimap DamageProfiler #227

ktmeaton · 2019-06-27T15:34:49Z

Describe the Question
Hi Eager Team, this isn't a bug so much as it is a question. I'm curious about why the qualimap and damageprofiler modules run on the output of samtools_filter rather than dedup? Perhaps it's a matter of personal preference, but I'd rather my coverage/depth estimates and allele frequency/substitution rates be calculated independent of PCR duplicates. For depth, it's to avoid overinflation of confidence and for damage calculation, it's to avoid duplicate molecules which are not independent allele observations. However, I'm not very familiar with either of these particular tools so any clarity you can provide is greatly appreciated!

Test Data
An ancient mitochondrial genome enrichment (low abundance, high duplication). By default, the pipeline output reports I have a ~1800x genome with very messy damage signatures. This sample was chosen because it's an extreme example to highlight the difference.

"Expected" behavior
Again, perhaps inexperience since I'm not familiar with these modules. I'd rather these run on the dedup output which reports that I have a ~18X genome with the expected terminal damage to match my library prep method.

To Reproduce
I attached my pipeline command here:
command.txt
Running revision: ace20a0 [dev]

DamageProfiler Comparison
Default EAGER Pipeline: DamagePlot_Default.pdf
Plotted from DeDup Output: DamagePlot_DeDup.pdf

Qualimap Comparison
Default EAGER Pipeline:

Plotted from DeDup Output:

Additional context
Runlog: nextflow.log

Thank you!

The text was updated successfully, but these errors were encountered:

jfy133 · 2019-06-27T16:12:40Z

That's a very good question as I would also have the same logic. @apeltzer would be best to answer this!

apeltzer · 2019-06-27T18:28:33Z

Actually, that is something that we had differently in EAGERv1 but I agree that it probably makes more sense to compute this AFTER dedup for the reasons mentioned above by @ktmeaton . Wouldn't be a huge change to do so, we would, however, have to have a switch when people turn off dedup that then the non-deduped output will be used for qualimap/damageprofiler nevertheless.

If that is something we anyways tend to agree upon, I guess I could draft a PR for this.

jfy133 · 2019-08-12T18:45:40Z

From the technical side: I'm writing a GATK genotyping section, and I encountered a similar issue that maybe worth considering here.

How would the switch work, without duplicating processes (and thus code)? AFAIK/can tell you can't have optional/conditional input channels. So how would you tell a given process to use non-dedup vs dedup output?

apeltzer · 2019-09-04T09:48:25Z

You can assume that you have either non-dedup or dedupped files that you want to take downstream for analysis, there are some options available using Channel Operators in Nextflow that we can apply/use here I believe. Could even be a simple mix as turning off dedup will end up having an empty channel for dedup output, whereas the other channel will contain data so mixing these won't harm at all and we can use the "mixed channel" as input for downstream analysis for example. Thats basically no duplication of any code..

jfy133 · 2019-09-04T11:03:55Z

The issue with your second suggestion is if you're mixing you would be running e.g. damage profiler on both pre-dedup and post-dedup. How would you exclude the pre-dedup channel files? Which is my main question basically.

jfy133 · 2019-09-14T19:52:28Z

An attempt to address this is: #236

@ktmeaton if you're feeling adventurous, could you try it out to check it does what you want? run the nextflow command with nextflow run jfy133/eager -r downstream_switch [...]

EDIT: please hold off on this for the moment, Alex pointed out forcing everything through the bam filtering process is not nice computational resource-wise when someone doesn't want to run that step. I'm trying to re-write that now.

ktmeaton · 2019-09-16T16:51:41Z

@jfy133 Thanks for the update! I'm happy to help test out the new implementation, but I will hold off for now as it's rewritten.

jfy133 · 2019-09-21T13:07:07Z

@ktmeaton OK done. I've already done extensive tests my self, but it would be great if someone else independently does this as well in case i've missed a certain use-test case.

My branch is here:

https://github.com/jfy133/eager/tree/skip_logic

jfy133 · 2019-10-02T14:37:05Z

@ktmeaton this has now been integrated into the dev branch with #242

In principle, by default if you turn on Deduplication, those bams will passed into all downstream processes.

I'll close this for now - but feel free to re-open (or comment again) if you have issues.

apeltzer self-assigned this Jun 27, 2019

apeltzer added easier-tasks enhancement New feature or request good first issue Good for newcomers labels Jun 27, 2019

apeltzer added this to the V2.1 "Ulm" milestone Sep 16, 2019

jfy133 mentioned this issue Sep 21, 2019

Skip logic #242

Merged

25 tasks

jfy133 self-assigned this Sep 21, 2019

jfy133 closed this as completed Oct 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Module Order of DeDup Qualimap DamageProfiler #227

Module Order of DeDup Qualimap DamageProfiler #227

ktmeaton commented Jun 27, 2019

jfy133 commented Jun 27, 2019

apeltzer commented Jun 27, 2019

jfy133 commented Aug 12, 2019

apeltzer commented Sep 4, 2019

jfy133 commented Sep 4, 2019 •

edited

Loading

jfy133 commented Sep 14, 2019 •

edited

Loading

ktmeaton commented Sep 16, 2019

jfy133 commented Sep 21, 2019

jfy133 commented Oct 2, 2019

Module Order of DeDup Qualimap DamageProfiler #227

Module Order of DeDup Qualimap DamageProfiler #227

Comments

ktmeaton commented Jun 27, 2019

jfy133 commented Jun 27, 2019

apeltzer commented Jun 27, 2019

jfy133 commented Aug 12, 2019

apeltzer commented Sep 4, 2019

jfy133 commented Sep 4, 2019 • edited Loading

jfy133 commented Sep 14, 2019 • edited Loading

ktmeaton commented Sep 16, 2019

jfy133 commented Sep 21, 2019

jfy133 commented Oct 2, 2019

jfy133 commented Sep 4, 2019 •

edited

Loading

jfy133 commented Sep 14, 2019 •

edited

Loading