remove/exclude reads with densely un-converted non-CpG sites #61

Closed
avilella opened this issue May 31, 2018 · 3 comments

Comments

@avilella

This is a feature request that comes from a wet lab member dealing with data that seems to suffer from a subpopulation of un-converted reads. Digging a bit, it seems like BS-Seeker2 first, and Bismark later, implemented a --filter_non_conversion flag. See:

FelixKrueger/Bismark#76

Given the work done on issue #56, which seems related to what's needed to implement the feature described here, I wonder if this could be added to MethylDackel?

Thanks in advance

@dpryan79
Owner

dpryan79 commented May 31, 2018

This wouldn't work in the normal extract, since that essentially operates on a pileup (so looking at the entire length of a read is pretty annoying). I could write a filtering program, but since Felix already wrote one for Bismark I wonder if that'd just duplicate his work needlessly.

BTW, non-CpG methylation does vary in the human genome in accordance with TAD structure and A/B compartments, so I'm hesitant to filter too stringently and end up wiping out something that's actually biologically there (see this article from a collaborator of ours and MethylDackel user as an example).

@SamERoss

Hello,

We are also currently suffering from a small percentage of reads being completely unconverted. Regarding Bismark's --filter_non_conversion: it uses the methylation call strings from a Bismark-generated BAM, so it doesn't work for non-Bismark BAMs. So I am just commenting to also express interest in non-conversion filtering for MethylDackel, even if it's just something as simple as removing any read with 100% non-conversion. I have also commented on another thread about perRead: if perRead could report CHG and CHH metrics, I could use that to filter the reads as well.
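To make the "remove any read with 100% non-conversion" idea concrete, here is a minimal sketch in plain Python. It is not how MethylDackel or Bismark implement filtering. It assumes a forward-strand (C-to-T) read aligned without indels, and a reference string that carries one extra base past the read end so the context of the last base is known. The function names and the `min_sites` parameter are hypothetical.

```python
def non_cpg_counts(read_seq, ref_seq):
    """Return (unconverted, total) over non-CpG reference cytosines.

    Assumptions (illustrative only): forward-strand read, no indels,
    and ref_seq has at least one base beyond the end of read_seq.
    """
    unconverted = 0
    total = 0
    for i, base in enumerate(read_seq):
        if ref_seq[i] != "C":
            continue
        nxt = ref_seq[i + 1] if i + 1 < len(ref_seq) else "N"
        if nxt in ("G", "N"):
            continue  # CpG or unknown context: not counted
        if base == "C":      # still C in the read: not converted
            unconverted += 1
            total += 1
        elif base == "T":    # converted by bisulfite treatment
            total += 1
        # any other base is a mismatch and is ignored
    return unconverted, total


def is_fully_unconverted(read_seq, ref_seq, min_sites=3):
    """True if every non-CpG C is unconverted and at least
    min_sites such positions were observed (min_sites is hypothetical)."""
    unconverted, total = non_cpg_counts(read_seq, ref_seq)
    return total >= min_sites and unconverted == total
```

Requiring a minimum number of observed sites keeps a read with a single non-CpG cytosine from being discarded on thin evidence, which also speaks to the concern about wiping out real non-CpG methylation.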

Thanks

@bwlang
Contributor

bwlang commented Apr 15, 2019

@mattsoup built a simple tool to mark multiple-C reads in a bwameth bam. The tool can set the vendor-fail bit so MethylDackel will skip the marked reads.
It's not battle tested yet.
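As a rough illustration of the marking step (not the actual tool mentioned above), the sketch below sets the SAM "not passing filters" bit, 0x200, on a single tab-separated SAM record. The function names are hypothetical, and the record is handled as plain text rather than through a BAM library.

```python
QC_FAIL = 0x200  # SAM flag bit: not passing filters (vendor/platform QC fail)


def mark_qc_fail(sam_line):
    """Return the SAM record with the 0x200 bit set in its FLAG field."""
    fields = sam_line.rstrip("\n").split("\t")
    fields[1] = str(int(fields[1]) | QC_FAIL)  # FLAG is the second column
    return "\t".join(fields)


def is_qc_fail(sam_line):
    """True if the record already carries the 0x200 bit."""
    return bool(int(sam_line.split("\t")[1]) & QC_FAIL)
```

Marking instead of deleting keeps the reads in the file, so the decision can be revisited later by any tool that filters on flags.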

dpryan79 added a commit that referenced this issue Jul 17, 2021
… don't kill threads if there's a sequence fetch error.
dpryan79 added a commit that referenced this issue Jul 22, 2021
Implement #61, allow filtering by non-CpG conversion efficiency. Also…